Printed from https://ideas.repec.org/a/eee/finana/v95y2024ipbs1057521924003375.html

Corporate fraud detection based on linguistic readability vector: Application to financial companies in China

Author

Listed:
  • Zhang, Yi
  • Liu, Tianxiang
  • Li, Weiping

Abstract

Existing research on corporate fraud identification mainly builds models from the text data that companies disclose. However, semantic information is lost when text is vectorized with natural language processing methods. Based on the linguistic features of Chinese texts, we construct four new readability indices for Chinese, at the character, word, sentence, and paragraph levels, and combine them to define, for the first time, a linguistic readability vector for Chinese text. Our sample consists of A-share companies in the financial industry listed on the Shanghai and Shenzhen stock exchanges from 2005 to 2019. We use Word2Vec, a natural language processing method, to vectorize the management's discussion and analysis (MD&A) sections of the companies' annual reports, and then use machine learning algorithms to build fraud identification models in which the readability vector supplements the MD&A vectors with semantic information. The empirical results show that all three types of machine learning models perform better once the readability vector is added. The support vector machine improves the most, with gains of 31.17%, 2.56%, 26.33%, and 2.45% in accuracy, recall, F1-score, and AUC, respectively. The readability vector thus enriches the semantic interpretation of Chinese annual reports and improves the empirical effectiveness of fraud identification models.
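
The feature construction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas for the four readability indices, so the proxies below (share of distinct characters, characters per word, words per sentence, sentences per paragraph) are assumptions chosen only for illustration, as are the function names. Word segmentation and the Word2Vec embedding are assumed to come from elsewhere: the tokenizer is passed in as a parameter and the embedding is a plain list of floats.

```python
import re

# Sentence-ending punctuation (Chinese and ASCII). Illustrative choice only.
SENTENCE_END = re.compile(r"[。！？!?]+")


def split_paragraphs(text):
    """Split on line breaks and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n") if p.strip()]


def split_sentences(paragraph):
    """Split a paragraph on sentence-ending punctuation."""
    return [s.strip() for s in SENTENCE_END.split(paragraph) if s.strip()]


def readability_vector(text, tokenize):
    """Return four hypothetical readability proxies, one per level.

    These are NOT the paper's indices; they only show the shape of a
    [character, word, sentence, paragraph] readability vector.
    `tokenize` is any callable mapping a sentence to a list of words.
    """
    paragraphs = split_paragraphs(text)
    sentences = [s for p in paragraphs for s in split_sentences(p)]
    words = [w for s in sentences for w in tokenize(s)]
    if not words:
        return [0.0, 0.0, 0.0, 0.0]
    chars = [c for w in words for c in w]
    char_level = len(set(chars)) / len(chars)           # share of distinct characters
    word_level = len(chars) / len(words)                # average characters per word
    sentence_level = len(words) / len(sentences)        # average words per sentence
    paragraph_level = len(sentences) / len(paragraphs)  # average sentences per paragraph
    return [char_level, word_level, sentence_level, paragraph_level]


def combine_features(embedding, readability):
    """Append the readability vector to a document embedding."""
    return list(embedding) + list(readability)


# Usage with pre-segmented text (words separated by spaces) and a
# made-up three-dimensional embedding standing in for a Word2Vec vector.
sample = "公司 收入 增长。风险 可控！\n管理层 保持 谨慎。"
features = combine_features([0.1, -0.2, 0.3], readability_vector(sample, str.split))
```

The combined list is what a classifier such as a support vector machine would receive as one row of input. Concatenation is only one way to supplement the embedding; the abstract does not state how the authors combine the two.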

Suggested Citation

  • Zhang, Yi & Liu, Tianxiang & Li, Weiping, 2024. "Corporate fraud detection based on linguistic readability vector: Application to financial companies in China," International Review of Financial Analysis, Elsevier, vol. 95(PB).
  • Handle: RePEc:eee:finana:v:95:y:2024:i:pb:s1057521924003375
    DOI: 10.1016/j.irfa.2024.103405

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1057521924003375
    Download Restriction: Full text for ScienceDirect subscribers only

    References listed on IDEAS

    1. Lo, Kin & Ramos, Felipe & Rogo, Rafael, 2017. "Earnings management and annual report readability," Journal of Accounting and Economics, Elsevier, vol. 63(1), pages 1-25.
    2. Sunita Goel & Jagdish Gangolly, 2012. "Beyond The Numbers: Mining The Annual Reports For Hidden Cues Indicative Of Financial Statement Fraud," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 19(2), pages 75-89, April.
    3. Lynnette Purda & David Skillicorn, 2015. "Accounting Variables, Deception, and a Bag of Words: Assessing the Tools of Fraud Detection," Contemporary Accounting Research, John Wiley & Sons, vol. 32(3), pages 1193-1223, September.
    4. Zhang, Yi & Hu, Ailing & Wang, Jiahua & Zhang, Yaojie, 2022. "Detection of fraud statement based on word vector: Evidence from financial companies in China," Finance Research Letters, Elsevier, vol. 46(PB).
    5. Messod D. Beneish, 1999. "The Detection of Earnings Manipulation," Financial Analysts Journal, Taylor & Francis Journals, vol. 55(5), pages 24-36, September.
    6. Wei Xu & Zhenye Yao & Donghua Chen, 2019. "Chinese annual report readability: measurement and test," China Journal of Accounting Studies, Taylor & Francis Journals, vol. 7(3), pages 407-437, July.
    7. Dyer, Travis & Lang, Mark & Stice-Lawrence, Lorien, 2017. "The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation," Journal of Accounting and Economics, Elsevier, vol. 64(2), pages 221-245.
    8. Jones, Jennifer J., 1991. "Earnings Management During Import Relief Investigations," Journal of Accounting Research, Wiley Blackwell, vol. 29(2), pages 193-228.
    9. Li, Feng, 2008. "Annual report readability, current earnings, and earnings persistence," Journal of Accounting and Economics, Elsevier, vol. 45(2-3), pages 221-247, August.
    10. Shuyu Zhang & Xuanyu Zhou & Huifeng Pan & Junyi Jia, 2019. "Cryptocurrency, confirmatory bias and news readability – evidence from the largest Chinese cryptocurrency exchange," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 58(5), pages 1445-1468, March.
    11. Joseph F. Brazel & Keith L. Jones & Mark F. Zimbelman, 2009. "Using Nonfinancial Measures to Assess Fraud Risk," Journal of Accounting Research, Wiley Blackwell, vol. 47(5), pages 1135-1166, December.
    12. Patricia M. Dechow & Weili Ge & Chad R. Larson & Richard G. Sloan, 2011. "Predicting Material Accounting Misstatements," Contemporary Accounting Research, John Wiley & Sons, vol. 28(1), pages 17-82, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Senave, Elseline & Jans, Mieke J. & Srivastava, Rajendra P., 2023. "The application of text mining in accounting," International Journal of Accounting Information Systems, Elsevier, vol. 50(C).
    2. Dan Amiram & Zahn Bozanic & James D. Cox & Quentin Dupont & Jonathan M. Karpoff & Richard Sloan, 2018. "Financial reporting fraud and other forms of misconduct: a multidisciplinary review of the literature," Review of Accounting Studies, Springer, vol. 23(2), pages 732-783, June.
    3. Li, Jing & Li, Nan & Xia, Tongshui & Guo, Jinjin, 2023. "Textual analysis and detection of financial fraud: Evidence from Chinese manufacturing firms," Economic Modelling, Elsevier, vol. 126(C).
    4. Fahd Alduais & Nashat Ali Almasria & Abeer Samara & Ali Masadeh, 2022. "Conciseness, Financial Disclosure, and Market Reaction: A Textual Analysis of Annual Reports in Listed Chinese Companies," International Journal of Financial Studies (IJFS), MDPI, vol. 10(4), pages 1-22, November.
    5. Jie He & Kam C. Chan, 2023. "Does short sales deregulation affect qualitative information disclosure?," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 63(S1), pages 1351-1380, April.
    6. Muhammad Farhan Malik & Yuan George Shan & Jamie Yixing Tong, 2022. "Do auditors price litigious tone?," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 62(S1), pages 1715-1760, April.
    7. Rjiba, Hatem & Saadi, Samir & Boubaker, Sabri & Ding, Xiaoya (Sara), 2021. "Annual report readability and the cost of equity capital," Journal of Corporate Finance, Elsevier, vol. 67(C).
    8. Nerissa C. Brown & Richard M. Crowley & W. Brooke Elliott, 2020. "What Are You Saying? Using topic to Detect Financial Misreporting," Journal of Accounting Research, Wiley Blackwell, vol. 58(1), pages 237-291, March.
    9. Elshandidy, Tamer & Kamel, Hany, 2024. "Tone of narrative disclosures and earnings management: UK evidence," Advances in Accounting, Elsevier, vol. 64(C).
    10. Bhattacharya, Indranil & Mickovic, Ana, 2024. "Accounting fraud detection using contextual language learning," International Journal of Accounting Information Systems, Elsevier, vol. 53(C).
    11. Boone, Jeff & Hao, Jie & Linthicum, Cheryl & Pham, Viet, 2024. "Impression management strategy — The relationship between accounting narrative thematic bias and financial graph distortion," The British Accounting Review, Elsevier, vol. 56(4).
    12. Oz, Seda, 2024. "The impact of terrorist attacks and mass shootings on earnings management," The British Accounting Review, Elsevier, vol. 56(3).
    13. Abdullah Albizri & Deniz Appelbaum & Nicholas Rizzotto, 2019. "Evaluation of financial statements fraud detection research: a multi-disciplinary analysis," International Journal of Disclosure and Governance, Palgrave Macmillan, vol. 16(4), pages 206-241, December.
    14. Jaeschke, Reemda & Lopatta, Kerstin & Yi, Cheong, 2018. "Managers’ use of language in corrupt firms’ financial disclosures: Evidence from FCPA violators," Scandinavian Journal of Management, Elsevier, vol. 34(2), pages 170-192.
    15. Aghamolla, Cyrus & Smith, Kevin, 2023. "Strategic complexity in disclosure," Journal of Accounting and Economics, Elsevier, vol. 76(2).
    16. Wanli Li & Tiantian Yan & Yue Li & Ziqiao Yan, 2023. "Earnings management and CSR report tone: Evidence from China," Corporate Social Responsibility and Environmental Management, John Wiley & Sons, vol. 30(4), pages 1883-1902, July.
    17. Soliman, Marwa & Ben-Amar, Walid, 2022. "Corporate social responsibility orientation and textual features of financial disclosures," International Review of Financial Analysis, Elsevier, vol. 84(C).
    18. Elvia R. Shauki & Eva Oktavini, 2022. "Earnings Management and Annual Report Readability: The Moderating Effect of Female Directors," International Journal of Financial Studies (IJFS), MDPI, vol. 10(3), pages 1-11, August.
    19. Yao, Yanzhen & Wei, Lu & Jing, Haozhe & Chen, Meiqi & Li, Zhan, 2024. "The impact of readability of risk disclosures in bond prospectuses on credit risk premium," Research in International Business and Finance, Elsevier, vol. 70(PA).
    20. Li, Ken, 2022. "Textual fundamentals in earnings press releases," Advances in Accounting, Elsevier, vol. 57(C).

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.