
Corporate fraud detection based on linguistic readability vector: Application to financial companies in China

Authors

  • Zhang, Yi
  • Liu, Tianxiang
  • Li, Weiping

Abstract

Existing research on corporate fraud identification mainly builds models from text data disclosed by companies. However, semantic information in the text is lost when the text is vectorized using natural language processing methods. Based on the linguistic features of Chinese texts, we construct four new readability indices for Chinese, at the character, word, sentence, and paragraph levels, and combine them to define, for the first time, a linguistic readability vector for Chinese text. Our sample consists of A-share companies in the financial industry listed on the Shanghai and Shenzhen stock exchanges from 2005 to 2019. We use the natural language processing method Word2Vec to vectorize the management's discussion and analysis (MD&A) sections of the companies' annual reports, and then use machine learning algorithms to construct fraud identification models in which the readability vector supplements the MD&A vectors with semantic information. The empirical results show that the performance of all three types of machine learning models improves after the readability vector is added. The support vector machine improves the most, with gains of 31.17%, 2.56%, 26.33%, and 2.45% in accuracy, recall, F1-score, and AUC, respectively. This not only enriches the semantic interpretation of Chinese annual reports but also improves the empirical effectiveness of fraud recognition models.
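
To make the feature construction described in the abstract concrete, the sketch below shows one plausible way a document vector and a four-element readability vector could be joined into a single classifier input. It is an illustration only, not the authors' implementation: the abstract does not say how Word2Vec word vectors are aggregated per document or how the readability vector is attached, so mean pooling and plain concatenation are assumptions here. The function names, the toy embeddings, and the readability values are all hypothetical, and the readability indices are taken as precomputed inputs because their formulas are not given in the abstract.

```python
from typing import Dict, List, Sequence


def mean_pool(tokens: Sequence[str],
              embeddings: Dict[str, List[float]],
              dim: int) -> List[float]:
    """Average the vectors of the tokens that have an embedding.

    Assumption: mean pooling stands in for whatever document-level
    aggregation the paper actually uses.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No known token: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def combine_features(tokens: Sequence[str],
                     embeddings: Dict[str, List[float]],
                     dim: int,
                     readability: Sequence[float]) -> List[float]:
    """Append a 4-element readability vector to the pooled text vector.

    `readability` is expected in the order
    [character-level, word-level, sentence-level, paragraph-level].
    Assumption: simple concatenation; the values are precomputed elsewhere.
    """
    if len(readability) != 4:
        raise ValueError("readability vector must have exactly 4 entries")
    return mean_pool(tokens, embeddings, dim) + list(readability)


# Toy data, invented for illustration (not taken from the paper).
toy_embeddings = {"风险": [0.5, 0.25], "增长": [0.0, 0.75]}
toy_tokens = ["风险", "增长", "未知词"]  # the last token has no embedding
toy_readability = [0.1, 0.2, 0.3, 0.4]

features = combine_features(toy_tokens, toy_embeddings, 2, toy_readability)
print(features)  # [0.25, 0.5, 0.1, 0.2, 0.3, 0.4]
```

The resulting list could then be passed as one row of the feature matrix to any standard classifier, such as a support vector machine. Concatenation keeps the readability indices as separate, interpretable columns alongside the embedding dimensions.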

Suggested Citation

  • Zhang, Yi & Liu, Tianxiang & Li, Weiping, 2024. "Corporate fraud detection based on linguistic readability vector: Application to financial companies in China," International Review of Financial Analysis, Elsevier, vol. 95(PB).
  • Handle: RePEc:eee:finana:v:95:y:2024:i:pb:s1057521924003375
    DOI: 10.1016/j.irfa.2024.103405

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1057521924003375
    Download Restriction: Full text for ScienceDirect subscribers only


