Corporate fraud detection based on linguistic readability vector: Application to financial companies in China

Authors

Zhang, Yi
Liu, Tianxiang
Li, Weiping

Abstract

Existing research on corporate fraud identification mainly builds models from text data disclosed by companies. However, semantic information is lost when text data is vectorized using natural language processing methods. Based on the linguistic features of Chinese texts, we construct new readability indices at the Chinese character, word, sentence, and paragraph levels, and combine them to define, for the first time, a linguistic readability vector for Chinese text. This paper takes A-share companies in the financial industry listed on the Shanghai and Shenzhen stock exchanges from 2005 to 2019 as the research object and uses the natural language processing method Word2Vec to vectorize the management's discussion and analysis (MD&A) sections of the companies' annual reports. We then use machine learning algorithms to construct fraud identification models, using the readability vector data to complement the MD&A semantically. The empirical results show that the performance of all three types of machine learning models improves after supplementing with the semantic information of the readability vector. The support vector machine improves the most, with gains of 31.17%, 2.56%, 26.33%, and 2.45% in accuracy, recall, F1-score, and AUC, respectively. This not only enriches the semantic interpretation of Chinese annual reports but also improves the empirical effectiveness of fraud recognition models.
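
The feature construction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas for the four readability indices, so the proxies below (character diversity, mean word length, mean sentence length, mean sentences per paragraph) are assumptions chosen only for illustration, as are the function names readability_vector and augment. The sketch also assumes the text has already been segmented into words and that an MD&A embedding (for example, an averaged Word2Vec vector) is supplied by the caller.

```python
import re
from statistics import mean

# Sentence-ending punctuation (Chinese and ASCII); an assumption, not the paper's rule.
SENTENCE_END = re.compile(r"[。！？!?]+")


def split_sentences(paragraph):
    """Split one paragraph into non-empty sentences."""
    return [s.strip() for s in SENTENCE_END.split(paragraph) if s.strip()]


def readability_vector(text, words):
    """Return four illustrative proxies: [character, word, sentence, paragraph].

    These are hypothetical stand-ins, NOT the indices defined in the paper.
    `words` is the pre-segmented word list for `text`.
    """
    paragraphs = [p for p in text.split("\n") if p.strip()]
    grouped = [split_sentences(p) for p in paragraphs]
    sentences = [s for group in grouped for s in group]
    # Keep only characters in the basic CJK Unified Ideographs block.
    chars = [c for c in text if "\u4e00" <= c <= "\u9fff"]

    char_index = len(set(chars)) / len(chars) if chars else 0.0
    word_index = mean(len(w) for w in words) if words else 0.0
    sentence_index = mean(len(s) for s in sentences) if sentences else 0.0
    paragraph_index = mean(len(g) for g in grouped) if grouped else 0.0
    return [char_index, word_index, sentence_index, paragraph_index]


def augment(embedding, readability):
    """Concatenate a text embedding with the readability vector."""
    return list(embedding) + list(readability)
```

The concatenated vector returned by augment would then be the input to a classifier such as a support vector machine. Concatenation is one simple way to combine the two feature sets; the abstract does not state how the paper combines them.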

Suggested Citation

Zhang, Yi & Liu, Tianxiang & Li, Weiping, 2024. "Corporate fraud detection based on linguistic readability vector: Application to financial companies in China," International Review of Financial Analysis, Elsevier, vol. 95(PB).

Handle: RePEc:eee:finana:v:95:y:2024:i:pb:s1057521924003375
DOI: 10.1016/j.irfa.2024.103405

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Senave, Elseline & Jans, Mieke J. & Srivastava, Rajendra P., 2023. "The application of text mining in accounting," International Journal of Accounting Information Systems, Elsevier, vol. 50(C).
Muhammad Farhan Malik & Yuan George Shan & Jamie Yixing Tong, 2022. "Do auditors price litigious tone?," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 62(S1), pages 1715-1760, April.
Rjiba, Hatem & Saadi, Samir & Boubaker, Sabri & Ding, Xiaoya (Sara), 2021. "Annual report readability and the cost of equity capital," Journal of Corporate Finance, Elsevier, vol. 67(C).
- Hatem Rjiba & Samir Saadi & Sabri Boubaker & Xiaoya Ding, 2021. "Annual report readability and the cost of equity capital," Post-Print hal-04455605, HAL.
Dan Amiram & Zahn Bozanic & James D. Cox & Quentin Dupont & Jonathan M. Karpoff & Richard Sloan, 2018. "Financial reporting fraud and other forms of misconduct: a multidisciplinary review of the literature," Review of Accounting Studies, Springer, vol. 23(2), pages 732-783, June.
Nerissa C. Brown & Richard M. Crowley & W. Brooke Elliott, 2020. "What Are You Saying? Using topic to Detect Financial Misreporting," Journal of Accounting Research, Wiley Blackwell, vol. 58(1), pages 237-291, March.
Li, Jing & Li, Nan & Xia, Tongshui & Guo, Jinjin, 2023. "Textual analysis and detection of financial fraud: Evidence from Chinese manufacturing firms," Economic Modelling, Elsevier, vol. 126(C).
Elshandidy, Tamer & Kamel, Hany, 2024. "Tone of narrative disclosures and earnings management: UK evidence," Advances in accounting, Elsevier, vol. 64(C).
Fahd Alduais & Nashat Ali Almasria & Abeer Samara & Ali Masadeh, 2022. "Conciseness, Financial Disclosure, and Market Reaction: A Textual Analysis of Annual Reports in Listed Chinese Companies," IJFS, MDPI, vol. 10(4), pages 1-22, November.
Jie He & Kam C. Chan, 2023. "Does short sales deregulation affect qualitative information disclosure?," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 63(S1), pages 1351-1380, April.
Bhattacharya, Indranil & Mickovic, Ana, 2024. "Accounting fraud detection using contextual language learning," International Journal of Accounting Information Systems, Elsevier, vol. 53(C).
Pinto, Inês & Morais, Ana Isabel & Quick, Reiner, 2020. "The impact of the precision of accounting standards on the expanded auditor’s report in the European Union," Journal of International Accounting, Auditing and Taxation, Elsevier, vol. 40(C).
Chychyla, Roman & Leone, Andrew J. & Minutti-Meza, Miguel, 2019. "Complexity of financial reporting standards and accounting expertise," Journal of Accounting and Economics, Elsevier, vol. 67(1), pages 226-253.
Boone, Jeff & Hao, Jie & Linthicum, Cheryl & Pham, Viet, 2024. "Impression management strategy — The relationship between accounting narrative thematic bias and financial graph distortion," The British Accounting Review, Elsevier, vol. 56(4).
James P. Ryans, 2021. "Textual classification of SEC comment letters," Review of Accounting Studies, Springer, vol. 26(1), pages 37-80, March.
Berkin, Anil & Aerts, Walter & Van Caneghem, Tom, 2023. "Feasibility analysis of machine learning for performance-related attributional statements," International Journal of Accounting Information Systems, Elsevier, vol. 48(C).
David F. Larcker & Anastasia A. Zakolyukina, 2012. "Detecting Deceptive Discussions in Conference Calls," Journal of Accounting Research, Wiley Blackwell, vol. 50(2), pages 495-540, May.
- Larcker, David F. & Zakolyukina, Anastasia A., 2010. "Detecting Deceptive Discussions in Conference Calls," Research Papers 2060, Stanford University, Graduate School of Business.
Li, Guowen & Wang, Shuai & Feng, Yuyao, 2024. "Making differences work: Financial fraud detection based on multi-subject perceptions," Emerging Markets Review, Elsevier, vol. 60(C).
Blankespoor, Elizabeth & deHaan, Ed & Marinovic, Iván, 2020. "Disclosure processing costs, investors’ information choice, and equity market outcomes: A review," Journal of Accounting and Economics, Elsevier, vol. 70(2).
Meng, Qingxi & He, Yan & Zhang, Anting & Gong, Xiaoyun, 2023. "Does mandatory operating information disclosure affect stock price crash risk? Evidence from China," Pacific-Basin Finance Journal, Elsevier, vol. 82(C).
Oz, Seda, 2024. "The impact of terrorist attacks and mass shootings on earnings management," The British Accounting Review, Elsevier, vol. 56(3).

More about this item

Keywords

Text readability vector; Machine learning; Word2Vec; Fraud identification; Auditing

