Printed from https://ideas.repec.org/a/inm/ormnsc/v70y2024i12p8962-8987.html

How Much Can Machines Learn Finance from Chinese Text Data?

Authors

Listed:
  • Yang Zhou

    (Institute for Big Data, Fudan University, Shanghai 200433, China; MOE Laboratory for National Development and Intelligent Governance, Fudan University, Shanghai 200433, China)

  • Jianqing Fan

    (International School of Economics and Management, Capital University of Economics and Business, Beijing 100070, China; Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544; School of Data Science, Fudan University, Shanghai 200433, China)

  • Lirong Xue

    (Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544)

Abstract

How much can we learn about finance directly from text data? This paper presents a new framework for learning from textual data, based on a factor augmentation model and sparsity regularization and called the factor-augmented regularized model for prediction (FarmPredict), that lets machines learn financial returns directly from news. FarmPredict allows the model itself to extract information directly from articles without the predefined information, such as dictionaries or pretrained models, that most studies rely on. Augmenting the predictors with factors learned without supervision gives our method a “double-robust” feature: the machine learns to balance between individual words and text factors/topics. It also avoids the information loss that factor regression incurs through dimensionality reduction. We apply our model to the Chinese stock market, which has a large proportion of retail investors, using Chinese news data to predict financial returns. We show that news scored as positive by FarmPredict generates on average 83 basis points (bps) of daily excess stock returns, and negative news has an adverse impact of 26 bps, on the days of news announcements; both effects can last for a few days. This asymmetric effect aligns well with the short-sale constraints in the Chinese equity market. The results show that the machine-learned prediction provides sizeable predictive power, with an annualized return of up to 54% under a simple investment strategy. FarmPredict significantly outperforms other statistical and machine learning methods in both model prediction and portfolio performance. Our study demonstrates the far-reaching potential of using machines to learn from text data.
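To make the idea of factor augmentation with sparsity regularization concrete, the following is a minimal sketch on synthetic data, not the authors' implementation. It assumes that the latent factors are estimated by principal components of a centered document-word matrix, that the idiosyncratic components are the PCA residuals, and that the sparse part is a plain lasso solved by proximal gradient descent with the factor coefficients left unpenalized. The function names, the penalty level `lam`, and the simulated data are all hypothetical; the paper's text preprocessing and tuning are not reproduced here.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_factor_augmented_lasso(X, y, n_factors, lam, n_iter=3000):
    """Regress y on [factors, idiosyncratic parts] with an L1 penalty on the latter."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Assumption: factors are the leading principal components of Xc.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_factors].T              # p x k
    F = Xc @ loadings                        # n x k estimated factors
    U = Xc - F @ loadings.T                  # n x p idiosyncratic components
    Z = np.hstack([F, U])
    n = Z.shape[0]
    step = n / np.linalg.norm(Z, 2) ** 2     # 1 / Lipschitz constant of the gradient
    coef = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        coef -= step * (Z.T @ (Z @ coef - (y - y_mean)) / n)
        # Only the word-level (idiosyncratic) coefficients are shrunk.
        coef[n_factors:] = soft_threshold(coef[n_factors:], step * lam)
    return {"x_mean": x_mean, "y_mean": y_mean, "loadings": loadings,
            "coef": coef, "n_factors": n_factors}

def predict(model, X_new):
    """Apply the fitted decomposition and coefficients to new documents."""
    Xc = X_new - model["x_mean"]
    F = Xc @ model["loadings"]
    U = Xc - F @ model["loadings"].T
    return model["y_mean"] + np.hstack([F, U]) @ model["coef"]

# Synthetic example: 2 latent "topics" drive 50 "word" counts; the target
# depends on both topics and on the idiosyncratic part of word 0.
rng = np.random.default_rng(0)
n, p, k = 200, 50, 2
f = rng.normal(size=(n, k))
B = rng.normal(size=(p, k))
u = rng.normal(size=(n, p))
X = f @ B.T + u
y = 1.5 * f[:, 0] - 1.0 * f[:, 1] + 2.0 * u[:, 0] + 0.1 * rng.normal(size=n)

model = fit_factor_augmented_lasso(X, y, n_factors=k, lam=0.05)
y_hat = predict(model, X)
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
word_coef = model["coef"][k:]
print("in-sample R^2:", round(r2, 3))
print("largest word coefficient at index:", int(np.argmax(np.abs(word_coef))))
```

Because the regression sees both the estimated factors and the per-word residuals, it can put weight on topics, on individual words, or on both, which is the balance the abstract refers to as the “double-robust” feature.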

Suggested Citation

  • Yang Zhou & Jianqing Fan & Lirong Xue, 2024. "How Much Can Machines Learn Finance from Chinese Text Data?," Management Science, INFORMS, vol. 70(12), pages 8962-8987, December.
  • Handle: RePEc:inm:ormnsc:v:70:y:2024:i:12:p:8962-8987
    DOI: 10.1287/mnsc.2022.01468

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2022.01468
    Download Restriction: no

    File URL: https://libkey.io/10.1287/mnsc.2022.01468?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item