Integrating Prediction and Attribution to Classify News

Author

Listed:

  • Nelson P. Rayl
  • Nitish R. Sinha

Abstract

Recent modeling developments have created tradeoffs between attribution-based models (models that rely on causal relationships) and “pure prediction models” such as neural networks. While forecasters have historically favored one technology or the other based on comfort or loyalty to a particular paradigm, in domains with many observations and predictors, such as textual analysis, the tradeoffs between attribution and prediction have become too large to ignore. We document these tradeoffs in the context of relabeling 27 million Thomson Reuters news articles published between 1996 and 2021 as debt-related or non-debt-related. Articles in our dataset were labeled by journalists at the time of publication, but these labels may be inconsistent because labeling standards and the relation between text and label have changed over time. We propose a method for identifying and correcting inconsistent labeling that combines attribution and pure prediction methods and is applicable to any domain with human-labeled data. Implementing our proposed labeling solution returns a debt-related news dataset with 54% more observations than if the original journalist labels had been used and 31% more observations than if our solution had been implemented using attribution-based methods only.

Suggested Citation

  • Nelson P. Rayl & Nitish R. Sinha, 2022. "Integrating Prediction and Attribution to Classify News," Finance and Economics Discussion Series 2022-042, Board of Governors of the Federal Reserve System (U.S.).
  • Handle: RePEc:fip:fedgfe:2022-42
    DOI: 10.17016/FEDS.2022.042

    Download full text from publisher

    File URL: https://www.federalreserve.gov/econres/feds/files/2022042pap.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Stephen Hansen & Michael McMahon & Andrea Prat, 2018. "Transparency and Deliberation Within the FOMC: A Computational Linguistics Approach," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 133(2), pages 801-870.
    2. Bradley Efron, 2020. "Prediction, Estimation, and Attribution," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(530), pages 636-655, April.
    3. Scott R. Baker & Nicholas Bloom & Steven J. Davis, 2016. "Measuring Economic Policy Uncertainty," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 131(4), pages 1593-1636.
    4. Matt Taddy, 2013. "Multinomial Inverse Regression for Text Analysis," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(503), pages 755-770, September.
    5. Bradley Efron, 2020. "Prediction, Estimation, and Attribution," International Statistical Review, International Statistical Institute, vol. 88(S1), pages 28-59, December.
    6. Laver, Michael & Benoit, Kenneth & Garry, John, 2003. "Extracting Policy Positions from Political Texts Using Words as Data," American Political Science Review, Cambridge University Press, vol. 97(2), pages 311-331, May.
    7. Jake M. Hofman & Duncan J. Watts & Susan Athey & Filiz Garip & Thomas L. Griffiths & Jon Kleinberg & Helen Margetts & Sendhil Mullainathan & Matthew J. Salganik & Simine Vazire & Alessandro Vespignani, 2021. "Integrating explanation and prediction in computational social science," Nature, Nature, vol. 595(7866), pages 181-188, July.
    8. Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.

    Citations

    Cited by:

    1. Niţoi, Mihai & Pochea, Maria-Miruna & Radu, Ştefan-Constantin, 2023. "Unveiling the sentiment behind central bank narratives: A novel deep learning index," Journal of Behavioral and Experimental Finance, Elsevier, vol. 38(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. García, Diego & Hu, Xiaowen & Rohrer, Maximilian, 2023. "The colour of finance words," Journal of Financial Economics, Elsevier, vol. 147(3), pages 525-549.
    2. Matthew Gentzkow & Bryan T. Kelly & Matt Taddy, 2017. "Text as Data," NBER Working Papers 23276, National Bureau of Economic Research, Inc.
    3. Kirtac, Kemal & Germano, Guido, 2024. "Sentiment trading with large language models," Finance Research Letters, Elsevier, vol. 62(PB).
    4. Martin Baumgaertner & Johannes Zahner, 2021. "Whatever it takes to understand a central banker - Embedding their words using neural networks," MAGKS Papers on Economics 202130, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    5. Hanjo Odendaal & Monique Reid & Johann F. Kirsten, 2020. "Media‐Based Sentiment Indices as an Alternative Measure of Consumer Confidence," South African Journal of Economics, Economic Society of South Africa, vol. 88(4), pages 409-434, December.
    6. Istrefi, Klodiana & Odendahl, Florens & Sestieri, Giulia, 2023. "Fed communication on financial stability concerns and monetary policy decisions: Revelations from speeches," Journal of Banking & Finance, Elsevier, vol. 151(C).
    7. Massimo Ferrari Minesso & Frederik Kurcz & Maria Sole Pagliari, 2022. "Do words hurt more than actions? The impact of trade tensions on financial markets," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(6), pages 1138-1159, September.
    8. Picault, Matthieu & Renault, Thomas, 2017. "Words are not all created equal: A new measure of ECB communication," Journal of International Money and Finance, Elsevier, vol. 79(C), pages 136-156.
    9. Shapiro, Adam Hale & Sudhof, Moritz & Wilson, Daniel J., 2022. "Measuring news sentiment," Journal of Econometrics, Elsevier, vol. 228(2), pages 221-243.
    10. Vegard Høghaug Larsen & Leif Anders Thorsrud, 2018. "Business cycle narratives," Working Papers No 6/2018, Centre for Applied Macro- and Petroleum economics (CAMP), BI Norwegian Business School.
    11. Hubert, Paul & Labondance, Fabien, 2021. "The signaling effects of central bank tone," European Economic Review, Elsevier, vol. 133(C).
    12. Matteo Cinelli & Valerio Ficcadenti & Jessica Riccioni, 2020. "The interconnectedness of the economic content in the speeches of the US Presidents," Papers 2002.07880, arXiv.org.
    13. Hansen, Stephen & Davis, Steven & Seminario-Amez, Cristhian, 2020. "Firm-level Risk Exposures and Stock Returns in the Wake of COVID-19," CEPR Discussion Papers 15314, C.E.P.R. Discussion Papers.
    14. Ashwin, Julian & Rao, Vijayendra & Biradavolu, Monica Rao & Chhabra, Aditya & Haque, Arshia & Khan, Afsana Iffat & Krishnan, Nandini, 2022. "A Method to Scale-Up Interpretative Qualitative Analysis, with an Application to Aspirations in Cox’s Bazaar, Bangladesh," Policy Research Working Paper Series 10046, The World Bank.
    15. Nyman, Rickard & Kapadia, Sujit & Tuckett, David, 2021. "News and narratives in financial systems: Exploiting big data for systemic risk assessment," Journal of Economic Dynamics and Control, Elsevier, vol. 127(C).
    16. Joaquin Iglesias & Alvaro Ortiz & Tomasa Rodrigo, 2017. "How do the EM Central Bank talk? A Big Data approach to the Central Bank of Turkey," Working Papers 17/24, BBVA Bank, Economic Research Department.
    17. Pierre L. Siklos, 2020. "U.S. Monetary Policy since the 1950s and the Changing Content of FOMC Minutes," Southern Economic Journal, John Wiley & Sons, vol. 86(3), pages 1192-1213, January.
    18. Lin, Jianhao & Mei, Ziwei & Chen, Liangyuan & Zhu, Chuanqi, 2023. "Is the People's Bank of China consistent in words and deeds?," China Economic Review, Elsevier, vol. 78(C).
    19. Matteo Cinelli & Valerio Ficcadenti & Jessica Riccioni, 2021. "The interconnectedness of the economic content in the speeches of the US Presidents," Annals of Operations Research, Springer, vol. 299(1), pages 593-615, April.
    20. Ilias Filippou & James Mitchell & My T. Nguyen, 2023. "The FOMC versus the Staff: Do Policymakers Add Value in Their Tales?," Working Papers 23-20, Federal Reserve Bank of Cleveland.

    More about this item

    Keywords

    News; Text Analysis; Debt; Labeling; Supervised Learning; DMR;

    JEL classification:

    • C40 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - General
    • C45 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Neural Networks and Related Topics
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
