IDEAS home Printed from https://ideas.repec.org/a/eee/dyncon/v114y2020ics0165188920300634.html

Separating the signal from the noise – Financial machine learning for Twitter

Author

Listed:
  • Schnaubelt, Matthias
  • Fischer, Thomas G.
  • Krauss, Christopher

Abstract

Most statistical arbitrage strategies in the academic literature solely rely on price time series. By contrast, alternative data sources are of growing importance for professional investors. We contribute to bridging this gap by assessing the price-predictive value of millions of tweets on intraday returns of the S&P 500 constituents from 2014 and 2015. For this purpose, we design a machine learning system addressing specific challenges inherent to this task. At first, building on the literature of financial dictionaries, we engineer domain-specific features along three categories, i.e., directional indicators, relevance indicators and meta features. Next, we leverage a random forest to extract the relationship between these features and subsequent stock returns in a low signal-to-noise setting. For performance evaluation, we run a rigorous event-based backtesting study across all tweets and stocks. We find annualized returns of 6.4 percent and a Sharpe ratio of 2.2 after transaction costs. Finally, we illuminate the machine learning black box and unveil sources of profitability: First, results are both driven and limited by the temporal clustering of tweets, i.e., the majority of profits stem from tweets clustered closely together in time, corresponding to high-event situations. Second, the importance of included features follows an economic rationale, e.g., tweets with positive sentiment tend to yield positive returns and vice versa. Third, we find that stocks of medium market capitalization and from the consumer and technology sectors contribute most to our results, which we interpret as a trade-off between tweet coverage and tweet relevance.
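As an illustration of the performance figures quoted in the abstract (annualized return and Sharpe ratio after transaction costs), the following minimal Python sketch shows one common way such metrics are computed from a series of daily strategy returns. The abstract does not state the authors' exact conventions, so everything here is an assumption made for illustration: the 252-trading-day year, the zero risk-free rate, the proportional cost model, and the function names `net_daily_returns`, `annualized_return` and `annualized_sharpe`.

```python
import math
import statistics

# Assumption: 252 trading days per year, a common convention for US equities.
TRADING_DAYS = 252


def net_daily_returns(gross_returns, daily_turnover, cost_per_unit_turnover):
    """Subtract proportional transaction costs from gross daily returns.

    Hypothetical cost model: cost = turnover * cost_per_unit_turnover.
    """
    return [
        r - t * cost_per_unit_turnover
        for r, t in zip(gross_returns, daily_turnover)
    ]


def annualized_return(daily_returns):
    """Geometric annualized return from a list of daily simple returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def annualized_sharpe(daily_returns, daily_risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample
    standard deviation, scaled by the square root of trading days."""
    excess = [r - daily_risk_free for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)


if __name__ == "__main__":
    # Made-up numbers purely to exercise the functions; not data from the paper.
    gross = [0.004, -0.002, 0.003, 0.001, -0.001]
    turnover = [1.0, 0.5, 1.0, 0.0, 0.5]
    net = net_daily_returns(gross, turnover, cost_per_unit_turnover=0.0005)
    print("annualized return:", annualized_return(net))
    print("annualized Sharpe:", annualized_sharpe(net))
```

Applying costs before computing either metric matters in a setting like this one, because a strategy that trades on every tweet event can have high turnover, and a gross Sharpe ratio would then overstate what is achievable.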

Suggested Citation

  • Schnaubelt, Matthias & Fischer, Thomas G. & Krauss, Christopher, 2020. "Separating the signal from the noise – Financial machine learning for Twitter," Journal of Economic Dynamics and Control, Elsevier, vol. 114(C).
  • Handle: RePEc:eee:dyncon:v:114:y:2020:i:c:s0165188920300634
    DOI: 10.1016/j.jedc.2020.103895

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0165188920300634
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Teresa L. Ju & Yao Chin Lin & Nhu-Hang Ha, 2014. "Proactive Assessment for Collaboration Success," SAGE Open, , vol. 4(3), pages 21582440145, July.
    2. Krauss, Christopher & Do, Xuan Anh & Huck, Nicolas, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," European Journal of Operational Research, Elsevier, vol. 259(2), pages 689-702.
    3. Paul C. Tetlock & Maytal Saar‐Tsechansky & Sofus Macskassy, 2008. "More Than Words: Quantifying Language to Measure Firms' Fundamentals," Journal of Finance, American Finance Association, vol. 63(3), pages 1437-1467, June.
    4. Matthew Gentzkow & Bryan Kelly & Matt Taddy, 2019. "Text as Data," Journal of Economic Literature, American Economic Association, vol. 57(3), pages 535-574, September.
    5. Leung, Mark T. & Daouk, Hazem & Chen, An-Sing, 2000. "Forecasting stock indices: a comparison of classification and level estimation models," International Journal of Forecasting, Elsevier, vol. 16(2), pages 173-190.
    6. Sanjiv R. Das & Mike Y. Chen, 2007. "Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web," Management Science, INFORMS, vol. 53(9), pages 1375-1388, September.
    7. Xu, Wei & Chen, Yuehuan & Coleman, Conrad & Coleman, Thomas F., 2018. "Moment matching machine learning methods for risk management of large variable annuity portfolios," Journal of Economic Dynamics and Control, Elsevier, vol. 87(C), pages 1-20.
    8. Evan Gatev & William N. Goetzmann & K. Geert Rouwenhorst, 2006. "Pairs Trading: Performance of a Relative-Value Arbitrage Rule," The Review of Financial Studies, Society for Financial Studies, vol. 19(3), pages 797-827.
    9. Huck, Nicolas, 2019. "Large data sets and machine learning: Applications to statistical arbitrage," European Journal of Operational Research, Elsevier, vol. 278(1), pages 330-342.
    10. Zheng Tracy Ke & Bryan T. Kelly & Dacheng Xiu, 2019. "Predicting Returns With Text Data," NBER Working Papers 26186, National Bureau of Economic Research, Inc.
    11. Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
    12. Timm O. Sprenger & Philipp G. Sandner & Andranik Tumasjan & Isabell M. Welpe, 2014. "News or Noise? Using Twitter to Identify and Understand Company-specific News Flow," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 41(7-8), pages 791-830, September.
    13. Clifford S. Asness & Tobias J. Moskowitz & Lasse Heje Pedersen, 2013. "Value and Momentum Everywhere," Journal of Finance, American Finance Association, vol. 68(3), pages 929-985, June.
    14. Gérard Biau & Erwan Scornet, 2016. "Rejoinder on: A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 264-268, June.
    15. Thomas Günter Fischer & Christopher Krauss & Alexander Deinert, 2019. "Statistical Arbitrage in Cryptocurrency Markets," JRFM, MDPI, vol. 12(1), pages 1-15, February.
    16. Nicolas Huck, 2019. "Large data sets and machine learning: Applications to statistical arbitrage," Post-Print hal-02143971, HAL.
    17. Bekiros, Stelios D., 2010. "Heterogeneous trading strategies with adaptive fuzzy Actor-Critic reinforcement learning: A behavioral approach," Journal of Economic Dynamics and Control, Elsevier, vol. 34(6), pages 1153-1170, June.
    18. Fama, Eugene F, 1970. "Efficient Capital Markets: A Review of Theory and Empirical Work," Journal of Finance, American Finance Association, vol. 25(2), pages 383-417, May.
    19. Gérard Biau & Erwan Scornet, 2016. "A random forest guided tour," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(2), pages 197-227, June.
    20. Marco Avellaneda & Jeong-Hyun Lee, 2010. "Statistical arbitrage in the US equities market," Quantitative Finance, Taylor & Francis Journals, vol. 10(7), pages 761-782.
    21. Christopher Krauss & Anh Do & Nicolas Huck, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," Post-Print hal-01768895, HAL.
    22. Tim Loughran & Bill Mcdonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.
    23. Allen H. Huang & Reuven Lehavy & Amy Y. Zang & Rong Zheng, 2018. "Analyst Information Discovery and Interpretation Roles: A Topic Modeling Approach," Management Science, INFORMS, vol. 64(6), pages 2833-2855, June.
    24. Jegadeesh, Narasimhan & Livnat, Joshua, 2006. "Revenue surprises and stock returns," Journal of Accounting and Economics, Elsevier, vol. 41(1-2), pages 147-171, April.
    25. Fischer, Thomas & Krauss, Christopher, 2018. "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, Elsevier, vol. 270(2), pages 654-669.

    Citations

    Cited by:

    1. Frank, Johannes, 2023. "Forecasting realized volatility in turbulent times using temporal fusion transformers," FAU Discussion Papers in Economics 03/2023, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    2. Schnaubelt, Matthias, 2022. "Deep reinforcement learning for the optimal placement of cryptocurrency limit orders," European Journal of Operational Research, Elsevier, vol. 296(3), pages 993-1006.
    3. Thomas Dierckx & Jesse Davis & Wim Schoutens, 2022. "Nowcasting Stock Implied Volatility with Twitter," Papers 2301.00248, arXiv.org.
    4. Xiaohong Shen & Gaoshan Wang & Yue Wang & Alfred Peris, 2021. "The Influence of Research Reports on Stock Returns: The Mediating Effect of Machine-Learning-Based Investor Sentiment," Discrete Dynamics in Nature and Society, Hindawi, vol. 2021, pages 1-14, December.
    5. Schnaubelt, Matthias & Seifert, Oleg, 2020. "Valuation ratios, surprises, uncertainty or sentiment: How does financial machine learning predict returns from earnings announcements?," FAU Discussion Papers in Economics 04/2020, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    6. Herrera, Gabriel Paes & Constantino, Michel & Su, Jen-Je & Naranpanawa, Athula, 2022. "Renewable energy stocks forecast using Twitter investor sentiment and deep learning," Energy Economics, Elsevier, vol. 114(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Schnaubelt, Matthias & Fischer, Thomas G. & Krauss, Christopher, 2018. "Separating the signal from the noise - financial machine learning for Twitter," FAU Discussion Papers in Economics 14/2018, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    2. Schnaubelt, Matthias & Seifert, Oleg, 2020. "Valuation ratios, surprises, uncertainty or sentiment: How does financial machine learning predict returns from earnings announcements?," FAU Discussion Papers in Economics 04/2020, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    3. Flori, Andrea & Regoli, Daniele, 2021. "Revealing Pairs-trading opportunities with long short-term memory networks," European Journal of Operational Research, Elsevier, vol. 295(2), pages 772-791.
    4. Fabian Waldow & Matthias Schnaubelt & Christopher Krauss & Thomas Günter Fischer, 2021. "Machine Learning in Futures Markets," JRFM, MDPI, vol. 14(3), pages 1-14, March.
    5. Han, Chulwoo & He, Zhaodong & Toh, Alenson Jun Wei, 2023. "Pairs trading via unsupervised learning," European Journal of Operational Research, Elsevier, vol. 307(2), pages 929-947.
    6. Alexander Jakob Dautel & Wolfgang Karl Härdle & Stefan Lessmann & Hsin-Vonn Seow, 2020. "Forex exchange rate forecasting using deep recurrent neural networks," Digital Finance, Springer, vol. 2(1), pages 69-96, September.
    7. Kriebel, Johannes & Stitz, Lennart, 2022. "Credit default prediction from user-generated text in peer-to-peer lending using deep learning," European Journal of Operational Research, Elsevier, vol. 302(1), pages 309-323.
    8. Rubesam, Alexandre, 2022. "Machine learning portfolios with equal risk contributions: Evidence from the Brazilian market," Emerging Markets Review, Elsevier, vol. 51(PB).
    9. Kasper Johansson & Thomas Schmelzer & Stephen Boyd, 2024. "Finding Moving-Band Statistical Arbitrages via Convex-Concave Optimization," Papers 2402.08108, arXiv.org.
    10. Erdinc Akyildirim & Ahmet Goncu & Alper Hekimoglu & Duc Khuong Nguyen & Ahmet Sensoy, 2023. "Statistical arbitrage: factor investing approach," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(4), pages 1295-1331, December.
    11. Charles W. Calomiris & Nida Çakır Melek & Harry Mamaysky, 2021. "Predicting the Oil Market," NBER Working Papers 29379, National Bureau of Economic Research, Inc.
    12. Thomas Günter Fischer & Christopher Krauss & Alexander Deinert, 2019. "Statistical Arbitrage in Cryptocurrency Markets," JRFM, MDPI, vol. 12(1), pages 1-15, February.
    13. Baoqiang Zhan & Shu Zhang & Helen S. Du & Xiaoguang Yang, 2022. "Exploring Statistical Arbitrage Opportunities Using Machine Learning Strategy," Computational Economics, Springer;Society for Computational Economics, vol. 60(3), pages 861-882, October.
    14. Kamaladdin Fataliyev & Aneesh Chivukula & Mukesh Prasad & Wei Liu, 2021. "Stock Market Analysis with Text Data: A Review," Papers 2106.12985, arXiv.org, revised Jul 2021.
    15. Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Digital Finance, Springer, vol. 2(1), pages 1-13, September.
    16. Fischer, Thomas & Krauss, Christopher, 2017. "Deep learning with long short-term memory networks for financial market predictions," FAU Discussion Papers in Economics 11/2017, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    17. Goodell, John W. & Kumar, Satish & Lim, Weng Marc & Pattnaik, Debidutta, 2021. "Artificial intelligence and machine learning in finance: Identifying foundations, themes, and research clusters from bibliometric analysis," Journal of Behavioral and Experimental Finance, Elsevier, vol. 32(C).
    18. Mao, Huina & Counts, Scott & Bollen, Johan, 2015. "Quantifying the effects of online bullishness on international financial markets," Statistics Paper Series 09, European Central Bank.
    19. Kim, A. & Yang, Y. & Lessmann, S. & Ma, T. & Sung, M.-C. & Johnson, J.E.V., 2020. "Can deep learning predict risky retail investors? A case study in financial risk behavior forecasting," European Journal of Operational Research, Elsevier, vol. 283(1), pages 217-234.
    20. Pedro M. Mirete-Ferrer & Alberto Garcia-Garcia & Juan Samuel Baixauli-Soler & Maria A. Prats, 2022. "A Review on Machine Learning for Asset Management," Risks, MDPI, vol. 10(4), pages 1-46, April.

    More about this item

    Keywords

    Finance; Statistical arbitrage; Machine learning; Natural language processing;

    JEL classification:

    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • G11 - Financial Economics - - General Financial Markets - - - Portfolio Choice; Investment Decisions
    • G14 - Financial Economics - - General Financial Markets - - - Information and Market Efficiency; Event Studies; Insider Trading
    • G17 - Financial Economics - - General Financial Markets - - - Financial Forecasting and Simulation
