IDEAS, printed from https://ideas.repec.org/a/eee/ejores/v278y2019i1p330-342.html

Large data sets and machine learning: Applications to statistical arbitrage

Author

  • Huck, Nicolas

Abstract

Machine learning algorithms and big data are transforming all industries, including the finance and portfolio management sectors. While these techniques, such as Deep Belief Networks or Random Forests, are becoming increasingly popular in the market, the academic literature on them remains relatively sparse. Through a series of applications involving hundreds of variables/predictors and stocks, this article presents some state-of-the-art techniques and shows how they can be implemented to manage a long-short portfolio. Numerous practical and empirical issues are discussed. One of the main questions behind the use of big data is the value of information. Does an increase in the number of predictors improve portfolio performance? Which features are the most important? A large number of predictors potentially means a high level of noise; how do the algorithms manage this? This article develops an application using a 22-year trading period, up to 300 U.S. large caps and around 600 predictors. The empirical results underline the ability of these techniques to generate useful trading signals for portfolios with high turnover and short holding periods (one or five days). Positive excess returns are reported between 1993 and 2008, but they are strongly reduced after accounting for transaction costs and traditional risk factors. In the most recent period, once these machine learning tools became readily available in the market, excess returns turned negative. Results also show that adding features is far from guaranteeing a higher portfolio alpha.
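Why transaction costs bite so hard for daily-rebalanced, high-turnover portfolios can be illustrated with a small sketch. It assumes a hypothetical top-k/flop-k rule (long the k stocks with the highest model score, short the k with the lowest, equal weights) and a flat, illustrative cost per unit of traded weight. The function name `long_short_returns`, the cost model and all parameter values are assumptions for illustration, not the article's exact method.

```python
import numpy as np

def long_short_returns(scores, next_returns, k, cost_bps):
    """Gross and net daily returns of a hypothetical equal-weighted
    top-k / flop-k long-short portfolio, rebalanced every day.

    scores:       (T, N) model scores, e.g. predicted probability that a stock
                  outperforms (hypothetical input)
    next_returns: (T, N) realized next-day simple returns
    k:            number of stocks held long and held short each day
    cost_bps:     assumed one-way cost per unit of traded weight, in basis points
    """
    T, N = scores.shape
    if 2 * k > N:
        raise ValueError("long and short legs would overlap: need 2*k <= N")
    gross = np.empty(T)
    net = np.empty(T)
    prev_w = np.zeros(N)  # start from an empty book
    for t in range(T):
        order = np.argsort(scores[t])        # indices sorted by ascending score
        w = np.zeros(N)
        w[order[-k:]] = 1.0 / k              # long leg sums to +1
        w[order[:k]] = -1.0 / k              # short leg sums to -1
        # Total weight traded today; simplification: ignores weight drift
        # between rebalances.
        turnover = np.abs(w - prev_w).sum()
        gross[t] = w @ next_returns[t]
        net[t] = gross[t] - turnover * cost_bps / 1e4
        prev_w = w
    return gross, net

# Usage on random, purely illustrative data (no predictive signal).
rng = np.random.default_rng(0)
scores = rng.random((250, 300))
rets = rng.normal(0.0, 0.02, (250, 300))
gross, net = long_short_returns(scores, rets, k=10, cost_bps=5)
print(gross.mean(), net.mean())
```

Charging the cost on every unit of traded weight makes the link between holding period and net return explicit: a complete daily rotation of both legs trades a total weight of 4, so at an assumed 5 bps per unit the portfolio must earn 20 bps per day just to break even.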

Suggested Citation

  • Huck, Nicolas, 2019. "Large data sets and machine learning: Applications to statistical arbitrage," European Journal of Operational Research, Elsevier, vol. 278(1), pages 330-342.
  • Handle: RePEc:eee:ejores:v:278:y:2019:i:1:p:330-342
    DOI: 10.1016/j.ejor.2019.04.013

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221719303339
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ejor.2019.04.013?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Krauss, Christopher & Do, Xuan Anh & Huck, Nicolas, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," European Journal of Operational Research, Elsevier, vol. 259(2), pages 689-702.
    2. Michael C. Jensen, 1968. "The Performance Of Mutual Funds In The Period 1945–1964," Journal of Finance, American Finance Association, vol. 23(2), pages 389-416, May.
    3. Hong, Harrison & Torous, Walter & Valkanov, Rossen, 2007. "Do industries lead stock markets?," Journal of Financial Economics, Elsevier, vol. 83(2), pages 367-396, February.
    4. Fernandes, Marcelo & Medeiros, Marcelo C. & Scharth, Marcel, 2014. "Modeling and predicting the CBOE market volatility index," Journal of Banking & Finance, Elsevier, vol. 40(C), pages 1-10.
    5. Michael H. Breitner & Christian Dunis & Hans-Jörg Mettenheim & Christopher Neely & Georgios Sermpinis & Christian Spreckelsen, 2014. "Real‐Time Pricing and Hedging of Options on Currency Futures with Artificial Neural Networks," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 33(6), pages 419-432, September.
    6. Jacobs, Heiko, 2015. "What explains the dynamics of 100 anomalies?," Journal of Banking & Finance, Elsevier, vol. 57(C), pages 65-85.
    7. Victor DeMiguel & Lorenzo Garlappi & Raman Uppal, 2009. "Optimal Versus Naive Diversification: How Inefficient is the 1-N Portfolio Strategy?," The Review of Financial Studies, Society for Financial Studies, vol. 22(5), pages 1915-1953, May.
    8. Panopoulou, Ekaterini & Vrontos, Spyridon, 2015. "Hedge fund return predictability; To combine forecasts or combine information?," Journal of Banking & Finance, Elsevier, vol. 56(C), pages 103-122.
    9. Jushan Bai & Jianqing Fan & Ruey Tsay, 2016. "Special Issue on Big Data," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 487-488, October.
    10. Zhao, Yang & Li, Jianping & Yu, Lean, 2017. "A deep learning ensemble approach for crude oil price forecasting," Energy Economics, Elsevier, vol. 66(C), pages 9-16.
    11. Chordia, Tarun & Roll, Richard & Subrahmanyam, Avanidhar, 2011. "Recent trends in trading activity and market quality," Journal of Financial Economics, Elsevier, vol. 101(2), pages 243-263, August.
    12. Sebastian Krimm & Hendrik Scholz & Marco Wilkens, 2012. "The Sharpe ratio's market climate bias: Theoretical and empirical evidence from US equity mutual funds," Journal of Asset Management, Palgrave Macmillan, vol. 13(4), pages 227-242, August.
    13. Huang, Wanling & Mollick, André Varella & Nguyen, Khoa Huu, 2016. "U.S. stock markets and the role of real interest rates," The Quarterly Review of Economics and Finance, Elsevier, vol. 59(C), pages 231-242.
    14. Andrew W. Lo, 2010. "Hedge Funds: An Analytic Perspective Updated Edition," Economics Books, Princeton University Press, edition 1, number 9177.
    15. Chen, Zhiwu & Knez, Peter J, 1996. "Portfolio Performance Measurement: Theory and Applications," The Review of Financial Studies, Society for Financial Studies, vol. 9(2), pages 511-555.
    16. Sadka, Ronnie, 2010. "Liquidity risk and the cross-section of hedge-fund returns," Journal of Financial Economics, Elsevier, vol. 98(1), pages 54-71, October.
    17. Peter F. Christoffersen & Francis X. Diebold, 2006. "Financial Asset Returns, Direction-of-Change Forecasting, and Volatility Dynamics," Management Science, INFORMS, vol. 52(8), pages 1273-1287, August.
    18. Lutz Kilian & Cheolbeom Park, 2009. "The Impact Of Oil Price Shocks On The U.S. Stock Market," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 50(4), pages 1267-1287, November.
    19. Hendrik Scholz, 2007. "Refinements to the Sharpe ratio: Comparing alternatives for bear markets," Journal of Asset Management, Palgrave Macmillan, vol. 7(5), pages 347-357, January.
    20. Wei Bao & Jun Yue & Yulei Rao, 2017. "A deep learning framework for financial time series using stacked autoencoders and long-short term memory," PLOS ONE, Public Library of Science, vol. 12(7), pages 1-24, July.
    21. Ekaterini Panopoulou & Sotiria Plastira, 2014. "Fama French factors and US stock return predictability," Journal of Asset Management, Palgrave Macmillan, vol. 15(2), pages 110-128, April.
    22. Huck, Nicolas, 2009. "Pairs selection and outranking: An application to the S&P 100 index," European Journal of Operational Research, Elsevier, vol. 196(2), pages 819-825, July.
    23. Jeff Fleming & Chris Kirby & Barbara Ostdiek, 2001. "The Economic Value of Volatility Timing," Journal of Finance, American Finance Association, vol. 56(1), pages 329-352, February.
    24. Carhart, Mark M, 1997. "On Persistence in Mutual Fund Performance," Journal of Finance, American Finance Association, vol. 52(1), pages 57-82, March.
    25. Jegadeesh, Narasimhan, 1990. "Evidence of Predictable Behavior of Security Returns," Journal of Finance, American Finance Association, vol. 45(3), pages 881-898, July.
    26. Jonathan J.J.M. Seddon & Wendy L. Currie, 2017. "A model for unpacking big data analytics in high-frequency trading," Post-Print hal-01404316, HAL.
    27. Christopher Krauss & Anh Do & Nicolas Huck, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," Post-Print hal-01768895, HAL.
    28. Deren Caliskan & Mohammad Najand, 2016. "Stock market returns and the price of gold," Journal of Asset Management, Palgrave Macmillan, vol. 17(1), pages 10-21, January.
    29. Jones, Charles M & Kaul, Gautam, 1996. "Oil and the Stock Markets," Journal of Finance, American Finance Association, vol. 51(2), pages 463-491, June.
    30. N. Baba & Y. Sakurai, 2011. "Predicting regime switches in the VIX index with macroeconomic variables," Applied Economics Letters, Taylor & Francis Journals, vol. 18(15), pages 1415-1419.
    31. Ferson, Wayne E & Schadt, Rudi W, 1996. "Measuring Fund Strategy and Performance in Changing Economic Conditions," Journal of Finance, American Finance Association, vol. 51(2), pages 425-461, June.
    32. Fischer, Thomas & Krauss, Christopher, 2018. "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, Elsevier, vol. 270(2), pages 654-669.
    33. Seddon, Jonathan J.J.M. & Currie, Wendy L., 2017. "A model for unpacking big data analytics in high-frequency trading," Journal of Business Research, Elsevier, vol. 70(C), pages 300-307.
    34. Leung, Mark T. & Daouk, Hazem & Chen, An-Sing, 2000. "Forecasting stock indices: a comparison of classification and level estimation models," International Journal of Forecasting, Elsevier, vol. 16(2), pages 173-190.
    35. Jonathan Baron & Barbara A. Mellers & Philip E. Tetlock & Eric Stone & Lyle H. Ungar, 2014. "Two Reasons to Make Aggregated Probability Forecasts More Extreme," Decision Analysis, INFORMS, vol. 11(2), pages 133-145, June.
    36. Ville A. Satopää & Robin Pemantle & Lyle H. Ungar, 2016. "Modeling Probability Forecasts via Information Diversity," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1623-1633, October.
    37. François Longin & Bruno Solnik, 2001. "Extreme Correlation of International Equity Markets," Journal of Finance, American Finance Association, vol. 56(2), pages 649-676, April.
    38. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    39. Fama, Eugene F. & French, Kenneth R., 1993. "Common risk factors in the returns on stocks and bonds," Journal of Financial Economics, Elsevier, vol. 33(1), pages 3-56, February.
    40. Laopodis, Nikiforos T., 2013. "Monetary policy and stock market dynamics across monetary regimes," Journal of International Money and Finance, Elsevier, vol. 33(C), pages 381-406.
    41. Huck, Nicolas, 2010. "Pairs trading and outranking: The multi-step-ahead forecasting case," European Journal of Operational Research, Elsevier, vol. 207(3), pages 1702-1716, December.
    42. Esfandiar Maasoumi & Marcelo Medeiros, 2010. "The Link Between Statistical Learning Theory and Econometrics: Applications in Economics, Finance, and Marketing," Econometric Reviews, Taylor & Francis Journals, vol. 29(5-6), pages 470-475.
    43. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    44. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    45. Gibbons, Michael R & Hess, Patrick, 1981. "Day of the Week Effects and Asset Returns," The Journal of Business, University of Chicago Press, vol. 54(4), pages 579-596, October.
    46. Marco Avellaneda & Jeong-Hyun Lee, 2010. "Statistical arbitrage in the US equities market," Quantitative Finance, Taylor & Francis Journals, vol. 10(7), pages 761-782.
    47. Matt Taddy & Matt Gardner & Liyun Chen & David Draper, 2016. "A Nonparametric Bayesian Analysis of Heterogenous Treatment Effects in Digital Experimentation," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 661-672, October.
    48. David Silver & Aja Huang & Chris J. Maddison & Arthur Guez & Laurent Sifre & George van den Driessche & Julian Schrittwieser & Ioannis Antonoglou & Veda Panneershelvam & Marc Lanctot & Sander Dieleman, 2016. "Mastering the game of Go with deep neural networks and tree search," Nature, Nature, vol. 529(7587), pages 484-489, January.
    49. Angela J. Black & Olga Klinkowska & David G. McMillan & Fiona J. McMillan, 2014. "Forecasting Stock Returns: Do Commodity Prices Help?," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 33(8), pages 627-639, December.
    50. David Silver & Julian Schrittwieser & Karen Simonyan & Ioannis Antonoglou & Aja Huang & Arthur Guez & Thomas Hubert & Lucas Baker & Matthew Lai & Adrian Bolton & Yutian Chen & Timothy Lillicrap & Fan , 2017. "Mastering the game of Go without human knowledge," Nature, Nature, vol. 550(7676), pages 354-359, October.
    51. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    52. Ariel, Robert A, 1990. "High Stock Returns before Holidays: Existence and Evidence on Possible Causes," Journal of Finance, American Finance Association, vol. 45(5), pages 1611-1626, December.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Krauss, Christopher & Do, Xuan Anh & Huck, Nicolas, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," European Journal of Operational Research, Elsevier, vol. 259(2), pages 689-702.
    2. Flori, Andrea & Regoli, Daniele, 2021. "Revealing Pairs-trading opportunities with long short-term memory networks," European Journal of Operational Research, Elsevier, vol. 295(2), pages 772-791.
    3. Rubesam, Alexandre, 2022. "Machine learning portfolios with equal risk contributions: Evidence from the Brazilian market," Emerging Markets Review, Elsevier, vol. 51(PB).
    4. Fischer, Thomas & Krauss, Christopher, 2018. "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, Elsevier, vol. 270(2), pages 654-669.
    5. Fischer, Thomas & Krauss, Christopher, 2017. "Deep learning with long short-term memory networks for financial market predictions," FAU Discussion Papers in Economics 11/2017, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    6. Stadtmüller, Immo & Auer, Benjamin R. & Schuhmacher, Frank, 2022. "On the benefits of active stock selection strategies for diversified investors," The Quarterly Review of Economics and Finance, Elsevier, vol. 85(C), pages 342-354.
    7. Thomas Günter Fischer & Christopher Krauss & Alexander Deinert, 2019. "Statistical Arbitrage in Cryptocurrency Markets," JRFM, MDPI, vol. 12(1), pages 1-15, February.
    8. Adam Zaremba & Jacob Koby Shemer, 2018. "Price-Based Investment Strategies," Springer Books, Springer, number 978-3-319-91530-2, December.
    9. Kim, A. & Yang, Y. & Lessmann, S. & Ma, T. & Sung, M.-C. & Johnson, J.E.V., 2020. "Can deep learning predict risky retail investors? A case study in financial risk behavior forecasting," European Journal of Operational Research, Elsevier, vol. 283(1), pages 217-234.
    10. Pushpendu Ghosh & Ariel Neufeld & Jajati Keshari Sahoo, 2020. "Forecasting directional movements of stock prices for intraday trading using LSTM and random forests," Papers 2004.10178, arXiv.org, revised Jun 2021.
    11. Lian, Ziying & Cai, Jun & Webb, Robert I., 2020. "Oil stocks, risk factors, and tail behavior," Energy Economics, Elsevier, vol. 91(C).
    12. Omer Berat Sezer & Mehmet Ugur Gudelek & Ahmet Murat Ozbayoglu, 2019. "Financial Time Series Forecasting with Deep Learning : A Systematic Literature Review: 2005-2019," Papers 1911.13288, arXiv.org.
    13. Kolesnikova, A. & Yang, Y. & Lessmann, S. & Ma, T. & Sung, M.-C. & Johnson, J.E.V., 2019. "Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk Behavior Forecasting," IRTG 1792 Discussion Papers 2019-023, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    14. Han, Chulwoo & He, Zhaodong & Toh, Alenson Jun Wei, 2023. "Pairs trading via unsupervised learning," European Journal of Operational Research, Elsevier, vol. 307(2), pages 929-947.
    15. Allen, David & Lizieri, Colin & Satchell, Stephen, 2020. "A comparison of non-Gaussian VaR estimation and portfolio construction techniques," Journal of Empirical Finance, Elsevier, vol. 58(C), pages 356-368.
    16. Ghosh, Pushpendu & Neufeld, Ariel & Sahoo, Jajati Keshari, 2022. "Forecasting directional movements of stock prices for intraday trading using LSTM and random forests," Finance Research Letters, Elsevier, vol. 46(PA).
    17. Cakici, Nusret & Zaremba, Adam, 2022. "Salience theory and the cross-section of stock returns: International and further evidence," Journal of Financial Economics, Elsevier, vol. 146(2), pages 689-725.
    18. Kingsley Fong & David R. Gallagher & Adrian D. Lee, 2008. "Benchmarking benchmarks: measuring characteristic selectivity using portfolio holdings data," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 48(5), pages 761-781, December.
    19. Lars Hornuf & Gül Yüksel, 2022. "The Performance of Socially Responsible Investments: A Meta-Analysis," CESifo Working Paper Series 9724, CESifo.
    20. Dong‐Hyun Ahn & H. Henry Cao & Stéphane Chrétien, 2009. "Portfolio Performance Measurement: a No Arbitrage Bounds Approach," European Financial Management, European Financial Management Association, vol. 15(2), pages 298-339, March.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:ejores:v:278:y:2019:i:1:p:330-342. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/eor.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.