
DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis

Authors

  • Chuheng Zhang
  • Yuanqi Li
  • Xi Chen
  • Yifei Jin
  • Pingzhong Tang
  • Jian Li

Abstract

Modern machine learning models (such as deep neural networks and boosting decision tree models) have become increasingly popular in financial market prediction because of their superior capacity to extract complex non-linear patterns. However, since financial datasets have a very low signal-to-noise ratio and are non-stationary, complex models are prone to overfitting and suffer from instability. Moreover, as machine learning and data mining tools become more widely used in quantitative trading, many trading firms have been producing an increasing number of features (also known as factors). Automatically selecting effective features has therefore become a pressing problem. To address these issues, we propose DoubleEnsemble, an ensemble framework that leverages learning-trajectory-based sample reweighting and shuffling-based feature selection. Specifically, we identify the key samples based on the training dynamics of each sample and elicit key features based on the ablation impact of each feature via shuffling. Our model is applicable to a wide range of base models and is capable of extracting complex patterns while mitigating the overfitting and instability issues in financial market prediction. We conduct extensive experiments, including price prediction for cryptocurrencies and stock trading, using both DNNs and gradient boosting decision trees as base models. Our experimental results demonstrate that DoubleEnsemble achieves superior performance compared with several baseline methods.
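
The shuffling-based feature selection mentioned in the abstract can be illustrated with a generic permutation-importance sketch. This is not the paper's exact procedure, which the abstract does not spell out. It assumes a plain least-squares model as a stand-in for the DNN or gradient boosting base models, synthetic data, and hypothetical helper names (`fit_linear`, `mse`, `shuffle_importance`). The idea shown is only the one the abstract states: shuffle one feature across samples, measure how much the loss degrades, and keep the features whose ablation hurts most.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit; a stand-in for any base model (assumption)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mse(coef, X, y):
    """Mean squared error of the linear model on (X, y)."""
    return float(np.mean((X @ coef - y) ** 2))


def shuffle_importance(coef, X, y, rng, n_repeats=5):
    """Average loss increase when each feature column is shuffled."""
    base_loss = mse(coef, X, y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Permuting column j breaks its link to the target
            # while preserving its marginal distribution.
            X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
            increases.append(mse(coef, X_shuffled, y) - base_loss)
        scores[j] = np.mean(increases)
    return scores


# Synthetic data (assumption): only features 0 and 1 carry signal.
rng = np.random.default_rng(0)
n_samples = 500
X = rng.normal(size=(n_samples, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=n_samples)

coef = fit_linear(X, y)
scores = shuffle_importance(coef, X, y, rng)

# Keep the two features whose shuffling degrades the loss the most.
selected = np.argsort(scores)[::-1][:2]
print("importance scores:", scores)
print("selected features:", sorted(selected.tolist()))
```

In this toy setting the two informative features receive far larger scores than the noise features, so ranking by loss increase recovers them. How the paper combines this step with sample reweighting and ensembling is described in the full text, not here.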

Suggested Citation

  • Chuheng Zhang & Yuanqi Li & Xi Chen & Yifei Jin & Pingzhong Tang & Jian Li, 2020. "DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis," Papers 2010.01265, arXiv.org, revised Jan 2021.
  • Handle: RePEc:arx:papers:2010.01265

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2010.01265
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael E. Drew & Madhu Veeraraghavan, 2000. "Multifactor Models are Alive and Well," School of Economics and Finance Discussion Papers and Working Papers Series 083, School of Economics and Finance, Queensland University of Technology.
    2. Lam, Keith S. K., 2002. "The relationship between size, book-to-market equity ratio, earnings-price ratio, and return for the Hong Kong stock market," Global Finance Journal, Elsevier, vol. 13(2), pages 163-179.
    3. ALAM Nafis & TAN Ee Chain, 2012. "Impact Of Financial Crisis On Stock Returns: Evidence From Singapore," Studies in Business and Economics, Lucian Blaga University of Sibiu, Faculty of Economic Sciences, vol. 7(2), pages 5-19, August.
    4. Don U.A. Galagedera, 2004. "A survey on risk-return analysis," Finance 0406010, University Library of Munich, Germany.
    5. repec:bla:jfinan:v:58:y:2003:i:5:p:1969-1996 is not listed on IDEAS
    6. Fernando Rubio, 2005. "Eficiencia De Mercado, Administracion De Carteras De Fondos Y Behavioural Finance," Finance 0503028, University Library of Munich, Germany, revised 23 Jul 2005.
    7. Phan Tran Minh Hung & Tran Thi Trang Dai & Phan Nguyen Bao Quynh & Le Duc Toan & Vo Hoang Diem Trinh, 2019. "The Relationship between Risk and Return - An Empirical Evidence from Real Estate Stocks Listed in Vietnam," Asian Economic and Financial Review, Asian Economic and Social Society, vol. 9(11), pages 1211-1226, November.
    8. M. Eskandar Shah & Sourafel Girm & R. Hudson, 2012. "Rationalizing the Value Premium under Economic Fundamentals in an Emerging Market," Working Papers 12010, Bangor Business School, Prifysgol Bangor University (Cymru / Wales).
    9. Misund, Bard & Mohn, Klaus, 2014. "Exploration Risk in Oil & Gas Shareholder Returns," UiS Working Papers in Economics and Finance 2014/4, University of Stavanger.
    10. Mehnaz Roushan Laura & Nafiz Ul Fahad, 2017. "The Classical Approaches to Testing the Unconditional CAPM: UK Evidence," International Journal of Economics and Finance, Canadian Center of Science and Education, vol. 9(3), pages 220-232, March.
    11. Massimo Guidolin & Manuela Pedio, 2019. "How Smart is the Real Estate Smart Beta? Evidence from Optimal Style Factor Strategies for REITs," BAFFI CAREFIN Working Papers 19117, BAFFI CAREFIN, Centre for Applied Research on International Markets Banking Finance and Regulation, Universita' Bocconi, Milano, Italy.
    12. Schnaubelt, Matthias & Seifert, Oleg, 2020. "Valuation ratios, surprises, uncertainty or sentiment: How does financial machine learning predict returns from earnings announcements?," FAU Discussion Papers in Economics 04/2020, Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics.
    13. Eduardo Sandoval & Rodrigo Saens, 2004. "The Conditional Relationship Between Portfolio Beta and Return: Evidence from Latin America," Latin American Journal of Economics-formerly Cuadernos de Economía, Instituto de Economía. Pontificia Universidad Católica de Chile., vol. 41(122), pages 65-89.
    14. Eero Pätäri & Timo Leivo, 2017. "A Closer Look At Value Premium: Literature Review And Synthesis," Journal of Economic Surveys, Wiley Blackwell, vol. 31(1), pages 79-168, February.
    15. repec:dau:papers:123456789/2514 is not listed on IDEAS
    16. D. L. Wilcox & T. J. Gebbie, 2013. "On pricing kernels, information and risk," Papers 1310.4067, arXiv.org, revised Oct 2013.
    17. Gikas Hardouvelis & George Papanastasopoulos & Dimitrios D. Thomakos & Tao Wang, 2007. "Accruals, Net Stock Issues and Value-Glamour Anomalies: New Evidence on their Relation," Working Paper series 47_07, Rimini Centre for Economic Analysis.
    18. Adam Zaremba & Jacob Koby Shemer, 2018. "Price-Based Investment Strategies," Springer Books, Springer, number 978-3-319-91530-2, October.
    19. Keith Lam & Frank Li, 2008. "The risk premiums of the four-factor asset pricing model in the Hong Kong stock market," Applied Financial Economics, Taylor & Francis Journals, vol. 18(20), pages 1667-1680.
    20. Geertsema, Paul & Lu, Helen, 2020. "The correlation structure of anomaly strategies," Journal of Banking & Finance, Elsevier, vol. 119(C).
    21. Kent Daniel & Sheridan Titman & K.C. John Wei, 2001. "Explaining the Cross‐Section of Stock Returns in Japan: Factors or Characteristics?," Journal of Finance, American Finance Association, vol. 56(2), pages 743-766, April.
    22. Fernando Rubio, 2005. "Estrategias Cuantitativas De Valor Y Retornos Por Accion De Largo," Finance 0503029, University Library of Munich, Germany.
