
Missing Values Handling for Machine Learning Portfolios

Authors
  • Andrew Y. Chen
  • Jack McCoy

Abstract

We characterize the structure and origins of missingness for 159 cross-sectional return predictors and study missing value handling for portfolios constructed using machine learning. Simply imputing with cross-sectional means performs well compared to rigorous expectation-maximization methods. This stems from three facts about predictor data: (1) missingness occurs in large blocks organized by time, (2) cross-sectional correlations are small, and (3) missingness tends to occur in blocks organized by the underlying data source. As a result, observed data provide little information about missing data. Sophisticated imputations introduce estimation noise that can lead to underperformance if machine learning is not carefully applied.
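
The simple baseline named in the abstract, imputing with cross-sectional means, can be illustrated with a minimal sketch. This is not the authors' code. It assumes a hypothetical panel stored as a list of dicts (one per stock-month), with missing values encoded as None; the function name, the field names "date", "stock", "bm", and "mom", and the toy numbers are all made up for illustration. Each missing value is filled with the mean of that predictor across stocks observed on the same date.

```python
from collections import defaultdict
from statistics import fmean


def impute_cross_sectional_mean(rows, predictors, date_key="date"):
    """Fill missing predictor values (None) with the same-date mean across stocks.

    rows: list of dicts, one per stock-month (hypothetical layout).
    Returns new dicts; the input rows are not modified.
    """
    # Collect the observed values for each (date, predictor) pair.
    observed = defaultdict(list)
    for row in rows:
        for name in predictors:
            value = row.get(name)
            if value is not None:
                observed[(row[date_key], name)].append(value)

    # Cross-sectional mean of the observed values on each date.
    means = {key: fmean(values) for key, values in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)
        for name in predictors:
            if new_row.get(name) is None:
                # Stays None when no stock has the predictor on that date.
                new_row[name] = means.get((row[date_key], name))
        filled.append(new_row)
    return filled


# Toy panel with invented values (assumption, not data from the paper).
panel = [
    {"date": "2020-01", "stock": "A", "bm": 0.5, "mom": None},
    {"date": "2020-01", "stock": "B", "bm": 1.5, "mom": 0.25},
    {"date": "2020-01", "stock": "C", "bm": None, "mom": 0.75},
    {"date": "2020-02", "stock": "A", "bm": None, "mom": None},
    {"date": "2020-02", "stock": "B", "bm": 2.0, "mom": None},
]
filled = impute_cross_sectional_mean(panel, ["bm", "mom"])
```

In the toy panel, "mom" is missing for every stock in 2020-02, so the sketch leaves it as None. This mirrors the block-by-time missingness the abstract describes: when a whole cross-section is unobserved, there is no same-date mean to impute from, and a real pipeline would need a separate rule for that case.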

Suggested Citation

  • Andrew Y. Chen & Jack McCoy, 2022. "Missing Values Handling for Machine Learning Portfolios," Papers 2207.13071, arXiv.org, revised Jan 2024.
  • Handle: RePEc:arx:papers:2207.13071

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2207.13071
    File Function: Latest version
    Download Restriction: no


    Citations



    Cited by:

    1. Daniel Homocianu, 2025. "Global Patterns of Parental Concerns About Children’s Education: Insights from WVS Data," Societies, MDPI, vol. 15(2), pages 1-47, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Andrew Y. & McCoy, Jack, 2024. "Missing values handling for machine learning portfolios," Journal of Financial Economics, Elsevier, vol. 155(C).
    2. Clarke, Charles, 2022. "The level, slope, and curve factor model for stocks," Journal of Financial Economics, Elsevier, vol. 143(1), pages 159-187.
    3. Baba-Yara, Fahiz & Boons, Martijn & Tamoni, Andrea, 2024. "Persistent and transitory components of firm characteristics: Implications for asset pricing," Journal of Financial Economics, Elsevier, vol. 154(C).
    4. Bryzgalova, Svetlana & Huang, Jiantao & Julliard, Christian, 2023. "Bayesian solutions for the factor zoo: we just ran two quadrillion models," LSE Research Online Documents on Economics 126151, London School of Economics and Political Science, LSE Library.
    5. Doron Avramov & Si Cheng & Lior Metzker, 2023. "Machine Learning vs. Economic Restrictions: Evidence from Stock Return Predictability," Management Science, INFORMS, vol. 69(5), pages 2587-2619, May.
    6. Lioui, Abraham & Tarelli, Andrea, 2022. "Chasing the ESG factor," Journal of Banking & Finance, Elsevier, vol. 139(C).
    7. Rubesam, Alexandre, 2022. "Machine learning portfolios with equal risk contributions: Evidence from the Brazilian market," Emerging Markets Review, Elsevier, vol. 51(PB).
    8. Cakici, Nusret & Fieberg, Christian & Metko, Daniel & Zaremba, Adam, 2023. "Machine learning goes global: Cross-sectional return predictability in international stock markets," Journal of Economic Dynamics and Control, Elsevier, vol. 155(C).
    9. Tobek, Ondrej & Hronec, Martin, 2021. "Does it pay to follow anomalies research? Machine learning approach with international evidence," Journal of Financial Markets, Elsevier, vol. 56(C).
    10. Bagnara, Matteo, 2024. "The economic value of cross-predictability: A performance-based measure," SAFE Working Paper Series 424, Leibniz Institute for Financial Research SAFE.
    11. van Binsbergen, Jules H. & Boons, Martijn & Opp, Christian C. & Tamoni, Andrea, 2023. "Dynamic asset (mis)pricing: Build-up versus resolution anomalies," Journal of Financial Economics, Elsevier, vol. 147(2), pages 406-431.
    12. Christian Fieberg & Daniel Metko & Thorsten Poddig & Thomas Loy, 2023. "Machine learning techniques for cross-sectional equity returns’ prediction," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(1), pages 289-323, March.
    13. Yan, Jingda & Yu, Jialin, 2023. "Cross-stock momentum and factor momentum," Journal of Financial Economics, Elsevier, vol. 150(2).
    14. Langlois, Hugues, 2023. "What matters in a characteristic?," Journal of Financial Economics, Elsevier, vol. 149(1), pages 52-72.
    15. Doron Avramov & Guy Kaplanski & Avanidhar Subrahmanyam, 2022. "Postfundamentals Price Drift in Capital Markets: A Regression Regularization Perspective," Management Science, INFORMS, vol. 68(10), pages 7658-7681, October.
    16. Sun, Chuanping, 2024. "Factor correlation and the cross section of asset returns: A correlation-robust machine learning approach," Journal of Empirical Finance, Elsevier, vol. 77(C).
    17. Ma, Tian & Leong, Wen Jun & Jiang, Fuwei, 2023. "A latent factor model for the Chinese stock market," International Review of Financial Analysis, Elsevier, vol. 87(C).
    18. Lin William Cong & Guanhao Feng & Jingyu He & Xin He, 2022. "Growing the Efficient Frontier on Panel Trees," NBER Working Papers 30805, National Bureau of Economic Research, Inc.
    19. Bang, Jeongseok & Kang, Yeonchan & Ryu, Doojin, 2024. "Potential pricing factors in the Korean market," Finance Research Letters, Elsevier, vol. 67(PB).
    20. Sak, Halis & Huang, Tao & Chng, Michael T., 2024. "Exploring the factor zoo with a machine-learning portfolio," International Review of Financial Analysis, Elsevier, vol. 96(PA).
