
Missing Values Handling for Machine Learning Portfolios

Author

Listed:
  • Andrew Y. Chen
  • Jack McCoy

Abstract

We characterize the structure and origins of missingness for 159 cross-sectional return predictors and study missing value handling for portfolios constructed using machine learning. Simply imputing with cross-sectional means performs well compared to rigorous expectation-maximization methods. This stems from three facts about predictor data: (1) missingness occurs in large blocks organized by time, (2) cross-sectional correlations are small, and (3) missingness tends to occur in blocks organized by the underlying data source. As a result, observed data provide little information about missing data. Sophisticated imputations introduce estimation noise that can lead to underperformance if machine learning is not carefully applied.
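As a rough illustration of the simple approach the abstract refers to, the sketch below fills each missing predictor value with the mean of that predictor across stocks in the same period. It is a minimal sketch, not the authors' code: the function name `impute_cross_sectional_mean`, the column names (`date`, `permno`, `bm`, `mom`), and the toy numbers are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def impute_cross_sectional_mean(panel, predictor_cols, date_col="date"):
    """Fill missing predictor values with the same-period cross-sectional mean.

    Hypothetical helper for illustration; `panel` is a long-format DataFrame
    with one row per (period, stock).
    """
    out = panel.copy()
    # Per-period mean of each predictor, broadcast back to every row.
    # The mean skips NaN, so only observed values enter the average.
    period_means = out.groupby(date_col)[predictor_cols].transform("mean")
    # Replace only the missing cells; observed values are left untouched.
    out[predictor_cols] = out[predictor_cols].fillna(period_means)
    return out


# Toy panel (made-up numbers): two periods, up to three stocks.
panel = pd.DataFrame(
    {
        "date": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02"],
        "permno": [1, 2, 3, 1, 2],
        "bm": [0.5, np.nan, 1.5, 2.0, np.nan],
        "mom": [np.nan, 0.1, 0.3, np.nan, np.nan],
    }
)

filled = impute_cross_sectional_mean(panel, predictor_cols=["bm", "mom"])
print(filled)
```

In this sketch a predictor that is missing for every stock in a period stays missing, because there is no observed value to average; how to treat such blocks is left to the caller.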

Suggested Citation

  • Andrew Y. Chen & Jack McCoy, 2022. "Missing Values Handling for Machine Learning Portfolios," Papers 2207.13071, arXiv.org, revised Jan 2024.
  • Handle: RePEc:arx:papers:2207.13071

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2207.13071
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Fama, Eugene F & French, Kenneth R, 1992. "The Cross-Section of Expected Stock Returns," Journal of Finance, American Finance Association, vol. 47(2), pages 427-465, June.
    2. Lewellen, Jonathan, 2015. "The Cross-section of Expected Stock Returns," Critical Finance Review, now publishers, vol. 4(1), pages 1-44, June.
    3. Lewandowski, Daniel & Kurowicka, Dorota & Joe, Harry, 2009. "Generating random correlation matrices based on vines and extended onion method," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 1989-2001, October.
    4. Andrew Y. Chen & Tom Zimmermann, 2022. "Open Source Cross-Sectional Asset Pricing," Critical Finance Review, now publishers, vol. 11(2), pages 207-264, May.
    5. Kozak, Serhiy & Nagel, Stefan & Santosh, Shrihari, 2020. "Shrinking the cross-section," Journal of Financial Economics, Elsevier, vol. 135(2), pages 271-292.
    6. R. David McLean & Jeffrey Pontiff, 2016. "Does Academic Research Destroy Stock Return Predictability?," Journal of Finance, American Finance Association, vol. 71(1), pages 5-32, February.
    7. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2223-2273.
    8. Fama, Eugene F. & French, Kenneth R., 2015. "A five-factor asset pricing model," Journal of Financial Economics, Elsevier, vol. 116(1), pages 1-22.
    9. Martin Lettau & Markus Pelger, 2020. "Factors That Fit the Time Series and Cross-Section of Stock Returns," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2274-2325.
    10. Joachim Freyberger & Andreas Neuhierl & Michael Weber, 2020. "Dissecting Characteristics Nonparametrically," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2326-2377.
    11. Titman, Sheridan & Wei, K. C. John & Xie, Feixue, 2004. "Capital Investments and Stock Returns," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 39(4), pages 677-700, December.
    12. Jeremiah Green & John R. M. Hand & X. Frank Zhang, 2017. "The Characteristics that Provide Independent Information about Average U.S. Monthly Stock Returns," The Review of Financial Studies, Society for Financial Studies, vol. 30(12), pages 4389-4436.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chen, Andrew Y. & McCoy, Jack, 2024. "Missing values handling for machine learning portfolios," Journal of Financial Economics, Elsevier, vol. 155(C).
    2. Clarke, Charles, 2022. "The level, slope, and curve factor model for stocks," Journal of Financial Economics, Elsevier, vol. 143(1), pages 159-187.
    3. Baba-Yara, Fahiz & Boons, Martijn & Tamoni, Andrea, 2024. "Persistent and transitory components of firm characteristics: Implications for asset pricing," Journal of Financial Economics, Elsevier, vol. 154(C).
    4. Doron Avramov & Si Cheng & Lior Metzker, 2023. "Machine Learning vs. Economic Restrictions: Evidence from Stock Return Predictability," Management Science, INFORMS, vol. 69(5), pages 2587-2619, May.
    5. Lioui, Abraham & Tarelli, Andrea, 2022. "Chasing the ESG factor," Journal of Banking & Finance, Elsevier, vol. 139(C).
    6. Rubesam, Alexandre, 2022. "Machine learning portfolios with equal risk contributions: Evidence from the Brazilian market," Emerging Markets Review, Elsevier, vol. 51(PB).
    7. Cakici, Nusret & Fieberg, Christian & Metko, Daniel & Zaremba, Adam, 2023. "Machine learning goes global: Cross-sectional return predictability in international stock markets," Journal of Economic Dynamics and Control, Elsevier, vol. 155(C).
    8. Bagnara, Matteo, 2024. "The economic value of cross-predictability: A performance-based measure," SAFE Working Paper Series 424, Leibniz Institute for Financial Research SAFE.
    9. van Binsbergen, Jules H. & Boons, Martijn & Opp, Christian C. & Tamoni, Andrea, 2023. "Dynamic asset (mis)pricing: Build-up versus resolution anomalies," Journal of Financial Economics, Elsevier, vol. 147(2), pages 406-431.
    10. Yan, Jingda & Yu, Jialin, 2023. "Cross-stock momentum and factor momentum," Journal of Financial Economics, Elsevier, vol. 150(2).
    11. Langlois, Hugues, 2023. "What matters in a characteristic?," Journal of Financial Economics, Elsevier, vol. 149(1), pages 52-72.
    12. Tobek, Ondrej & Hronec, Martin, 2021. "Does it pay to follow anomalies research? Machine learning approach with international evidence," Journal of Financial Markets, Elsevier, vol. 56(C).
    13. Ma, Tian & Leong, Wen Jun & Jiang, Fuwei, 2023. "A latent factor model for the Chinese stock market," International Review of Financial Analysis, Elsevier, vol. 87(C).
    14. De Nard, Gianluca & Zhao, Zhao, 2023. "Using, taming or avoiding the factor zoo? A double-shrinkage estimator for covariance matrices," Journal of Empirical Finance, Elsevier, vol. 72(C), pages 23-35.
    15. Söhnke M. Bartram & Harald Lohre & Peter F. Pope & Ananthalakshmi Ranganathan, 2021. "Navigating the factor zoo around the world: an institutional investor perspective," Journal of Business Economics, Springer, vol. 91(5), pages 655-703, July.
    16. Wolfgang Drobetz & Tizian Otto, 2021. "Empirical asset pricing via machine learning: evidence from the European stock market," Journal of Asset Management, Palgrave Macmillan, vol. 22(7), pages 507-538, December.
    17. Kaniel, Ron & Lin, Zihan & Pelger, Markus & Van Nieuwerburgh, Stijn, 2023. "Machine-learning the skill of mutual fund managers," Journal of Financial Economics, Elsevier, vol. 150(1), pages 94-138.
    18. Smith, Simon C., 2022. "Time-variation, multiple testing, and the factor zoo," International Review of Financial Analysis, Elsevier, vol. 84(C).
    19. Christian Fieberg & Daniel Metko & Thorsten Poddig & Thomas Loy, 2023. "Machine learning techniques for cross-sectional equity returns’ prediction," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(1), pages 289-323, March.
    20. Ni, Xuanming & Zheng, Tiantian & Zhao, Huimin & Zhu, Shushang, 2023. "High-dimensional portfolio optimization based on tree-structured factor model," Pacific-Basin Finance Journal, Elsevier, vol. 81(C).
