IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2411.09218.html

On the (Mis)Use of Machine Learning with Panel Data

Authors
  • Augusto Cerqua
  • Marco Letta
  • Gabriele Pinto

Abstract

Machine Learning (ML) is increasingly employed to inform and support policymaking interventions. This methodological article cautions practitioners about common but often overlooked pitfalls associated with the uncritical application of supervised ML algorithms to panel data. Ignoring the cross-sectional and longitudinal structure of this data can lead to hard-to-detect data leakage, inflated out-of-sample performance, and an inadvertent overestimation of the real-world usefulness and applicability of ML models. After clarifying these issues, we provide practical guidelines and best practices for applied researchers to ensure the correct implementation of supervised ML in panel data environments, emphasizing the need to define ex ante the primary goal of the analysis and align the ML pipeline accordingly. An empirical application based on over 3,000 US counties from 2000 to 2019 illustrates the practical relevance of these points across nearly 500 models for both classification and regression tasks.
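
The leakage mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy panel, the unit names, and the helper functions (`random_split`, `temporal_split`, `unit_split`, `future_leak`, `shared_units`) are all hypothetical and introduced only for illustration. The sketch compares a naive random split, which ignores the panel structure, with a split by time and a split by unit.

```python
import random

# Hypothetical toy panel: 5 units observed yearly from 2000 to 2009.
# This is illustrative data, not the county panel used in the paper.
units = [f"county_{i}" for i in range(5)]
years = list(range(2000, 2010))
panel = [(u, y) for u in units for y in years]


def random_split(rows, test_frac=0.2, seed=0):
    """Naive split that treats rows as i.i.d. and ignores unit and time."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(rows, first_test_year):
    """Train on years strictly before the cutoff, test on the cutoff onward."""
    train = [r for r in rows if r[1] < first_test_year]
    test = [r for r in rows if r[1] >= first_test_year]
    return train, test


def unit_split(rows, test_units):
    """Hold out whole units so that no test unit appears in training."""
    held_out = set(test_units)
    train = [r for r in rows if r[0] not in held_out]
    test = [r for r in rows if r[0] in held_out]
    return train, test


def future_leak(train, test):
    """True if some training row is dated at or after the earliest test row."""
    return max(y for _, y in train) >= min(y for _, y in test)


def shared_units(train, test):
    """Units that appear on both sides of the split."""
    return {u for u, _ in train} & {u for u, _ in test}


if __name__ == "__main__":
    splits = {
        "random": random_split(panel),
        "temporal": temporal_split(panel, first_test_year=2008),
        "by unit": unit_split(panel, test_units=["county_4"]),
    }
    for name, (train, test) in splits.items():
        print(name,
              "| future years in train:", future_leak(train, test),
              "| units on both sides:", len(shared_units(train, test)))
```

Which split is appropriate depends on the goal defined ex ante: a temporal split mimics forecasting future periods for known units, while a unit-wise split mimics predicting for units never seen in training. A random split typically matches neither scenario.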

Suggested Citation

  • Augusto Cerqua & Marco Letta & Gabriele Pinto, 2024. "On the (Mis)Use of Machine Learning with Panel Data," Papers 2411.09218, arXiv.org.
  • Handle: RePEc:arx:papers:2411.09218

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2411.09218
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bas Bosma & Arjen Witteloostuijn, 2024. "Machine learning in international business," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 55(6), pages 676-702, August.
    2. Falco J. Bargagli Stoffi & Kenneth De Beckker & Joana E. Maldonado & Kristof De Witte, 2021. "Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy," Papers 2102.04382, arXiv.org.
    3. Lundberg, Ian & Brand, Jennie E. & Jeon, Nanum, 2022. "Researcher reasoning meets computational capacity: Machine learning for social science," SocArXiv s5zc8, Center for Open Science.
    4. Nicolaj N. Mühlbach, 2020. "Tree-based Synthetic Control Methods: Consequences of moving the US Embassy," CREATES Research Papers 2020-04, Department of Economics and Business Economics, Aarhus University.
    5. Naguib, Costanza, 2019. "Estimating the Heterogeneous Impact of the Free Movement of Persons on Relative Wage Mobility," Economics Working Paper Series 1903, University of St. Gallen, School of Economics and Political Science.
    6. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    7. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    8. Delogu, Marco & Lagravinese, Raffaele & Paolini, Dimitri & Resce, Giuliano, 2024. "Predicting dropout from higher education: Evidence from Italy," Economic Modelling, Elsevier, vol. 130(C).
    9. Daniel Jacob, 2021. "CATE meets ML," Digital Finance, Springer, vol. 3(2), pages 99-148, June.
    10. Francesco Decarolis & Cristina Giorgiantonio, 2020. "Corruption red flags in public procurement: new evidence from Italian calls for tenders," Questioni di Economia e Finanza (Occasional Papers) 544, Bank of Italy, Economic Research and International Relations Area.
    11. Jonathan A. Cook & Saad Siddiqui, 2020. "Random forests and selected samples," Bulletin of Economic Research, Wiley Blackwell, vol. 72(3), pages 272-287, July.
    12. Max Vilgalys, 2023. "A Machine Learning Approach to Measuring Climate Adaptation," Papers 2302.01236, arXiv.org.
    13. Isabel Hovdahl, 2019. "On the use of machine learning for causal inference in climate economics," Working Papers No 05/2019, Centre for Applied Macro- and Petroleum economics (CAMP), BI Norwegian Business School.
    14. de Blasio, Guido & D'Ignazio, Alessio & Letta, Marco, 2022. "Gotham city. Predicting ‘corrupted’ municipalities with machine learning," Technological Forecasting and Social Change, Elsevier, vol. 184(C).
    15. Khashayar Khosravi & Greg Lewis & Vasilis Syrgkanis, 2019. "Non-Parametric Inference Adaptive to Intrinsic Dimension," Papers 1901.03719, arXiv.org, revised Jun 2019.
    16. Cerqua, Augusto & Letta, Marco, 2022. "Local inequalities of the COVID-19 crisis," Regional Science and Urban Economics, Elsevier, vol. 92(C).
    17. Athey, Susan & Imbens, Guido W. & Metzger, Jonas & Munro, Evan, 2024. "Using Wasserstein Generative Adversarial Networks for the design of Monte Carlo simulations," Journal of Econometrics, Elsevier, vol. 240(2).
    18. Monica Andini & Emanuele Ciani & Guido de Blasio & Alessio D'Ignazio & Viola Salvestrini, 2017. "Targeting policy-compliers with machine learning: an application to a tax rebate programme in Italy," Temi di discussione (Economic working papers) 1158, Bank of Italy, Economic Research and International Relations Area.
    19. Susan Athey, 2018. "The Impact of Machine Learning on Economics," NBER Chapters, in: The Economics of Artificial Intelligence: An Agenda, pages 507-547, National Bureau of Economic Research, Inc.
    20. Prothit Sen & Phanish Puranam, 2022. "Do Alliance portfolios encourage or impede new business practice adoption? Theory and evidence from the private equity industry," Strategic Management Journal, Wiley Blackwell, vol. 43(11), pages 2279-2312, November.
