Printed from https://ideas.repec.org/p/arx/papers/2008.03600.html

Machine Learning Panel Data Regressions with Heavy-tailed Dependent Data: Theory and Application

Author

Listed:
  • Andrii Babii
  • Ryan T. Ball
  • Eric Ghysels
  • Jonas Striaukas

Abstract

The paper introduces structured machine learning regressions for heavy-tailed dependent panel data potentially sampled at different frequencies. We focus on sparse-group LASSO regularization. This type of regularization can exploit mixed-frequency time series panel data structures and improve the quality of the estimates. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data can have fat tails. To that end, we leverage a new Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed $\tau$-mixing processes.
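
To make the sparse-group LASSO regularization mentioned in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the commonly used penalty form lam * (alpha * ||b||_1 + (1 - alpha) * sum over groups ||b_G||_2), a pooled least-squares loss with all panel observations stacked into the rows of X, and a plain proximal gradient solver. The function names (sgl_penalty, sgl_prox, pooled_sgl) and the parameters lam, alpha, and groups are hypothetical and chosen only for illustration.

```python
import numpy as np


def sgl_penalty(beta, groups, lam, alpha):
    """Assumed penalty: lam * (alpha * ||beta||_1 + (1 - alpha) * sum_G ||beta_G||_2)."""
    l1 = np.abs(beta).sum()
    group_l2 = sum(np.linalg.norm(beta[g]) for g in groups)
    return lam * (alpha * l1 + (1.0 - alpha) * group_l2)


def sgl_prox(v, groups, step, lam, alpha):
    """Proximal operator of step * sgl_penalty for non-overlapping groups."""
    # Elementwise soft-thresholding handles the l1 part of the penalty.
    u = np.sign(v) * np.maximum(np.abs(v) - step * lam * alpha, 0.0)
    # Group-wise shrinkage handles the group part; small groups are zeroed out.
    out = np.zeros_like(u)
    thresh = step * lam * (1.0 - alpha)
    for g in groups:
        norm = np.linalg.norm(u[g])
        if norm > thresh:
            out[g] = (1.0 - thresh / norm) * u[g]
    return out


def pooled_sgl(X, y, groups, lam, alpha, n_iter=500):
    """Proximal gradient descent on (1 / (2n)) * ||y - X b||^2 + sgl_penalty(b)."""
    n, p = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ beta) / n
        beta = sgl_prox(beta - step * grad, groups, step, lam, alpha)
    return beta
```

In a mixed-frequency setting, each group would typically collect the coefficients attached to the lags of one covariate, so the group term selects covariates while the l1 term sparsifies within a group. This sketch covers only the pooled case; a fixed effects variant would need additional handling of the individual intercepts, which is not shown here.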

Suggested Citation

  • Andrii Babii & Ryan T. Ball & Eric Ghysels & Jonas Striaukas, 2020. "Machine Learning Panel Data Regressions with Heavy-tailed Dependent Data: Theory and Application," Papers 2008.03600, arXiv.org, revised Nov 2021.
  • Handle: RePEc:arx:papers:2008.03600

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2008.03600
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Eric Ghysels & Arthur Sinko & Rossen Valkanov, 2007. "MIDAS Regressions: Further Results and New Directions," Econometric Reviews, Taylor & Francis Journals, vol. 26(1), pages 53-90.
    2. Khalaf, Lynda & Kichian, Maral & Saunders, Charles J. & Voia, Marcel, 2021. "Dynamic panels with MIDAS covariates: Nonlinearity, estimation and fit," Journal of Econometrics, Elsevier, vol. 220(2), pages 589-605.
    3. Liangjun Su & Zhentao Shi & Peter C. B. Phillips, 2016. "Identifying Latent Structures in Panel Data," Econometrica, Econometric Society, vol. 84, pages 2215-2264, November.
    4. Fernández-Val, Iván & Weidner, Martin, 2016. "Individual and time effects in nonlinear panel models with large N, T," Journal of Econometrics, Elsevier, vol. 192(1), pages 291-312.
    5. Lu, Xun & Su, Liangjun, 2016. "Shrinkage estimation of dynamic panel data models with interactive fixed effects," Journal of Econometrics, Elsevier, vol. 190(1), pages 148-175.
    6. Peter C. B. Phillips & Hyungsik R. Moon, 1999. "Linear Regression Limit Theory for Nonstationary Panel Data," Econometrica, Econometric Society, vol. 67(5), pages 1057-1112, September.
    7. Ghysels, Eric & Santa-Clara, Pedro & Valkanov, Rossen, 2006. "Predicting volatility: getting the most out of return data sampled at different frequencies," Journal of Econometrics, Elsevier, vol. 131(1-2), pages 59-95.
    8. J. Dedecker & C. Prieur, 2004. "Coupling for τ-Dependent Sequences and Applications," Journal of Theoretical Probability, Springer, vol. 17(4), pages 861-885, October.
    9. Alexandre Belloni & Victor Chernozhukov & Christian Hansen & Damian Kozbur, 2016. "Inference in High-Dimensional Panel Models With an Application to Gun Control," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 590-605, October.
    10. Andrii Babii & Jean-Pierre Florens, 2017. "Is completeness necessary? Estimation in nonidentified linear models," Papers 1709.03473, arXiv.org, revised Nov 2021.
    11. Andrii Babii & Eric Ghysels & Jonas Striaukas, 2022. "Machine Learning Time Series Regressions With an Application to Nowcasting," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(3), pages 1094-1106, June.
    12. Michael W. McCracken & Serena Ng, 2016. "FRED-MD: A Monthly Database for Macroeconomic Research," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 574-589, October.
    13. Farrell, Max H., 2015. "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, Elsevier, vol. 189(1), pages 1-23.
    14. Jose M. Carabias, 2018. "The real-time information content of macroeconomic news: implications for firm-level earnings expectations," Review of Accounting Studies, Springer, vol. 23(1), pages 136-166, March.
    15. Victor Chernozhukov & Jerry Hausman & Whitney K. Newey, 2019. "Demand analysis with many prices," CeMMAP working papers CWP59/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    16. Javier Alvarez & Manuel Arellano, 2003. "The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators," Econometrica, Econometric Society, vol. 71(4), pages 1121-1159, July.
    17. C. Marsilli, 2014. "Variable Selection in Predictive MIDAS Models," Working papers 520, Banque de France.
    18. Elena Andreou & Eric Ghysels & Andros Kourtellos, 2013. "Should Macroeconomic Forecasters Use Daily Financial Data and How?," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 31(2), pages 240-251, April.
    19. Diebold, Francis X & Mariano, Roberto S, 2002. "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(1), pages 134-144, January.
    20. Kock, Anders Bredahl, 2013. "Oracle Efficient Variable Selection In Random And Fixed Effects Panel Data Models," Econometric Theory, Cambridge University Press, vol. 29(1), pages 115-152, February.
    21. Andrii Babii, 2022. "High-Dimensional Mixed-Frequency IV Regression," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(4), pages 1470-1483, October.
    22. Koenker, Roger, 2004. "Quantile regression for longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 91(1), pages 74-89, October.
    23. Carabias, Jose M., 2018. "The real-time information content of macroeconomic news: implications for firm-level earnings expectations," LSE Research Online Documents on Economics 86399, London School of Economics and Political Science, LSE Library.
    24. Chiang, Harold D. & Rodrigue, Joel & Sasaki, Yuya, 2023. "Post-Selection Inference In Three-Dimensional Panel Data," Econometric Theory, Cambridge University Press, vol. 39(3), pages 623-658, June.
    25. Hansen, Christian B., 2007. "Asymptotic properties of a robust variance matrix estimator for panel data when T is large," Journal of Econometrics, Elsevier, vol. 141(2), pages 597-620, December.
    26. Claudia Foroni & Massimiliano Marcellino & Christian Schumacher, 2015. "Unrestricted mixed data sampling (MIDAS): MIDAS regressions with unrestricted lag polynomials," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 178(1), pages 57-82, January.
    27. Ryan T. Ball & Peter Easton, 2013. "Dissecting Earnings Recognition Timeliness," Journal of Accounting Research, Wiley Blackwell, vol. 51(5), pages 1099-1132, December.
    28. Jinyong Hahn & Guido Kuersteiner, 2002. "Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects when Both "n" and "T" Are Large," Econometrica, Econometric Society, vol. 70(4), pages 1639-1657, July.
    29. Kock, Anders Bredahl, 2016. "Oracle inequalities, variable selection and uniform inference in high-dimensional correlated random effects panel data models," Journal of Econometrics, Elsevier, vol. 195(1), pages 71-85.
    30. Carrasco, Marine & Florens, Jean-Pierre & Renault, Eric, 2007. "Linear Inverse Problems in Structural Econometrics Estimation Based on Spectral Decomposition and Regularization," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 77, Elsevier.
    31. Belloni, Alexandre & Chen, Mingli & Madrid Padilla, Oscar Hernan & Wang, Zixuan (Kevin), 2019. "High Dimensional Latent Panel Quantile Regression with an Application to Asset Pricing," The Warwick Economics Research Paper Series (TWERPS) 1230, University of Warwick, Department of Economics.
    32. Harding, Matthew & Lamarche, Carlos, 2019. "A panel quantile approach to attrition bias in Big Data: Evidence from a randomized experiment," Journal of Econometrics, Elsevier, vol. 211(1), pages 61-82.
    33. Dedecker, Jérôme & Doukhan, Paul, 2003. "A new covariance inequality and applications," Stochastic Processes and their Applications, Elsevier, vol. 106(1), pages 63-80, July.
    34. Lamarche, Carlos, 2010. "Robust penalized quantile regression estimation for panel data," Journal of Econometrics, Elsevier, vol. 157(2), pages 396-408, August.
    35. Ryan T. Ball & Eric Ghysels, 2018. "Automated Earnings Forecasts: Beat Analysts or Combine and Conquer?," Management Science, INFORMS, vol. 64(10), pages 4936-4952, October.
    36. Ghysels, Eric & Qian, Hang, 2019. "Estimating MIDAS regressions via OLS with polynomial parameter profiling," Econometrics and Statistics, Elsevier, vol. 9(C), pages 1-16.
    37. Andrii Babii & Eric Ghysels & Jonas Striaukas, 2019. "High-Dimensional Granger Causality Tests with an Application to VIX and News," Papers 1912.06307, arXiv.org, revised Feb 2021.
    38. Arellano, Manuel, 2003. "Panel Data Econometrics," OUP Catalogue, Oxford University Press, number 9780199245291.

    Citations

    Cited by:

    1. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    2. Knut Are Aastveit & Tuva Marie Fastbø & Eleonora Granziera & Kenneth Sæterhagen Paulsen & Kjersti Næss Torstensen, 2020. "Nowcasting Norwegian household consumption with debit card transaction data," Working Paper 2020/17, Norges Bank.
    3. Hafner, Christian & Wang, Linqi, 2020. "Dynamic portfolio selection with sector-specific regularization," LIDAM Discussion Papers ISBA 2020032, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    4. Hans Genberg & Özer Karagedikli, 2021. "Machine Learning and Central Banks: Ready for Prime Time?," Working Papers wp43, South East Asian Central Banks (SEACEN) Research and Training Centre.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Andrii Babii & Eric Ghysels & Jonas Striaukas, 2022. "Machine Learning Time Series Regressions With an Application to Nowcasting," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(3), pages 1094-1106, June.
    2. Andrii Babii & Eric Ghysels & Jonas Striaukas, 2019. "High-Dimensional Granger Causality Tests with an Application to VIX and News," Papers 1912.06307, arXiv.org, revised Feb 2021.
    3. Mogliani, Matteo & Simoni, Anna, 2021. "Bayesian MIDAS penalized regressions: Estimation, selection, and prediction," Journal of Econometrics, Elsevier, vol. 222(1), pages 833-860.
    4. Lamarche, Carlos & Parker, Thomas, 2023. "Wild bootstrap inference for penalized quantile regression for longitudinal data," Journal of Econometrics, Elsevier, vol. 235(2), pages 1799-1826.
    5. Andrii Babii & Ryan T. Ball & Eric Ghysels & Jonas Striaukas, 2024. "Panel data nowcasting: The case of price–earnings ratios," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(2), pages 292-307, March.
    6. Andrii Babii & Ryan T. Ball & Eric Ghysels & Jonas Striaukas, 2023. "Panel Data Nowcasting: The Case of Price-Earnings Ratios," Papers 2307.02673, arXiv.org.
    7. Sarun Kamolthip, 2021. "Macroeconomic Forecasting with LSTM and Mixed Frequency Time Series Data," PIER Discussion Papers 165, Puey Ungphakorn Institute for Economic Research.
    8. Andrii Babii, 2022. "High-Dimensional Mixed-Frequency IV Regression," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(4), pages 1470-1483, October.
    9. Matteo Mogliani & Anna Simoni, 2024. "Bayesian Bi-level Sparse Group Regressions for Macroeconomic Forecasting," Papers 2404.02671, arXiv.org.
    10. Richard Schnorrenberger & Aishameriane Schmidt & Guilherme Valle Moura, 2024. "Harnessing Machine Learning for Real-Time Inflation Nowcasting," Working Papers 806, DNB.
    11. Degiannakis, Stavros & Filis, George, 2018. "Forecasting oil prices: High-frequency financial data are indeed useful," Energy Economics, Elsevier, vol. 76(C), pages 388-402.
    12. Weidner, Martin & Zylkin, Thomas, 2021. "Bias and consistency in three-way gravity models," Journal of International Economics, Elsevier, vol. 132(C).
    13. Rong Fu & Luze Xie & Tao Liu & Juan Huang & Binbin Zheng, 2022. "Chinese Economic Growth Projections Based on Mixed Data of Carbon Emissions under the COVID-19 Pandemic," Sustainability, MDPI, vol. 14(24), pages 1-16, December.
    14. Ana Beatriz Galvão & Michael Owyang, 2022. "Forecasting low‐frequency macroeconomic events with high‐frequency data," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(7), pages 1314-1333, November.
    15. Juodis, Artūras & Sarafidis, Vasilis, 2022. "An incidental parameters free inference approach for panels with common shocks," Journal of Econometrics, Elsevier, vol. 229(1), pages 19-54.
    16. Knotek, Edward S. & Zaman, Saeed, 2019. "Financial nowcasts and their usefulness in macroeconomic forecasting," International Journal of Forecasting, Elsevier, vol. 35(4), pages 1708-1724.
    17. Andrii Babii & Jean-Pierre Florens, 2017. "Is completeness necessary? Estimation in nonidentified linear models," Papers 1709.03473, arXiv.org, revised Nov 2021.
    18. Mayer, Alexander, 2022. "On the local power of some tests of strict exogeneity in linear fixed effects models," Econometrics and Statistics, Elsevier, vol. 24(C), pages 49-74.
    19. Giovanni Ballarin & Petros Dellaportas & Lyudmila Grigoryeva & Marcel Hirt & Sophie van Huellen & Juan-Pablo Ortega, 2022. "Reservoir Computing for Macroeconomic Forecasting with Mixed Frequency Data," Papers 2211.00363, arXiv.org, revised Jan 2024.
