EnsembleIV: Creating Instrumental Variables from Ensemble Learners for Robust Statistical Inference

Authors
  • Gordon Burtch
  • Edward McFowland III
  • Mochen Yang
  • Gediminas Adomavicius

Abstract

Despite increasing popularity in empirical studies, the integration of machine learning generated variables into regression models for statistical inference suffers from the measurement error problem, which can bias estimation and threaten the validity of inferences. In this paper, we develop a novel approach to alleviate associated estimation biases. Our proposed approach, EnsembleIV, creates valid and strong instrumental variables from weak learners in an ensemble model, and uses them to obtain consistent estimates that are robust against the measurement error problem. Our empirical evaluations, using both synthetic and real-world datasets, show that EnsembleIV can effectively reduce estimation biases across several common regression specifications, and can be combined with modern deep learning techniques when dealing with unstructured data.
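The idea in the abstract can be illustrated with a small simulation. The sketch below is not the EnsembleIV procedure itself, which the abstract does not spell out; it is a simplified illustration under stated assumptions. The individual learners are hypothetical and synthetic: each one predicts the true covariate with its own independent error, so the other learners are valid instruments by construction. All names (`preds`, `focal`, `instruments`, `add_const`, `ols`) are introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_learners = 20_000, 5
beta = 2.0  # true coefficient on the unobserved covariate

# Ground truth: y depends on x_true, which the analyst never observes.
x_true = rng.normal(size=n)
y = 1.0 + beta * x_true + rng.normal(size=n)

# ASSUMPTION: each hypothetical weak learner predicts x_true with its own
# independent, unit-variance error. Real ensemble members usually have
# correlated errors, which is the harder case the paper addresses.
preds = x_true[:, None] + rng.normal(scale=1.0, size=(n, n_learners))


def add_const(a):
    """Prepend an intercept column to a 1-D or 2-D array of regressors."""
    a = a.reshape(len(a), -1)
    return np.column_stack([np.ones(len(a)), a])


def ols(X, target):
    """Least-squares coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


focal = preds[:, 0]         # the ML-generated regressor used in the model
instruments = preds[:, 1:]  # the remaining learners serve as instruments

# Naive OLS on the noisy prediction: the slope is attenuated toward zero.
beta_naive = ols(add_const(focal), y)[1]

# Two-stage least squares: project the focal prediction on the instruments,
# then regress y on the projected values.
Z = add_const(instruments)
focal_hat = Z @ ols(Z, focal)
beta_iv = ols(add_const(focal_hat), y)[1]

print(f"true beta = {beta}, naive OLS = {beta_naive:.3f}, IV = {beta_iv:.3f}")
```

With unit-variance signal and unit-variance prediction error, the naive slope should land near half the true coefficient, while the instrumented estimate should land near the true value. How to obtain instruments that remain valid and strong when the learners' errors are correlated is left to the paper.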

Suggested Citation

  • Gordon Burtch & Edward McFowland III & Mochen Yang & Gediminas Adomavicius, 2023. "EnsembleIV: Creating Instrumental Variables from Ensemble Learners for Robust Statistical Inference," Papers 2303.02820, arXiv.org.
  • Handle: RePEc:arx:papers:2303.02820

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2303.02820
    File Function: Latest version
    Download Restriction: no