How Using Machine Learning Classification as a Variable in Regression Leads to Attenuation Bias and What to Do About It
DOI: 10.31219/osf.io/453jk
References listed on IDEAS
- Takaya Saito & Marc Rehmsmeier, 2015. "The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets," PLOS ONE, Public Library of Science, vol. 10(3), pages 1-21, March.
- Aigner, Dennis J., 1973. "Regression with a binary independent variable subject to errors of observation," Journal of Econometrics, Elsevier, vol. 1(1), pages 49-59, March.
- Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
- Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2017. "Double/Debiased Machine Learning for Treatment and Structural Parameters," NBER Working Papers 23564, National Bureau of Economic Research, Inc.
- Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney K. Newey & James Robins, 2017. "Double/debiased machine learning for treatment and structural parameters," CeMMAP working papers CWP28/17, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney K. Newey & James Robins, 2017. "Double/debiased machine learning for treatment and structural parameters," CeMMAP working papers 28/17, Institute for Fiscal Studies.
- Grimmer, Justin & Stewart, Brandon M., 2013. "Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts," Political Analysis, Cambridge University Press, vol. 21(3), pages 267-297, July.
- DiTraglia, Francis J. & García-Jimeno, Camilo, 2019. "Identifying the effect of a mis-classified, binary, endogenous regressor," Journal of Econometrics, Elsevier, vol. 209(2), pages 376-390.
- Anita R. Gohdes, 2020. "Repression Technology: Internet Accessibility and State Violence," American Journal of Political Science, John Wiley & Sons, vol. 64(3), pages 488-503, July.
- Robin Burgess & Matthew Hansen & Benjamin A. Olken & Peter Potapov & Stefanie Sieber, 2012. "The Political Economy of Deforestation in the Tropics," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 127(4), pages 1707-1754.
- Robin Burgess & Matthew Hansen & Benjamin A. Olken & Peter Potapov & Stefanie Sieber, 2011. "The Political Economy of Deforestation in the Tropics," NBER Working Papers 17417, National Bureau of Economic Research, Inc.
- Robin Burgess & Matthew Hansen & Benjamin Olken & Peter Potapov & Stefanie Sieber, 2012. "The Political Economy of Deforestation in the Tropics," Working Papers id:4963, eSocialSciences.
- Robin Burgess & Matthew Hansen & Benjamin Olken & Peter Potapov & Stefanie Sieber, 2012. "The Political Economy of Deforestation in the Tropics," STICERD - Economic Organisation and Public Policy Discussion Papers Series 037, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
- Olken, Benjamin & Sieber, Stefanie & Hansen, Matthew & Potapov, Peter, 2012. "The Political Economy of Deforestation in the Tropics," CEPR Discussion Papers 9020, C.E.P.R. Discussion Papers.
- Robin Burgess & Matthew Hansen & Benjamin Olken & Peter Potapov & Stefanie Sieber, 2012. "The Political Economy of Deforestation in the Tropics," GRI Working Papers 79, Grantham Research Institute on Climate Change and the Environment.
- Susanne M. Schennach, 2016. "Recent Advances in the Measurement Error Literature," Annual Review of Economics, Annual Reviews, vol. 8(1), pages 341-377, October.
- Barberá, Pablo & Casas, Andreu & Nagler, Jonathan & Egan, Patrick J. & Bonneau, Richard & Jost, John T. & Tucker, Joshua A., 2019. "Who Leads? Who Follows? Measuring Issue Attention and Agenda Setting by Legislators and the Mass Public Using Social Media Data," American Political Science Review, Cambridge University Press, vol. 113(4), pages 883-901, November.
- Bound, John & Brown, Charles & Mathiowetz, Nancy, 2001. "Measurement error in survey data," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 5, chapter 59, pages 3705-3843, Elsevier.
- Frazis, Harley & Loewenstein, Mark A., 2003. "Estimating linear regressions with mismeasured, possibly endogenous, binary explanatory variables," Journal of Econometrics, Elsevier, vol. 117(1), pages 151-178, November.
- Meyer, Bruce D. & Mittag, Nikolas, 2017. "Misclassification in binary choice models," Journal of Econometrics, Elsevier, vol. 200(2), pages 295-311.
- Thomas J. Kane & Cecilia Elena Rouse & Douglas Staiger, 1999. "Estimating Returns to Schooling When Schooling is Misreported," NBER Working Papers 7235, National Bureau of Economic Research, Inc.
- Thomas J. Kane & Cecilia E. Rouse & Douglas Staiger, 1999. "Estimating Returns to Schooling When Schooling is Misreported," Working Papers 798, Princeton University, Department of Economics, Industrial Relations Section.
- Carlos Daniel Paulino & Paulo Soares & John Neuhaus, 2003. "Binomial Regression with Misclassification," Biometrics, The International Biometric Society, vol. 59(3), pages 670-675, September.
- Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
- Wager, Stefan & Athey, Susan, 2017. "Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests," Research Papers 3576, Stanford University, Graduate School of Business.
- Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
- A. Belloni & D. Chen & V. Chernozhukov & C. Hansen, 2012. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain," Econometrica, Econometric Society, vol. 80(6), pages 2369-2429, November.
- Alexandre Belloni & D. Chen & Victor Chernozhukov & Christian Hansen, 2010. "Sparse models and methods for optimal instruments with an application to eminent domain," CeMMAP working papers CWP31/10, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Alexandre Belloni & Daniel Chen & Victor Chernozhukov & Christian Hansen, 2010. "Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain," Papers 1010.4345, arXiv.org, revised Apr 2015.
- Oriana Bandiera & Andrea Prat & Stephen Hansen & Raffaella Sadun, 2020. "CEO Behavior and Firm Performance," Journal of Political Economy, University of Chicago Press, vol. 128(4), pages 1325-1369.
- Oriana Bandiera & Stephen Hansen & Andrea Prat & Raffaella Sadun, 2017. "CEO Behavior and Firm Performance," NBER Working Papers 23248, National Bureau of Economic Research, Inc.
- Prat, Andrea & Hansen, Stephen & Sadun, Raffaella & Bandiera, Oriana, 2017. "CEO Behavior and Firm Performance," CEPR Discussion Papers 11960, C.E.P.R. Discussion Papers.
- Bandiera, Oriana & Prat, Andrea & Hansen, Stephen & Sadun, Raffaella, 2020. "CEO behavior and firm performance," LSE Research Online Documents on Economics 101423, London School of Economics and Political Science, LSE Library.
- Bollinger, Christopher R., 1996. "Bounding mean regressions when a binary regressor is mismeasured," Journal of Econometrics, Elsevier, vol. 73(2), pages 387-399, August.
- Bruce Meyer & Nikolas Mittag, 2013. "Misclassification In Binary Choice Models," Working Papers 13-27, Center for Economic Studies, U.S. Census Bureau.
- Bruce Meyer & Nikolas Mittag, 2014. "Misclassification in Binary Choice Models," NBER Working Papers 20509, National Bureau of Economic Research, Inc.
- Jon Kleinberg & Himabindu Lakkaraju & Jure Leskovec & Jens Ludwig & Sendhil Mullainathan, 2018. "Human Decisions and Machine Predictions," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 133(1), pages 237-293.
- Jon Kleinberg & Himabindu Lakkaraju & Jure Leskovec & Jens Ludwig & Sendhil Mullainathan, 2017. "Human Decisions and Machine Predictions," NBER Working Papers 23180, National Bureau of Economic Research, Inc.
- Hausman, J. A. & Abrevaya, Jason & Scott-Morton, F. M., 1998. "Misclassification of the dependent variable in a discrete-response setting," Journal of Econometrics, Elsevier, vol. 87(2), pages 239-269, September.
- Hausman, J.A. & Morton, F.M.S., 1994. "Misclassification of Dependent Variable in a Discrete Response Setting," Working papers 94-19, Massachusetts Institute of Technology (MIT), Department of Economics.
- Lowande, Kenneth, 2018. "Who Polices the Administrative State?," American Political Science Review, Cambridge University Press, vol. 112(4), pages 874-890, November.
- Katagiri, Azusa & Min, Eric, 2019. "The Credibility of Public and Private Signals: A Document-Based Approach," American Political Science Review, Cambridge University Press, vol. 113(1), pages 156-172, February.
- King, Gary & Zeng, Langche, 2001. "Explaining Rare Events in International Relations," International Organization, Cambridge University Press, vol. 55(3), pages 693-715, July.
- Daniel J. Hopkins & Gary King, 2010. "A Method of Automated Nonparametric Content Analysis for Social Science," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 229-247, January.
- Aigner, Dennis J., 1973. "Regression with a binary independent variable subject to errors of observation," LIDAM Reprints CORE 130, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
- Goldberg, Amir & Srivastava, Sameer B & Manian, Govind & Monroe, William & Potts, Christopher, 2016. "Fitting In or Standing Out? The Tradeoffs of Structural and Cultural Embeddedness," Institute for Research on Labor and Employment, Working Paper Series qt9bf631rg, Institute of Industrial Relations, UC Berkeley.
- Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
- Susan Athey & Guido Imbens, 2019. "Machine Learning Methods Economists Should Know About," Papers 1903.10075, arXiv.org.
- Margaret E. Roberts & Brandon M. Stewart & Dustin Tingley & Christopher Lucas & Jetson Leder‐Luis & Shana Kushner Gadarian & Bethany Albertson & David G. Rand, 2014. "Structural Topic Models for Open‐Ended Survey Responses," American Journal of Political Science, John Wiley & Sons, vol. 58(4), pages 1064-1082, October.
- Mitts, Tamar, 2019. "From Isolation to Radicalization: Anti-Muslim Hostility and Support for ISIS in the West," American Political Science Review, Cambridge University Press, vol. 113(1), pages 173-194, February.
- Kosuke Imai & Teppei Yamamoto, 2010. "Causal Inference with Differential Measurement Error: Nonparametric Identification and Sensitivity Analysis," American Journal of Political Science, John Wiley & Sons, vol. 54(2), pages 543-560, April.
- Pan, Jennifer & Chen, Kaiping, 2018. "Concealing Corruption: How Chinese Officials Distort Upward Reporting of Online Grievances," American Political Science Review, Cambridge University Press, vol. 112(3), pages 602-620, August.
- Kevin M. Quinn & Burt L. Monroe & Michael Colaresi & Michael H. Crespin & Dragomir R. Radev, 2010. "How to Analyze Political Attention with Minimal Assumptions and Costs," American Journal of Political Science, John Wiley & Sons, vol. 54(1), pages 209-228, January.
Most related items
These are the items that most often cite the same works as this one and are cited by the same works as this one.
- Wossen, Tesfamicheal & Abay, Kibrom A. & Abdoulaye, Tahirou, 2022. "Misperceiving and misreporting input quality: Implications for input use and productivity," Journal of Development Economics, Elsevier, vol. 157(C).
- Takahide Yanagi, 2019. "Inference on local average treatment effects for misclassified treatment," Econometric Reviews, Taylor & Francis Journals, vol. 38(8), pages 938-960, September.
- YANAGI, Takahide & 柳, 貴英, 2017. "Inference on Local Average Treatment Effects for Misclassified Treatment," Discussion Papers 2017-02, Graduate School of Economics, Hitotsubashi University.
- Takahide Yanagi, 2018. "Inference on Local Average Treatment Effects for Misclassified Treatment," Papers 1804.03349, arXiv.org.
- Tommasi, Denni & Zhang, Lina, 2024. "Bounding program benefits when participation is misreported," Journal of Econometrics, Elsevier, vol. 238(1).
- Tommasi, Denni & Zhang, Lina, 2020. "Bounding Program Benefits When Participation Is Misreported," IZA Discussion Papers 13430, Institute of Labor Economics (IZA).
- Denni Tommasi & Lina Zhang, 2020. "Bounding Program Benefits When Participation is Misreported," Monash Econometrics and Business Statistics Working Papers 24/20, Monash University, Department of Econometrics and Business Statistics.
- Akanksha Negi & Digvijay Singh Negi, 2022. "Difference-in-Differences with a Misclassified Treatment," Papers 2208.02412, arXiv.org.
- Brachet, Tanguy, 2008. "Maternal Smoking, Misclassification, and Infant Health," MPRA Paper 21466, University Library of Munich, Germany.
- Steven J. Haider & Melvin Stephens Jr., 2020. "Correcting for Misclassified Binary Regressors Using Instrumental Variables," NBER Working Papers 27797, National Bureau of Economic Research, Inc.
- Haider, Steven J. & Stephens Jr., Melvin, 2020. "Correcting for Misclassified Binary Regressors Using Instrumental Variables," IZA Discussion Papers 13593, Institute of Labor Economics (IZA).
- Adele Bergin, 2015. "Employer Changes and Wage Changes: Estimation with Measurement Error in a Binary Variable," LABOUR, CEIS, vol. 29(2), pages 194-223, June.
- Christian vom Lehn & Cache Ellsworth & Zachary Kroff, 2022. "Reconciling Occupational Mobility in the Current Population Survey," Journal of Labor Economics, University of Chicago Press, vol. 40(4), pages 1005-1051.
- vom Lehn, Christian & Ellsworth, Cache & Kroff, Zachary, 2020. "Reconciling Occupational Mobility in the Current Population Survey," IZA Discussion Papers 13509, Institute of Labor Economics (IZA).
- Adele Bergin, 2013. "Job Changes and Wage Changes: Estimation with Measurement Error in a Binary Variable," Economics Department Working Paper Series n240-13.pdf, Department of Economics, National University of Ireland - Maynooth.
- Molinari, Francesca, 2008. "Partial identification of probability distributions with misclassified data," Journal of Econometrics, Elsevier, vol. 144(1), pages 81-117, May.
- Molinari, Francesca, 2005. "Partial Identification of Probability Distributions with Misclassified Data," Working Papers 05-10, Cornell University, Center for Analytic Economics.
- Francis DiTraglia & Camilo Garcia-Jimeno, 2015. "On Mis-measured Binary Regressors: New Results And Some Comments on the Literature, Third Version," PIER Working Paper Archive 15-040, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 24 Nov 2015.
- Nguimkeu, Pierre & Denteh, Augustine & Tchernis, Rusty, 2019. "On the estimation of treatment effects with endogenous misreporting," Journal of Econometrics, Elsevier, vol. 208(2), pages 487-506.
- Pierre Nguimkeu & Augustine Denteh & Rusty Tchernis, 2017. "On the Estimation of Treatment Effects with Endogenous Misreporting," NBER Working Papers 24117, National Bureau of Economic Research, Inc.
- Nguimkeu, Pierre & Denteh, Augustine & Tchernis, Rusty, 2018. "On the Estimation of Treatment Effects with Endogenous Misreporting," IZA Discussion Papers 11426, Institute of Labor Economics (IZA).
- Pierre Nguimkeu & Augustine Denteh & Rusty Tchernis, 2018. "On the Estimation of Treatment Effects with Endogenous Misreporting," Working Papers 2018-019, Human Capital and Economic Opportunity Working Group.
- Lundberg, Ian & Brand, Jennie E. & Jeon, Nanum, 2022. "Researcher reasoning meets computational capacity: Machine learning for social science," SocArXiv s5zc8, Center for Open Science.
- Francis J. DiTraglia & Camilo Garcia-Jimeno, 2020. "Identifying the effect of a mis-classified, binary, endogenous regressor," Papers 2011.07272, arXiv.org.
- DiTraglia, Francis J. & García-Jimeno, Camilo, 2019. "Identifying the effect of a mis-classified, binary, endogenous regressor," Journal of Econometrics, Elsevier, vol. 209(2), pages 376-390.
- Francis DiTraglia & Camilo Garcia-Jimeno, 2015. "On Mis-measured Binary Regressors: New Results And Some Comments on the Literature, Second Version," PIER Working Paper Archive 15-039, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 11 Nov 2015.
- Orville Mondal & Rui Wang, 2024. "Partial Identification of Binary Choice Models with Misreported Outcomes," Papers 2401.17137, arXiv.org.
- Frazis, Harley & Loewenstein, Mark A., 2003. "Estimating linear regressions with mismeasured, possibly endogenous, binary explanatory variables," Journal of Econometrics, Elsevier, vol. 117(1), pages 151-178, November.
- Arthur Lewbel, 2007. "Estimation of Average Treatment Effects with Misclassification," Econometrica, Econometric Society, vol. 75(2), pages 537-551, March.
- Arthur Lewbel, 2003. "Estimation of Average Treatment Effects With Misclassification," Boston College Working Papers in Economics 556, Boston College Department of Economics, revised 04 Sep 2006.
- Arthur Lewbel, 2004. "Estimation of Average Treatment Effects With Misclassification," Econometric Society 2004 North American Winter Meetings 210, Econometric Society.
- Aprajit Mahajan, 2006. "Identification and Estimation of Regression Models with Misclassification," Econometrica, Econometric Society, vol. 74(3), pages 631-665, May.
More about this item
NEP fields
This paper has been announced in the following NEP Reports:
- NEP-BIG-2021-06-21 (Big Data)
- NEP-CMP-2021-06-21 (Computational Economics)
- NEP-ECM-2021-06-21 (Econometrics)
Handle: RePEc:osf:socarx:453jk