Printed from https://ideas.repec.org/p/arx/papers/2107.02780.html

Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy

Author

Listed:
  • Anish Agarwal
  • Rahul Singh

Abstract

The US Census Bureau will deliberately corrupt data sets derived from the 2020 US Census, enhancing the privacy of respondents while potentially reducing the precision of economic analysis. To investigate whether this trade-off is inevitable, we formulate a semiparametric model of causal inference with high dimensional corrupted data. We propose a procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency and Gaussian approximation by finite sample arguments, with a rate of $n^{-1/2}$ for semiparametric estimands that degrades gracefully for nonparametric estimands. Our key assumption is that the true covariates are approximately low rank, which we interpret as approximate repeated measurements and empirically validate. Our analysis provides nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. Calibrated simulations verify the coverage of our data cleaning-adjusted confidence intervals and demonstrate the relevance of our results for Census-derived data.
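
The low-rank assumption in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure; it assumes a generic hard singular value thresholding step (fill missing entries with zero, rescale by the observed fraction, keep the top singular components) and a hypothetical helper named `clean_low_rank`. The sample sizes, noise level, missingness rate, and rank are all illustrative choices.

```python
import numpy as np

def clean_low_rank(Z, rank):
    """Hypothetical helper: denoise and impute Z by a truncated SVD.

    Missing entries (NaN) are set to 0, the matrix is rescaled by the
    observed fraction, and only the top `rank` singular components are kept.
    """
    observed = ~np.isnan(Z)
    rho = observed.mean()                      # estimated share of observed entries
    filled = np.where(observed, Z, 0.0) / rho  # zero-fill, then rescale
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
n, p, r = 500, 50, 3                           # illustrative dimensions and rank

# True covariates: exactly rank r (the paper only requires approximately low rank).
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))

# Corrupted covariates: additive measurement error plus 20% missing values.
Z = X + rng.normal(scale=1.0, size=(n, p))
Z[rng.random((n, p)) < 0.2] = np.nan

X_hat = clean_low_rank(Z, rank=r)

# Baseline: use the corrupted data directly, with missing entries set to 0.
naive = np.where(np.isnan(Z), 0.0, Z)

err_clean = np.mean((X_hat - X) ** 2)
err_naive = np.mean((naive - X) ** 2)
print(f"mean squared error, cleaned: {err_clean:.3f}")
print(f"mean squared error, naive:   {err_naive:.3f}")
```

In this simulated setting the cleaned covariates should be closer to the truth than the zero-filled corrupted ones, because the noise is spread across all directions while the signal is concentrated in a few. In practice the rank would have to be chosen from the data, which this sketch does not address.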

Suggested Citation

  • Anish Agarwal & Rahul Singh, 2021. "Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy," Papers 2107.02780, arXiv.org, revised Feb 2024.
  • Handle: RePEc:arx:papers:2107.02780

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2107.02780
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Evdokimov, Kirill & White, Halbert, 2012. "Some Extensions Of A Lemma Of Kotlarski," Econometric Theory, Cambridge University Press, vol. 28(4), pages 925-932, August.
    2. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    3. Robinson, Peter M, 1988. "Root-N-Consistent Semiparametric Regression," Econometrica, Econometric Society, vol. 56(4), pages 931-954, July.
    4. Newey, Whitney K, 1994. "The Asymptotic Variance of Semiparametric Estimators," Econometrica, Econometric Society, vol. 62(6), pages 1349-1382, November.
    5. Susanne M. Schennach, 2004. "Estimation of Nonlinear Models with Measurement Error," Econometrica, Econometric Society, vol. 72(1), pages 33-75, January.
    6. S. M. Schennach & Yingyao Hu, 2013. "Nonparametric Identification and Semiparametric Estimation of Classical Measurement Error Models Without Side Information," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(501), pages 177-186, March.
    7. Victor Chernozhukov & Juan Carlos Escanciano & Hidehiko Ichimura & Whitney K. Newey & James M. Robins, 2022. "Locally Robust Semiparametric Estimation," Econometrica, Econometric Society, vol. 90(4), pages 1501-1535, July.
    8. Whitney K. Newey & Fushing Hsieh & James M. Robins, 2004. "Twicing Kernels and a Small Bias Property of Semiparametric Estimators," Econometrica, Econometric Society, vol. 72(3), pages 947-962, May.
    9. Anish Agarwal & Devavrat Shah & Dennis Shen & Dogyoon Song, 2021. "On Robustness of Principal Component Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1731-1745, October.
    10. Whitney K. Newey, 2001. "Flexible Simulated Moment Estimation Of Nonlinear Errors-In-Variables Models," The Review of Economics and Statistics, MIT Press, vol. 83(4), pages 616-627, November.
    11. Li, Tong & Vuong, Quang, 1998. "Nonparametric Estimation of the Measurement Error Model Using Multiple Indicators," Journal of Multivariate Analysis, Elsevier, vol. 65(2), pages 139-165, May.
    12. van der Laan, Mark J. & Rubin, Daniel, 2006. "Targeted Maximum Likelihood Learning," The International Journal of Biostatistics, De Gruyter, vol. 2(1), pages 1-40, December.
    13. Yingyao Hu & Susanne M. Schennach, 2008. "Instrumental Variable Treatment of Nonclassical Measurement Error Models," Econometrica, Econometric Society, vol. 76(1), pages 195-216, January.
    14. Hausman, J. A. & Newey, W. K. & Powell, J. L., 1995. "Nonlinear errors in variables: Estimation of some Engel curves," Journal of Econometrics, Elsevier, vol. 65(1), pages 205-233, January.
    15. Abadie, Alberto, 2003. "Semiparametric instrumental variable estimation of treatment response models," Journal of Econometrics, Elsevier, vol. 113(2), pages 231-263, April.
    16. Victor Chernozhukov & Whitney Newey & Rahul Singh & Vasilis Syrgkanis, 2020. "Adversarial Estimation of Riesz Representers," Papers 2101.00009, arXiv.org, revised Apr 2024.
    17. Chunrong Ai & Xiaohong Chen, 2003. "Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions," Econometrica, Econometric Society, vol. 71(6), pages 1795-1843, November.
    18. Li, Tong, 2002. "Robust and consistent estimation of nonlinear errors-in-variables models," Journal of Econometrics, Elsevier, vol. 110(1), pages 1-26, September.
    19. Victor Chernozhukov & Kaspar Wuthrich & Yinchu Zhu, 2018. "A $t$-test for synthetic controls," Papers 1812.10820, arXiv.org, revised Jan 2024.
    20. Wang, Liqun & Hsiao, Cheng, 2011. "Method of moments estimation and identifiability of semiparametric nonlinear errors-in-variables models," Journal of Econometrics, Elsevier, vol. 165(1), pages 30-44.
    21. Hausman, Jerry A. & Newey, Whitney K. & Ichimura, Hidehiko & Powell, James L., 1991. "Identification and estimation of polynomial errors-in-variables models," Journal of Econometrics, Elsevier, vol. 50(3), pages 273-295, December.

    Citations

    Cited by:

    1. Anish Agarwal & Munther Dahleh & Devavrat Shah & Dennis Shen, 2021. "Causal Matrix Completion," Papers 2109.15154, arXiv.org.
    2. Isaac Meza & Rahul Singh, 2021. "Nested Nonparametric Instrumental Variable Regression: Long Term, Mediated, and Time Varying Treatment Effects," Papers 2112.14249, arXiv.org, revised Mar 2024.
    3. Fengshi Niu & Harsha Nori & Brian Quistorff & Rich Caruana & Donald Ngwe & Aadharsh Kannan, 2022. "Differentially Private Estimation of Heterogeneous Causal Effects," Papers 2202.11043, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Song, Suyong, 2015. "Semiparametric estimation of models with conditional moment restrictions in the presence of nonclassical measurement errors," Journal of Econometrics, Elsevier, vol. 185(1), pages 95-109.
    2. Qizhao Chen & Vasilis Syrgkanis & Morgane Austern, 2022. "Debiased Machine Learning without Sample-Splitting for Stable Estimators," Papers 2206.01825, arXiv.org, revised Nov 2022.
    3. Susanne M. Schennach, 2012. "Measurement error in nonlinear models - a review," CeMMAP working papers 41/12, Institute for Fiscal Studies.
    4. Isaac Meza & Rahul Singh, 2021. "Nested Nonparametric Instrumental Variable Regression: Long Term, Mediated, and Time Varying Treatment Effects," Papers 2112.14249, arXiv.org, revised Mar 2024.
    5. Rahul Singh, 2021. "Kernel Ridge Riesz Representers: Generalization, Mis-specification, and the Counterfactual Effective Dimension," Papers 2102.11076, arXiv.org, revised Jul 2024.
    6. Yingyao Hu & Geert Ridder, 2012. "Estimation of nonlinear models with mismeasured regressors using marginal information," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 27(3), pages 347-385, April.
    7. Xiaohong Chen & Yingyao Hu, 2006. "Identification and Inference of Nonlinear Models Using Two Samples with Arbitrary Measurement Errors," Cowles Foundation Discussion Papers 1590, Cowles Foundation for Research in Economics, Yale University.
    8. Victor Chernozhukov & Whitney Newey & Rahul Singh & Vasilis Syrgkanis, 2020. "Adversarial Estimation of Riesz Representers," Papers 2101.00009, arXiv.org, revised Apr 2024.
    9. Victor Chernozhukov & Juan Carlos Escanciano & Hidehiko Ichimura & Whitney K. Newey & James M. Robins, 2022. "Locally Robust Semiparametric Estimation," Econometrica, Econometric Society, vol. 90(4), pages 1501-1535, July.
    10. Geert Ridder & Yingyao Hu, 2004. "Estimation of Nonlinear Models with Measurement Error Using Marginal Information," Econometric Society 2004 North American Summer Meetings 21, Econometric Society.
    11. V Chernozhukov & W K Newey & R Singh, 2023. "A simple and general debiased machine learning theorem with finite-sample guarantees," Biometrika, Biometrika Trust, vol. 110(1), pages 257-264.
    12. De Nadai, Michele & Lewbel, Arthur, 2016. "Nonparametric errors in variables models with measurement errors on both sides of the equation," Journal of Econometrics, Elsevier, vol. 191(1), pages 19-32.
    13. Mochen Yang & Edward McFowland & Gordon Burtch & Gediminas Adomavicius, 2022. "Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem," INFORMS Journal on Data Science, INFORMS, vol. 1(2), pages 138-155, October.
    14. Andrei Zeleneev & Kirill Evdokimov, 2023. "Simple estimation of semiparametric models with measurement errors," CeMMAP working papers 10/23, Institute for Fiscal Studies.
    15. Kirill S. Evdokimov & Andrei Zeleneev, 2023. "Simple Estimation of Semiparametric Models with Measurement Errors," Papers 2306.14311, arXiv.org, revised Mar 2024.
    16. Hidehiko Ichimura & Whitney K. Newey, 2022. "The influence function of semiparametric estimators," Quantitative Economics, Econometric Society, vol. 13(1), pages 29-61, January.
    17. Hu, Yingyao, 2008. "Identification and estimation of nonlinear models with misclassification error using instrumental variables: A general solution," Journal of Econometrics, Elsevier, vol. 144(1), pages 27-61, May.
    18. Wang, Liqun & Hsiao, Cheng, 2011. "Method of moments estimation and identifiability of semiparametric nonlinear errors-in-variables models," Journal of Econometrics, Elsevier, vol. 165(1), pages 30-44.
    19. Susanne M. Schennach, 2013. "Regressions with Berkson errors in covariates - A nonparametric approach," Papers 1308.2836, arXiv.org.
    20. Xiaohong Chen & Han Hong & Denis Nekipelov, 2011. "Nonlinear Models of Measurement Errors," Journal of Economic Literature, American Economic Association, vol. 49(4), pages 901-937, December.
