
On Robustness of Principal Component Regression

Author

Listed:
  • Anish Agarwal
  • Devavrat Shah
  • Dennis Shen
  • Dogyoon Song

Abstract

Principal component regression (PCR) is a simple but powerful and widely used method. Its effectiveness is well established when the covariates exhibit low-rank structure. However, its ability to handle settings with noisy, missing, and mixed-valued (that is, discrete and continuous) covariates is not understood and remains an important open challenge. As the main contribution of this work, we establish the robustness of PCR, without any change, in this respect and provide meaningful finite-sample analysis. To do so, we establish that PCR is equivalent to performing linear regression after preprocessing the covariate matrix via hard singular value thresholding (HSVT). As a result, in the context of counterfactual analysis using observational data, we show PCR is equivalent to the recently proposed robust variant of the synthetic control method, known as robust synthetic control (RSC). As an immediate consequence, we obtain finite-sample analysis of the RSC estimator that was previously absent. As an important contribution to the synthetic controls literature, we establish that an (approximate) linear synthetic control exists in the setting of a generalized factor model, or latent variable model; traditionally in the literature, the existence of a synthetic control is assumed as an axiom. We further discuss a surprising implication of the robustness property of PCR with respect to noise, namely that PCR can learn a good predictive model even if the covariates are tactfully transformed to preserve differential privacy. Finally, this work advances the state-of-the-art analysis for HSVT by establishing stronger guarantees with respect to the ℓ2,∞-norm rather than the Frobenius norm as is commonly done in the matrix estimation literature, which may be of interest in its own right.
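
The equivalence stated in the abstract, that PCR matches linear regression on an HSVT-preprocessed covariate matrix, can be checked numerically. The following is a minimal sketch rather than the authors' implementation. It assumes uncentered covariates, a known rank k, and synthetic data. All names (hsvt, pcr_fit, hsvt_ols_fit) and the noise levels are hypothetical choices made for illustration.

```python
import numpy as np


def hsvt(X, k):
    """Hard singular value thresholding: keep only the top-k singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def pcr_fit(X, y, k):
    """PCR without centering (an assumption of this sketch): regress y on the
    scores of the top-k right singular vectors, then map back to covariate space."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k, :].T                      # p x k principal directions
    scores = X @ V_k                       # n x k projected covariates
    gamma, *_ = np.linalg.lstsq(scores, y, rcond=None)
    return V_k @ gamma                     # coefficients in the original coordinates


def hsvt_ols_fit(X, y, k):
    """Minimum-norm least squares on the rank-k HSVT approximation of X."""
    X_k = hsvt(X, k)
    # rcond discards the numerically zero singular values left after thresholding.
    beta, *_ = np.linalg.lstsq(X_k, y, rcond=1e-10)
    return beta


# Synthetic example (hypothetical sizes and noise levels).
rng = np.random.default_rng(0)
n, p, k = 200, 30, 3
A = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))   # low-rank signal
X = A + 0.1 * rng.normal(size=(n, p))                   # noisy observed covariates
beta_true = rng.normal(size=p)
y = A @ beta_true + 0.1 * rng.normal(size=n)

beta_pcr = pcr_fit(X, y, k)
beta_hsvt = hsvt_ols_fit(X, y, k)
max_gap = float(np.max(np.abs(beta_pcr - beta_hsvt)))   # difference between the two fits
```

In this sketch both routines return the same coefficient vector up to floating-point error, because each reduces to applying the pseudoinverse of the rank-k truncation of X to y. Handling of missing entries and the choice of k are outside its scope.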

Suggested Citation

  • Anish Agarwal & Devavrat Shah & Dennis Shen & Dogyoon Song, 2021. "On Robustness of Principal Component Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 1731-1745, October.
  • Handle: RePEc:taf:jnlasa:v:116:y:2021:i:536:p:1731-1745
    DOI: 10.1080/01621459.2021.1928513

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/01621459.2021.1928513
    Download Restriction: Access to full text is restricted to subscribers.


    Citations


    Cited by:

    1. Zongwu Cai & Ying Fang & Ming Lin & Zixuan Wu, 2023. "A Quasi Synthetic Control Method for Nonlinear Models With High-Dimensional Covariates," WORKING PAPERS SERIES IN THEORETICAL AND APPLIED ECONOMICS 202305, University of Kansas, Department of Economics, revised Aug 2023.
    2. Anish Agarwal & Vasilis Syrgkanis, 2022. "Synthetic Blip Effects: Generalizing Synthetic Controls for the Dynamic Treatment Regime," Papers 2210.11003, arXiv.org.
    3. Adam F. Sapnik & Irene Bechis & Alice M. Bumstead & Timothy Johnson & Philip A. Chater & David A. Keen & Kim E. Jelfs & Thomas D. Bennett, 2022. "Multivariate analysis of disorder in metal–organic frameworks," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    4. Bernardo García Bulle & Dennis Shen & Devavrat Shah & Anette E. Hosoi, 2022. "Public health implications of opening National Football League stadiums during the COVID-19 pandemic," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 119(14), pages 2114226119-, April.
    5. Dennis Shen & Peng Ding & Jasjeet Sekhon & Bin Yu, 2022. "Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data," Papers 2207.14481, arXiv.org, revised Oct 2022.
    6. Anish Agarwal & Keegan Harris & Justin Whitehouse & Zhiwei Steven Wu, 2023. "Adaptive Principal Component Regression with Applications to Panel Data," Papers 2307.01357, arXiv.org, revised Aug 2024.
    7. Levenko, Natalia & Staehr, Karsten, 2023. "Self-reported tax compliance in post-transition Estonia," Economic Systems, Elsevier, vol. 47(3).
    8. Angela Zhou & Andrew Koo & Nathan Kallus & Rene Ropac & Richard Peterson & Stephen Koppel & Tiffany Bergin, 2021. "An Empirical Evaluation of the Impact of New York's Bail Reform on Crime Using Synthetic Controls," Papers 2111.08664, arXiv.org, revised Jun 2023.
    9. Anish Agarwal & Rahul Singh, 2021. "Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy," Papers 2107.02780, arXiv.org, revised Feb 2024.
    10. Alberto Abadie & Anish Agarwal & Raaz Dwivedi & Abhin Shah, 2024. "Doubly Robust Inference in Causal Latent Factor Models," Papers 2402.11652, arXiv.org, revised Oct 2024.
    11. Vivek F. Farias & Andrew A. Li & Tianyi Peng, 2021. "Learning Treatment Effects in Panels with General Intervention Patterns," Papers 2106.02780, arXiv.org, revised Mar 2023.
    12. Luis Costa & Vivek F. Farias & Patricio Foncea & Jingyuan (Donna) Gan & Ayush Garg & Ivo Rosa Montenegro & Kumarjit Pathak & Tianyi Peng & Dusan Popovic, 2023. "Generalized Synthetic Control for TestOps at ABI: Models, Algorithms, and Infrastructure," Interfaces, INFORMS, vol. 53(5), pages 336-349, September.
