
Provably Auditing Ordinary Least Squares in Low Dimensions

Author

Listed:
  • Ankur Moitra
  • Dhruv Rohatgi

Abstract

Measuring the stability of conclusions derived from Ordinary Least Squares linear regression is critically important, but most metrics either only measure local stability (i.e. against infinitesimal changes in the data), or are only interpretable under statistical assumptions. Recent work proposes a simple, global, finite-sample stability metric: the minimum number of samples that need to be removed so that rerunning the analysis overturns the conclusion, specifically meaning that the sign of a particular coefficient of the estimated regressor changes. However, besides the trivial exponential-time algorithm, the only approach for computing this metric is a greedy heuristic that lacks provable guarantees under reasonable, verifiable assumptions; the heuristic provides a loose upper bound on the stability and also cannot certify lower bounds on it. We show that in the low-dimensional regime where the number of covariates is a constant but the number of samples is large, there are efficient algorithms for provably estimating (a fractional version of) this metric. Applying our algorithms to the Boston Housing dataset, we exhibit regression analyses where we can estimate the stability up to a factor of $3$ better than the greedy heuristic, and analyses where we can certify stability to dropping even a majority of the samples.
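The metric described above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' algorithm: it implements the trivial exponential-time search and a simple refit-based greedy that returns only an upper bound (this greedy is an assumption for illustration and is not necessarily the heuristic the abstract refers to). The function names, the toy data, and the choice to keep at least as many rows as covariates are all hypothetical.

```python
import itertools
import numpy as np

def coef(X, y, j):
    """OLS coefficient j for y ~ X (X must already contain any intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0][j]

def brute_force_stability(X, y, j=0):
    """Exact metric by exhaustive search: the smallest number of rows whose
    removal flips the sign of coefficient j. Exponential time in general."""
    n, d = X.shape
    base = np.sign(coef(X, y, j))
    if base == 0:
        raise ValueError("coefficient is exactly zero; its sign is undefined")
    for k in range(1, n - d + 1):          # always keep at least d rows
        for drop in itertools.combinations(range(n), k):
            keep = np.ones(n, dtype=bool)
            keep[list(drop)] = False
            if np.sign(coef(X[keep], y[keep], j)) == -base:
                return k, drop
    return None, ()

def greedy_upper_bound(X, y, j=0):
    """Illustrative greedy: repeatedly drop the single row whose removal moves
    coefficient j furthest toward the opposite sign. Gives an upper bound only."""
    n, d = X.shape
    base = np.sign(coef(X, y, j))
    if base == 0:
        raise ValueError("coefficient is exactly zero; its sign is undefined")
    keep = np.ones(n, dtype=bool)
    removed = []
    while keep.sum() > d:
        def after_drop(i):
            trial = keep.copy()
            trial[i] = False
            return base * coef(X[trial], y[trial], j)
        best = min(np.flatnonzero(keep), key=after_drop)
        keep[best] = False
        removed.append(int(best))
        if np.sign(coef(X[keep], y[keep], j)) == -base:
            return len(removed), tuple(removed)
    return None, tuple(removed)

# Toy data (hypothetical): four points on the line y = -x plus two
# high-leverage points that pull the fitted slope positive.
X = np.array([[1.0], [2.0], [3.0], [4.0], [10.0], [10.0]])
y = np.array([-1.0, -2.0, -3.0, -4.0, 20.0, 20.0])

print(brute_force_stability(X, y))
print(greedy_upper_bound(X, y))
```

On this toy data the two searches should agree that both high-leverage rows must be removed before the slope turns negative. In general the greedy can overshoot the exact value, and the exhaustive search becomes infeasible long before the sample sizes at which the paper's low-dimensional algorithms are meant to operate.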

Suggested Citation

  • Ankur Moitra & Dhruv Rohatgi, 2022. "Provably Auditing Ordinary Least Squares in Low Dimensions," Papers 2205.14284, arXiv.org, revised Jun 2022.
  • Handle: RePEc:arx:papers:2205.14284

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2205.14284
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. John P A Ioannidis, 2005. "Why Most Published Research Findings Are False," PLOS Medicine, Public Library of Science, vol. 2(8), pages 1-1, August.
    2. Harrison, David Jr. & Rubinfeld, Daniel L., 1978. "Hedonic housing prices and the demand for clean air," Journal of Environmental Economics and Management, Elsevier, vol. 5(1), pages 81-102, March.
    3. Gilley, Otis W. & Pace, R. Kelley, 1996. "On the Harrison and Rubinfeld Data," Journal of Environmental Economics and Management, Elsevier, vol. 31(3), pages 403-405, November.
    4. Tanaka, Hideo & Hayashi, Isao & Watada, Junzo, 1989. "Possibilistic linear regression analysis for fuzzy data," European Journal of Operational Research, Elsevier, vol. 40(3), pages 389-396, June.
    5. Card, David, 2001. "Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems," Econometrica, Econometric Society, vol. 69(5), pages 1127-1160, September.

    Citations



    Cited by:

    1. Daniel Freund & Samuel B. Hopkins, 2023. "Towards Practical Robustness Auditing for Linear Regression," Papers 2307.16315, arXiv.org.
    2. Gabriel Okasa & Kenneth A. Younge, 2022. "Sample Fit Reliability," Papers 2209.06631, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cheng, Tsung-Chi, 2012. "On simultaneously identifying outliers and heteroscedasticity without specific form," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2258-2272.
    2. Bodhisattva Sen & Mary Meyer, 2017. "Testing against a linear regression model using ideas from shape-restricted estimation," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(2), pages 423-448, March.
    3. Takafumi Kato, 2020. "Likelihood-based strategies for estimating unknown parameters and predicting missing data in the simultaneous autoregressive model," Journal of Geographical Systems, Springer, vol. 22(1), pages 143-176, January.
    4. James P. LeSage & R. Kelley Pace, 2014. "The Biggest Myth in Spatial Econometrics," Econometrics, MDPI, vol. 2(4), pages 1-33, December.
    5. Simlai, Prodosh, 2014. "Estimation of variance of housing prices using spatial conditional heteroskedasticity (SARCH) model with an application to Boston housing price data," The Quarterly Review of Economics and Finance, Elsevier, vol. 54(1), pages 17-30.
    6. Xiaowen Dai & Libin Jin & Anqi Shi & Lei Shi, 2016. "Outlier detection and accommodation in general spatial models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 25(3), pages 453-475, August.
    7. James P. LeSage & R. Kelley Pace, 2018. "Spatial econometric Monte Carlo studies: raising the bar," Empirical Economics, Springer, vol. 55(1), pages 17-34, August.
    8. Wei, Chuanhua & Guo, Shuang & Zhai, Shufen, 2017. "Statistical inference of partially linear varying coefficient spatial autoregressive models," Economic Modelling, Elsevier, vol. 64(C), pages 553-559.
    9. Marwan Al-Momani & Mohammad Arashi, 2024. "Ridge-Type Pretest and Shrinkage Estimation Strategies in Spatial Error Models with an Application to a Real Data Example," Mathematics, MDPI, vol. 12(3), pages 1-19, January.
    10. Takafumi Kato, 2013. "Usefulness of the Information Contained in the Prediction Sample for the Spatial Error Model," The Journal of Real Estate Finance and Economics, Springer, vol. 47(1), pages 169-195, July.
    11. Maia, Mateus & Murphy, Keefe & Parnell, Andrew C., 2024. "GP-BART: A novel Bayesian additive regression trees approach using Gaussian processes," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
    12. Malikov, Emir & Sun, Yiguo, 2017. "Semiparametric estimation and testing of smooth coefficient spatial autoregressive models," Journal of Econometrics, Elsevier, vol. 199(1), pages 12-34.
    13. Doğan, Osman & Taşpınar, Süleyman, 2014. "Spatial autoregressive models with unknown heteroskedasticity: A comparison of Bayesian and robust GMM approach," Regional Science and Urban Economics, Elsevier, vol. 45(C), pages 1-21.
    14. Rossi, Francesca & Lieberman, Offer, 2023. "Spatial autoregressions with an extended parameter space and similarity-based weights," Journal of Econometrics, Elsevier, vol. 235(2), pages 1770-1798.
    15. Yunquan Song & Minmin Zhan & Yue Zhang & Yongxin Liu, 2024. "Huber Loss Meets Spatial Autoregressive Model: A Robust Variable Selection Method with Prior Information," Networks and Spatial Economics, Springer, vol. 24(1), pages 291-311, March.
    16. Seya, Hajime & Yamagata, Yoshiki & Tsutsumi, Morito, 2013. "Automatic selection of a spatial weight matrix in spatial econometrics: Application to a spatial hedonic approach," Regional Science and Urban Economics, Elsevier, vol. 43(3), pages 429-444.
    17. Harold Alderman & John Hoddinott & Bill Kinsey, 2006. "Long term consequences of early childhood malnutrition," Oxford Economic Papers, Oxford University Press, vol. 58(3), pages 450-474, July.
    18. Kristinn Hermannsson & Patrizio Lecca, 2016. "Human Capital in Economic Development: From Labour Productivity to Macroeconomic Impact," Economic Papers, The Economic Society of Australia, vol. 35(1), pages 24-36, March.
    19. Lucija Muehlenbachs & Elisheba Spiller & Christopher Timmins, 2015. "The Housing Market Impacts of Shale Gas Development," American Economic Review, American Economic Association, vol. 105(12), pages 3633-3659, December.
    20. María Arrazola & José de Hevia, 2003. "Evaluación económica de políticas educativas: Una ilustración con la Ley General de la Educación de 1970," Hacienda Pública Española / Review of Public Economics, IEF, vol. 164(1), pages 111-127, March.
