Printed from https://ideas.repec.org/p/ehl/lserod/115014.html

High-dimensional changepoint estimation with heterogeneous missingness

Author

Listed:
  • Follain, Bertille
  • Wang, Tengyao
  • Samworth, Richard J.

Abstract

We propose a new method for changepoint estimation in partially observed, high-dimensional time series that undergo a simultaneous change in mean in a sparse subset of coordinates. Our first methodological contribution is to introduce a ‘MissCUSUM’ transformation (a generalisation of the popular cumulative sum statistics) that captures the interaction between the signal strength and the level of missingness in each coordinate. In order to borrow strength across the coordinates, we propose to project these MissCUSUM statistics along a direction found as the solution to a penalised optimisation problem tailored to the specific sparsity structure. The changepoint can then be estimated as the location of the peak of the absolute value of the projected univariate series. In a model that allows different missingness probabilities in different component series, we identify that the key interaction between the missingness and the signal is a weighted sum of squares of the signal change in each coordinate, with weights given by the observation probabilities. More specifically, we prove that the angle between the estimated and oracle projection directions, as well as the changepoint location error, are controlled with high probability by the sum of two terms, both involving this weighted sum of squares, and representing the error incurred due to noise and the error due to missingness respectively. A lower bound confirms that our changepoint estimator, which we call MissInspect, is optimal up to a logarithmic factor. The striking effectiveness of the MissInspect methodology is further demonstrated both on simulated data and on an oceanographic data set covering the Neogene period.
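
The pipeline described in the abstract (a CUSUM-type transform that uses only the observed entries, a sparse projection direction, then the peak of the projected series) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `masked_cusum`, `sparse_direction`, `estimate_changepoint` and `simulate`, the threshold `lam`, and the simulation settings are all hypothetical, and the soft-thresholded leading singular vector is a simplified stand-in for the penalised optimisation problem mentioned above. The exact MissCUSUM definition in the paper may differ from the masked CUSUM used here.

```python
import numpy as np


def masked_cusum(X, observed):
    """CUSUM-type transform using observed entries only (illustrative, assumed form).

    X: (p, n) data array; entries where `observed` is False are ignored.
    observed: (p, n) boolean mask. Returns a (p, n - 1) array.
    """
    Xo = np.where(observed, X, 0.0)  # missing entries (even NaN) contribute 0
    left_cnt = np.cumsum(observed, axis=1)[:, :-1].astype(float)
    total_cnt = observed.sum(axis=1, keepdims=True).astype(float)
    right_cnt = total_cnt - left_cnt
    left_sum = np.cumsum(Xo, axis=1)[:, :-1]
    right_sum = Xo.sum(axis=1, keepdims=True) - left_sum
    # A split is usable only if both sides contain at least one observation.
    valid = (left_cnt > 0) & (right_cnt > 0)
    safe_l = np.where(valid, left_cnt, 1.0)
    safe_r = np.where(valid, right_cnt, 1.0)
    scale = np.sqrt(safe_l * safe_r / (safe_l + safe_r))
    T = scale * (right_sum / safe_r - left_sum / safe_l)
    return np.where(valid, T, 0.0)


def sparse_direction(T, lam):
    """Stand-in for the penalised problem: soft-threshold, then leading left singular vector."""
    S = np.sign(T) * np.maximum(np.abs(T) - lam, 0.0)
    if not S.any():  # threshold removed everything; fall back to the raw matrix
        S = T
    U, _, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, 0]


def estimate_changepoint(X, observed, lam):
    """Return the estimated number of pre-change time points."""
    T = masked_cusum(X, observed)
    v = sparse_direction(T, lam)
    projected = v @ T  # univariate series of length n - 1
    return int(np.argmax(np.abs(projected))) + 1


def simulate(p=50, n=200, z=80, k=5, shift=2.0, seed=0):
    """Toy data: mean shift in the first k coordinates after time z, with
    coordinate-specific observation probabilities between 0.3 and 0.9."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((p, n))
    X[:k, z:] += shift
    q = np.linspace(0.3, 0.9, p)
    observed = rng.random((p, n)) < q[:, None]
    return X, observed


if __name__ == "__main__":
    X, observed = simulate()
    lam = np.sqrt(2 * np.log(X.size))  # assumed, universal-threshold-style choice
    print("estimated changepoint:", estimate_changepoint(X, observed, lam))
```

In this sketch each coordinate is scaled by its own observed sample sizes on either side of the split, so coordinates with lower observation probability produce weaker statistics for the same mean shift. That is one way to see why a weighted sum of squares of the signal, with weights given by the observation probabilities, would govern performance.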

Suggested Citation

  • Follain, Bertille & Wang, Tengyao & Samworth, Richard J., 2022. "High-dimensional changepoint estimation with heterogeneous missingness," LSE Research Online Documents on Economics 115014, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:115014

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/115014/
    File Function: Open access version.
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bertille Follain & Tengyao Wang & Richard J. Samworth, 2022. "High‐dimensional changepoint estimation with heterogeneous missingness," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(3), pages 1023-1055, July.
    2. Liu, Bin & Zhang, Xinsheng & Liu, Yufeng, 2022. "High dimensional change point inference: Recent developments and extensions," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    3. Jiang, Feiyu & Wang, Runmin & Shao, Xiaofeng, 2023. "Robust inference for change points in high dimension," Journal of Multivariate Analysis, Elsevier, vol. 193(C).
    4. Cho, Haeran & Kirch, Claudia, 2024. "Data segmentation algorithms: Univariate mean change and beyond," Econometrics and Statistics, Elsevier, vol. 30(C), pages 76-95.
    5. Haeran Cho & Claudia Kirch, 2022. "Two-stage data segmentation permitting multiscale change points, heavy tails and dependence," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(4), pages 653-684, August.
    6. Yudong Chen & Tengyao Wang & Richard J. Samworth, 2022. "High‐dimensional, multiscale online changepoint detection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(1), pages 234-266, February.
    7. Chen, Yudong & Wang, Tengyao & Samworth, Richard J., 2022. "High-dimensional, multiscale online changepoint detection," LSE Research Online Documents on Economics 113665, London School of Economics and Political Science, LSE Library.
    8. Cui, Junfeng & Wang, Guanghui & Zou, Changliang & Wang, Zhaojun, 2023. "Change-point testing for parallel data sets with FDR control," Computational Statistics & Data Analysis, Elsevier, vol. 182(C).
    9. Hahn, Georg, 2022. "Online multivariate changepoint detection with type I error control and constant time/memory updates per series," Statistics & Probability Letters, Elsevier, vol. 181(C).
    10. Stergios B. Fotopoulos & Abhishek Kaul & Vasileios Pavlopoulos & Venkata K. Jandhyala, 2024. "Adaptive parametric change point inference under covariance structure changes," Statistical Papers, Springer, vol. 65(5), pages 2887-2913, July.
    11. Chen, Likai & Wang, Weining & Wu, Wei Biao, 2019. "Inference of Break-Points in High-Dimensional Time Series," IRTG 1792 Discussion Papers 2019-013, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    12. Kleiber, Christian, 2016. "Structural Change in (Economic) Time Series," Working papers 2016/06, Faculty of Business and Economics - University of Basel.
    13. Jaromír Antoch & Jan Hanousek & Lajos Horváth & Marie Hušková & Shixuan Wang, 2019. "Structural breaks in panel data: Large number of panels and short length time series," Econometric Reviews, Taylor & Francis Journals, vol. 38(7), pages 828-855, August.
    14. Mengjia Yu & Xiaohui Chen, 2021. "Finite sample change point inference and identification for high‐dimensional mean vectors," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(2), pages 247-270, April.
    15. Horváth, Lajos & Rice, Gregory & Zhao, Yuqian, 2023. "Testing for changes in linear models using weighted residuals," Journal of Multivariate Analysis, Elsevier, vol. 198(C).
    16. Yu Jeffrey Hu & Jeroen Rombouts & Ines Wilms, 2023. "Fast Forecasting of Unstable Data Streams for On-Demand Service Platforms," Papers 2303.01887, arXiv.org, revised May 2024.
    17. Claudia Kirch & Christina Stoehr, 2022. "Sequential change point tests based on U‐statistics," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(3), pages 1184-1214, September.
    18. Bouzebda, Salim & Ferfache, Anouar Abdeldjaoued, 2023. "Asymptotic properties of semiparametric M-estimators with multiple change points," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 609(C).
    19. Horváth, Lajos & Rice, Gregory & Zhao, Yuqian, 2022. "Change point analysis of covariance functions: A weighted cumulative sum approach," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    20. Tuomas Rajala & Petteri Packalen & Mari Myllymäki & Annika Kangas, 2023. "Improving Detection of Changepoints in Short and Noisy Time Series with Local Correlations: Connecting the Events in Pixel Neighbourhoods," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 28(3), pages 564-590, September.

    More about this item

    Keywords

    changepoint estimation; missing data; high-dimensional data; segmentation; sparsity;

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
