Printed from https://ideas.repec.org/a/spr/stabio/v13y2021i1d10.1007_s12561-020-09278-z.html

A Super Scalable Algorithm for Short Segment Detection

Authors

  • Ning Hao (University of Arizona)
  • Yue Selena Niu (University of Arizona)
  • Feifei Xiao (University of South Carolina)
  • Heping Zhang (Yale School of Public Health)

Abstract

In many applications, such as copy number variant (CNV) detection, the goal is to identify short segments on which the observations have different means or medians from the background. These segments are usually short and hidden in a long sequence, which makes them very challenging to find. In this paper, we study a super scalable short segment (4S) detection algorithm. This nonparametric method detects segments by clustering the locations where the observations exceed a threshold. It is computationally efficient and does not rely on a Gaussian noise assumption. Moreover, we develop a framework for assigning significance levels to detected segments. We demonstrate the advantages of the proposed method through theoretical, simulation, and real data studies.
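
The threshold-and-cluster idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `detect_segments` and the parameters `threshold`, `max_gap`, and `min_points` are hypothetical, chosen only to show how locations exceeding a threshold might be grouped into candidate segments. The significance-level framework mentioned in the abstract is not covered.

```python
import numpy as np


def detect_segments(y, threshold, max_gap=2, min_points=3):
    """Illustrative threshold-and-cluster segment detection.

    All parameter names are assumptions made for this sketch:
      threshold  -- observations strictly above this value count as "hits"
      max_gap    -- hits whose indices differ by at most this are clustered
      min_points -- clusters with fewer hits than this are discarded
    Returns a list of (start, end) index pairs, both inclusive.
    """
    y = np.asarray(y, dtype=float)

    # Step 1: locations where the observation exceeds the threshold.
    hits = np.flatnonzero(y > threshold)
    if hits.size == 0:
        return []

    # Step 2: start a new cluster wherever consecutive hits are too far apart.
    breaks = np.flatnonzero(np.diff(hits) > max_gap) + 1
    clusters = np.split(hits, breaks)

    # Step 3: keep only clusters with enough hits to be a plausible segment.
    return [(int(c[0]), int(c[-1])) for c in clusters if c.size >= min_points]


if __name__ == "__main__":
    signal = np.zeros(50)
    signal[10:15] = 3.0   # a short elevated segment
    signal[30] = 3.0      # an isolated spike, dropped by min_points
    print(detect_segments(signal, threshold=2.0))
```

In this sketch the work after thresholding is a single pass over the hit locations, which is one way to see why such a method can scale to long sequences. Segments with a lowered mean could be found in the same way by applying the function to the negated sequence.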

Suggested Citation

  • Ning Hao & Yue Selena Niu & Feifei Xiao & Heping Zhang, 2021. "A Super Scalable Algorithm for Short Segment Detection," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(1), pages 18-33, April.
  • Handle: RePEc:spr:stabio:v:13:y:2021:i:1:d:10.1007_s12561-020-09278-z
    DOI: 10.1007/s12561-020-09278-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s12561-020-09278-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



