
Flexible scan statistic test to detect disease clusters in hierarchical trees

Authors
  • Marcos Prates
  • Renato Assunção
  • Marcelo Costa

Abstract

This paper presents a flexible scan test statistic to detect disease clusters in data sets represented as a hierarchical tree. The algorithm searches through the branches of the tree and can aggregate leaves located in different branches. The test statistic combines two terms: the log-likelihood of the data and the amount of information needed to computationally code each potential cluster. This second term, based on the Minimum Description Length (MDL) principle, penalizes the search algorithm so that it avoids detecting oddly shaped clusters. Our MDL method reaches an automatic compromise between bias and variance. We present simulation results comparing its power with that of the usual scan statistic and showing the high accuracy of the MDL method in identifying clusters that are scattered on the tree. The MDL method is illustrated with a large database relating occupation to death from silicosis. Copyright Springer-Verlag 2012
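The abstract describes a statistic that trades the log-likelihood of a candidate cluster against the cost of coding it. The paper's exact formulas are not reproduced on this page, so the following is only a minimal sketch under stated assumptions: it uses the standard Poisson log-likelihood ratio of the scan statistic as the likelihood term, and a hypothetical coding cost of k·log(M) nats for a cluster described as a union of k disjoint branches chosen from M tree nodes, so clusters scattered over many branches pay more. The toy tree, the counts, and all function names are invented for illustration and are not the authors' implementation.

```python
import math
from itertools import combinations

# Toy hierarchy (hypothetical): internal nodes map to their children; leaves have no entry.
TREE = {
    "root": ["A", "B"],
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
}
# Invented counts: observed deaths per leaf and expected deaths under the null
# (expected counts sum to the observed total; all must be > 0).
OBSERVED = {"a1": 30, "a2": 10, "b1": 10, "b2": 10}
EXPECTED = {"a1": 15.0, "a2": 15.0, "b1": 15.0, "b2": 15.0}


def all_nodes(tree, node="root"):
    """List `node` and every descendant in depth-first order."""
    nodes = [node]
    for child in tree.get(node, []):
        nodes.extend(all_nodes(tree, child))
    return nodes


def leaves_under(tree, node):
    """Return the set of leaves in the branch rooted at `node`."""
    children = tree.get(node)
    if not children:
        return {node}
    return set().union(*(leaves_under(tree, c) for c in children))


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio (Kulldorff-style) for an elevated-rate cluster.

    c: observed cases inside, e: expected cases inside, total: all cases.
    Returns 0 when the cluster is not elevated (c <= e)."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # Convention 0 * log(0) = 0 when every case falls inside the cluster.
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside


def best_cluster(tree, observed, expected, max_branches=2):
    """Exhaustively score unions of up to `max_branches` disjoint branches.

    Score = log-likelihood ratio minus a HYPOTHETICAL coding cost of
    k * log(M) nats, where k is the number of branches used and M the
    number of candidate nodes, so scattered clusters pay more."""
    total = sum(observed.values())
    nodes = [n for n in all_nodes(tree) if n != "root"]
    cost_per_branch = math.log(len(nodes))
    best_score, best_leaves = 0.0, None
    for k in range(1, max_branches + 1):
        for combo in combinations(nodes, k):
            branches = [leaves_under(tree, n) for n in combo]
            leaves = set().union(*branches)
            if sum(len(b) for b in branches) != len(leaves):
                continue  # overlapping branches (e.g. "A" with "a1"): skip
            c = sum(observed[leaf] for leaf in leaves)
            e = sum(expected[leaf] for leaf in leaves)
            score = poisson_llr(c, e, total) - k * cost_per_branch
            if score > best_score:
                best_score, best_leaves = score, frozenset(leaves)
    return best_score, best_leaves


if __name__ == "__main__":
    score, cluster = best_cluster(TREE, OBSERVED, EXPECTED)
    print(sorted(cluster), round(score, 3))
```

With these invented counts the single leaf a1 wins, with a score of roughly 6.84: widening to branch A dilutes the excess, and pairing a1 with a leaf from B both dilutes it and pays a second branch cost. In scan-statistic practice the significance of the maximum is assessed by Monte Carlo replication under the null hypothesis, a step this sketch omits.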

Suggested Citation

  • Marcos Prates & Renato Assunção & Marcelo Costa, 2012. "Flexible scan statistic test to detect disease clusters in hierarchical trees," Computational Statistics, Springer, vol. 27(4), pages 715-737, December.
  • Handle: RePEc:spr:compst:v:27:y:2012:i:4:p:715-737
    DOI: 10.1007/s00180-011-0286-9

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00180-011-0286-9
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1007/s00180-011-0286-9?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Hansen, Mark H. & Yu, Bin, 2001. "Model Selection and the Principle of Minimum Description Length," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 746-774, June.
    2. Davis, Richard A. & Lee, Thomas C.M. & Rodriguez-Yam, Gabriel A., 2006. "Structural Break Estimation for Nonstationary Time Series Models," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 223-239, March.
    3. Martin Kulldorff & Zixing Fang & Stephen J Walsh, 2003. "A Tree-Based Scan Statistic for Database Disease Surveillance," Biometrics, The International Biometric Society, vol. 59(2), pages 323-331, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Linda Mhalla & Valérie Chavez‐Demoulin & Debbie J. Dupuis, 2020. "Causal mechanism of extreme river discharges in the upper Danube basin network," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(4), pages 741-764, August.
    2. Fryzlewicz, Piotr & Nason, Guy P., 2006. "Haar-Fisz estimation of evolutionary wavelet spectra," LSE Research Online Documents on Economics 25227, London School of Economics and Political Science, LSE Library.
    3. Seongkyoon Jeong & Jae Young Choi, 2012. "The taxonomy of research collaboration in science and technology: evidence from mechanical research through probabilistic clustering analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(3), pages 719-735, June.
    4. Francesco Battaglia & Mattheos K. Protopapas, 2010. "Multi-regime models for nonlinear nonstationary time series," Working Papers 026, COMISEF.
    5. Boldea, Otilia & Hall, Alastair R., 2013. "Estimation and inference in unstable nonlinear least squares models," Journal of Econometrics, Elsevier, vol. 172(1), pages 158-167.
    6. Poskitt, D.S. & Sengarapillai, Arivalzahan, 2013. "Description length and dimensionality reduction in functional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 58(C), pages 98-113.
    7. Siem Jan Koopman & Soon Yip Wong, 2006. "Extracting Business Cycles using Semi-parametric Time-varying Spectra with Applications to US Macroeconomic Time Series," Tinbergen Institute Discussion Papers 06-105/4, Tinbergen Institute.
    8. Domenico Cucina & Manuel Rizzo & Eugen Ursu, 2018. "Identification of multiregime periodic autoregressive models by genetic algorithms," Post-Print hal-03187870, HAL.
    9. Rissanen, Jorma & Roos, Teemu & Myllymäki, Petri, 2010. "Model selection by sequentially normalized least squares," Journal of Multivariate Analysis, Elsevier, vol. 101(4), pages 839-849, April.
    10. Mark F. J. Steel, 2020. "Model Averaging and Its Use in Economics," Journal of Economic Literature, American Economic Association, vol. 58(3), pages 644-719, September.
    11. Wei Qian & Craig A. Rolling & Gang Cheng & Yuhong Yang, 2015. "On the Forecast Combination Puzzle," Papers 1505.00475, arXiv.org.
    12. Francesco Battaglia & Mattheos Protopapas, 2012. "An analysis of global warming in the Alpine region based on nonlinear nonstationary time series models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 21(3), pages 315-334, August.
    13. Davis, Richard A. & Hancock, Stacey A. & Yao, Yi-Ching, 2016. "On consistency of minimum description length model selection for piecewise autoregressions," Journal of Econometrics, Elsevier, vol. 194(2), pages 360-368.
    14. Ngai Hang Chan & Chun Yip Yau & Rong-Mao Zhang, 2014. "Group LASSO for Structural Break Time Series," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(506), pages 590-599, June.
    15. Chen, Zhanshou & Xu, Qiongyao & Li, Huini, 2019. "Inference for multiple change points in heavy-tailed time series via rank likelihood ratio scan statistics," Economics Letters, Elsevier, vol. 179(C), pages 53-56.
    16. Kurozumi, Eiji & Tuvaandorj, Purevdorj, 2011. "Model selection criteria in multivariate models with multiple structural changes," Journal of Econometrics, Elsevier, vol. 164(2), pages 218-238, October.
    17. Chun Yip Yau & Chong Man Tang & Thomas C. M. Lee, 2015. "Estimation of Multiple-Regime Threshold Autoregressive Models With Structural Breaks," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 1175-1186, September.
    18. Fryzlewicz, Piotr, 2020. "Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection," LSE Research Online Documents on Economics 103430, London School of Economics and Political Science, LSE Library.
    19. Fontaine, Charles & Frostig, Ron D. & Ombao, Hernando, 2020. "Modeling non-linear spectral domain dependence using copulas with applications to rat local field potentials," Econometrics and Statistics, Elsevier, vol. 15(C), pages 85-103.
    20. Chan, Ngai Hang & Yau, Chun Yip & Zhang, Rong-Mao, 2015. "LASSO estimation of threshold autoregressive models," Journal of Econometrics, Elsevier, vol. 189(2), pages 285-296.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:compst:v:27:y:2012:i:4:p:715-737. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.