
Flexible scan statistic test to detect disease clusters in hierarchical trees

Author

Listed:
  • Marcos Prates
  • Renato Assunção
  • Marcelo Costa

Abstract

This paper presents a flexible scan test statistic to detect disease clusters in data sets represented as a hierarchical tree. The algorithm searches through the branches of the tree and is able to aggregate leaves located in different branches. The test statistic combines two terms: the log-likelihood of the data and the amount of information necessary to computationally code each potential cluster. This second term, which is based on the Minimum Description Length (MDL) principle, penalizes the search algorithm and so avoids the detection of oddly shaped clusters. Our MDL method reaches an automatic compromise between bias and variance. We present simulation results showing its power performance compared with the usual scan statistic and the high accuracy of the MDL method in identifying clusters that are scattered on the tree. The MDL method is illustrated with a large database looking at the relationship between occupation and death from silicosis. Copyright Springer-Verlag 2012
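
The two-term structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' statistic: the likelihood term below is the standard Poisson log-likelihood ratio used by the usual scan statistic, and the penalty term (`description_penalty`, the cost of naming each tree node that makes up a candidate cluster) is a hypothetical stand-in for the paper's MDL code length, which the abstract does not specify. All function and parameter names are assumptions made for illustration.

```python
import math


def poisson_llr(cases, expected, total_cases):
    """Poisson log-likelihood ratio for one candidate cluster.

    cases:       observed cases inside the candidate cluster
    expected:    cases expected inside it under the null hypothesis
    total_cases: observed cases in the whole tree
    Only excess risk counts: the ratio is 0 when cases <= expected.
    """
    if cases <= expected:
        return 0.0
    inside = cases * math.log(cases / expected)
    rest = total_cases - cases
    outside = rest * math.log(rest / (total_cases - expected)) if rest > 0 else 0.0
    return inside + outside


def description_penalty(n_groups, n_nodes):
    """HYPOTHETICAL code length (in nats), not the paper's formula.

    Assumes a cluster is described by naming n_groups nodes (branches or
    leaves) out of n_nodes nodes in the tree, each costing log(n_nodes).
    """
    return n_groups * math.log(n_nodes)


def penalized_score(cases, expected, total_cases, n_groups, n_nodes):
    """Likelihood term minus description-length term (illustrative only)."""
    return poisson_llr(cases, expected, total_cases) - description_penalty(n_groups, n_nodes)


# A cluster made of one whole branch versus one stitched together from
# three scattered leaves, with identical case counts.
one_branch = penalized_score(20, 10, 100, n_groups=1, n_nodes=50)
three_leaves = penalized_score(20, 10, 100, n_groups=3, n_nodes=50)
```

With identical case counts, the candidate that needs three nodes to describe scores lower than the single branch, which is the sense in which a description-length term discourages oddly shaped clusters. A scattered cluster is preferred only when its gain in likelihood exceeds the extra coding cost.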

Suggested Citation

  • Marcos Prates & Renato Assunção & Marcelo Costa, 2012. "Flexible scan statistic test to detect disease clusters in hierarchical trees," Computational Statistics, Springer, vol. 27(4), pages 715-737, December.
  • Handle: RePEc:spr:compst:v:27:y:2012:i:4:p:715-737
    DOI: 10.1007/s00180-011-0286-9

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1007/s00180-011-0286-9
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Davis, Richard A. & Lee, Thomas C.M. & Rodriguez-Yam, Gabriel A., 2006. "Structural Break Estimation for Nonstationary Time Series Models," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 223-239, March.
    2. Hansen, M. H. & Yu, B., 2001. "Model Selection and the Principle of Minimum Description Length," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 746-774, June.
    3. Martin Kulldorff & Zixing Fang & Stephen J Walsh, 2003. "A Tree-Based Scan Statistic for Database Disease Surveillance," Biometrics, The International Biometric Society, vol. 59(2), pages 323-331, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Linda Mhalla & Valérie Chavez‐Demoulin & Debbie J. Dupuis, 2020. "Causal mechanism of extreme river discharges in the upper Danube basin network," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(4), pages 741-764, August.
    2. David Ardia & Arnaud Dufays & Carlos Ordás Criado, 2024. "Linking Frequentist and Bayesian Change-Point Methods," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(4), pages 1155-1168, October.
    3. Fryzlewicz, Piotr & Nason, Guy P., 2006. "Haar-Fisz estimation of evolutionary wavelet spectra," LSE Research Online Documents on Economics 25227, London School of Economics and Political Science, LSE Library.
    4. Seongkyoon Jeong & Jae Young Choi, 2012. "The taxonomy of research collaboration in science and technology: evidence from mechanical research through probabilistic clustering analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(3), pages 719-735, June.
    5. Francesco Battaglia & Mattheos K. Protopapas, 2010. "Multi-regime models for nonlinear nonstationary time series," Working Papers 026, COMISEF.
    6. Boldea, Otilia & Hall, Alastair R., 2013. "Estimation and inference in unstable nonlinear least squares models," Journal of Econometrics, Elsevier, vol. 172(1), pages 158-167.
    7. Poskitt, D.S. & Sengarapillai, Arivalzahan, 2013. "Description length and dimensionality reduction in functional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 58(C), pages 98-113.
    8. Siem Jan Koopman & Soon Yip Wong, 2006. "Extracting Business Cycles using Semi-parametric Time-varying Spectra with Applications to US Macroeconomic Time Series," Tinbergen Institute Discussion Papers 06-105/4, Tinbergen Institute.
    9. Mark F. J. Steel, 2020. "Model Averaging and Its Use in Economics," Journal of Economic Literature, American Economic Association, vol. 58(3), pages 644-719, September.
    10. Ngai Hang Chan & Chun Yip Yau & Rong-Mao Zhang, 2014. "Group LASSO for Structural Break Time Series," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(506), pages 590-599, June.
    11. Chen, Zhanshou & Xu, Qiongyao & Li, Huini, 2019. "Inference for multiple change points in heavy-tailed time series via rank likelihood ratio scan statistics," Economics Letters, Elsevier, vol. 179(C), pages 53-56.
    12. Fontaine, Charles & Frostig, Ron D. & Ombao, Hernando, 2020. "Modeling non-linear spectral domain dependence using copulas with applications to rat local field potentials," Econometrics and Statistics, Elsevier, vol. 15(C), pages 85-103.
    13. Njindan Iyke, Bernard, 2015. "Macro Determinants of the Real Exchange Rate in a Small Open Small Island Economy: Evidence from Mauritius via BMA," MPRA Paper 68968, University Library of Munich, Germany.
    14. Dufays, Arnaud & Rombouts, Jeroen V.K., 2020. "Relevant parameter changes in structural break models," Journal of Econometrics, Elsevier, vol. 217(1), pages 46-78.
    15. Paul L. Bowen & Robert A. O'Farrell & Fiona H. Rohde, 2009. "An Empirical Investigation of End-User Query Development: The Effects of Improved Model Expressiveness vs. Complexity," Information Systems Research, INFORMS, vol. 20(4), pages 565-584, December.
    16. Magkonis, Georgios & Zekente, Kalliopi-Maria, 2020. "Inflation-output trade-off: Old measures, new determinants?," Journal of Macroeconomics, Elsevier, vol. 65(C).
    17. Branimir Jovanovic, 2017. "Growth forecast errors and government investment and consumption multipliers," International Review of Applied Economics, Taylor & Francis Journals, vol. 31(1), pages 83-107, January.
    18. Venkata Jandhyala & Stergios Fotopoulos & Ian MacNeill & Pengyu Liu, 2013. "Inference for single and multiple change-points in time series," Journal of Time Series Analysis, Wiley Blackwell, vol. 34(4), pages 423-446, July.
    19. Lu Shaochuan, 2023. "Scalable Bayesian Multiple Changepoint Detection via Auxiliary Uniformisation," International Statistical Review, International Statistical Institute, vol. 91(1), pages 88-113, April.
    20. Klaus Wohlrabe & Teresa Buchen, 2014. "Assessing the Macroeconomic Forecasting Performance of Boosting: Evidence for the United States, the Euro Area and Germany," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 33(4), pages 231-242, July.
