Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i15p2409-d1448698.html

Clustering Empirical Bootstrap Distribution Functions Parametrized by Galton–Watson Branching Processes

Author

Listed:
  • Lauri Varmann

    (Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal)

  • Helena Mouriño

    (Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal;
    Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal)

Abstract

The nonparametric bootstrap has been used in cluster analysis for various purposes, one of which is to account for sampling variability. This can be achieved by obtaining a bootstrap approximation of the sampling distribution function of the estimator of interest and then clustering those distribution functions. Although the consistency of the nonparametric bootstrap in estimating transformations of the sample mean has been known for decades, little is known about how it carries over to clustering. Here, we investigated this problem with a simulation study. We considered single-linkage agglomerative hierarchical clustering and a three-type branching process for parametrized transformations of random vectors of relative frequencies of possible types of the index case of each process. In total, there were nine factors and 216 simulation scenarios in a fully factorial design. The ability of the bootstrap-based clustering to recover the ground truth clusterings was quantified by the adjusted transfer distance between partitions. The results showed that in the best 18 scenarios, the average value of the distance was less than 20 percent of the maximum possible distance value. The results depended most notably on the number of retained clusters, the distribution for sampling the prevalence of types, and the sample size appearing in the denominators of relative frequency types. A comparison of the bootstrap-based clustering results with so-called uninformed random partitioning results showed that, in the vast majority of scenarios considered, the bootstrap-based approach led, on average, to remarkably lower classification errors than random partitioning.
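The general idea in the abstract (bootstrap the sampling distribution of an estimator for each unit, then cluster the resulting empirical distribution functions with single linkage) can be illustrated with a minimal sketch. This is not the authors' implementation. The estimator (a plain sample mean), the toy Gaussian data, the sup distance between distribution functions, and all function names are assumptions made for illustration only; the abstract does not specify the dissimilarity used, and the paper works with relative frequencies from branching processes rather than Gaussian samples.

```python
import bisect
import random


def bootstrap_means(sample, n_boot, rng):
    """Sorted nonparametric-bootstrap replicates of the sample mean."""
    n = len(sample)
    return sorted(sum(rng.choices(sample, k=n)) / n for _ in range(n_boot))


def ecdf(sorted_values, x):
    """Empirical distribution function of sorted_values, evaluated at x."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def sup_distance(a, b):
    """Largest gap between two empirical distribution functions.

    Assumed dissimilarity (not taken from the paper). Both functions are
    right-continuous step functions, so checking the jump points suffices.
    """
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


def single_linkage(dist, k):
    """Agglomerative single-linkage clustering, stopped at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        # Pick the pair of clusters whose closest members are nearest.
        _, p, q = min(
            (min(dist[i][j] for i in clusters[p] for j in clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p].extend(clusters.pop(q))
    return clusters


# Toy data (hypothetical): four units, two per ground-truth group.
rng = random.Random(0)
true_means = [0.0, 0.0, 10.0, 10.0]
samples = [[rng.gauss(m, 1.0) for _ in range(30)] for m in true_means]

# One bootstrap distribution per unit, then pairwise distances.
boot = [bootstrap_means(s, 200, rng) for s in samples]
dist = [[sup_distance(a, b) for b in boot] for a in boot]

clusters = single_linkage(dist, k=2)
print(clusters)
```

With groups this well separated, units 0 and 1 should end up in one cluster and units 2 and 3 in the other. The recomputation of cluster-to-cluster distances at every merge keeps the sketch short but is only practical for a handful of units.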

Suggested Citation

  • Lauri Varmann & Helena Mouriño, 2024. "Clustering Empirical Bootstrap Distribution Functions Parametrized by Galton–Watson Branching Processes," Mathematics, MDPI, vol. 12(15), pages 1-25, August.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:15:p:2409-:d:1448698

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/15/2409/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/15/2409/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hongfei Xiao & Deqin Lin & Shiyu Li, 2023. "Novel Method for Estimating Time-Varying COVID-19 Transmission Rate," Mathematics, MDPI, vol. 11(10), pages 1-18, May.
    2. Aurora Torrente & Juan Romo, 2021. "Initializing k-means Clustering by Bootstrap and Data Depth," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 232-256, July.
    3. José E. Chacón & Ana I. Rastrojo, 2023. "Minimum adjusted Rand index for two clusterings of a given size," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 125-133, March.
    4. Lucile Denœud, 2008. "Transfer distance between partitions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 2(3), pages 279-294, December.
    5. Abboubakar, Hamadjam & Kombou, Lausaire Kemayou & Koko, Adamou Dang & Fouda, Henri Paul Ekobena & Kumar, Anoop, 2021. "Projections and fractional dynamics of the typhoid fever: A case study of Mbandjock in the Centre Region of Cameroon," Chaos, Solitons & Fractals, Elsevier, vol. 150(C).
    6. Yu, Zhenhua & Arif, Robia & Fahmy, Mohamed Abdelsabour & Sohail, Ayesha, 2021. "Self organizing maps for the parametric analysis of COVID-19 SEIRS delayed model," Chaos, Solitons & Fractals, Elsevier, vol. 150(C).
    7. Víctor Blanco & Ricardo Gázquez & Marina Leal, 2023. "Mathematical optimization models for reallocating and sharing health equipment in pandemic situations," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(2), pages 355-390, July.
    8. Hussain, Takasar & Aslam, Adnan & Ozair, Muhammad & Tasneem, Fatima & Gómez-Aguilar, J.F., 2021. "Dynamical aspects of pine wilt disease and control measures," Chaos, Solitons & Fractals, Elsevier, vol. 145(C).
    9. Daniel Aloise & Nielsen Castelo Damasceno & Nenad Mladenović & Daniel Nobre Pinheiro, 2017. "On Strategies to Fix Degenerate k-means Solutions," Journal of Classification, Springer;The Classification Society, vol. 34(2), pages 165-190, July.
    10. Amouch, Mohamed & Karim, Noureddine, 2021. "Modeling the dynamic of COVID-19 with different types of transmissions," Chaos, Solitons & Fractals, Elsevier, vol. 150(C).
    11. Huilong Wang & Meimei Wang & Rong Yang & Huijuan Yang, 2023. "Urban Resilience of Important Node Cities in Population Migration under the Influence of COVID-19 Based on Mamdani Fuzzy Inference System," Sustainability, MDPI, vol. 15(19), pages 1-22, September.
    12. Alain Guénoche, 2011. "Consensus of partitions : a constructive approach," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 5(3), pages 215-229, October.
