Printed from https://ideas.repec.org/p/bge/wpaper/555.html

A Simple Permutation Test for Clusteredness

Author

Listed:
  • Michael Greenacre

Abstract

Hierarchical clustering is a popular method for finding structure in multivariate data, resulting in a binary tree constructed on the particular objects of the study, usually sampling units. The user must decide where to cut the binary tree in order to determine the number of clusters to interpret, and there are various ad hoc rules for arriving at that decision. A simple permutation test is presented that diagnoses whether non-random levels of clustering are present in the set of objects and, if so, indicates the specific level at which the tree can be cut. The test is validated against random matrices to verify the type I error probability, and a power study is performed on data sets with known clusteredness to study the type II error.
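
The abstract does not state the test statistic or the permutation scheme, so the following is only a minimal sketch of how a permutation test on a hierarchical clustering tree might look, not the paper's procedure. It assumes, purely for illustration, complete-linkage clustering on Euclidean distances, a null model in which each variable is shuffled independently across objects, and a statistic equal to the gap between the two merge heights on either side of a cut into k clusters. The function names (`merge_heights`, `permute_columns`, `gaps`, `clusteredness_pvalues`) are hypothetical.

```python
import random
from math import dist


def merge_heights(points):
    """Complete-linkage agglomerative clustering; returns the n-1 merge heights in merge order."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        clusters[a] += clusters[b]
        del clusters[b]
    return heights


def permute_columns(points, rng):
    """Shuffle each variable independently across objects (assumed null model)."""
    columns = [list(col) for col in zip(*points)]
    for col in columns:
        rng.shuffle(col)
    return list(zip(*columns))


def gaps(heights, max_clusters):
    """Height gap spanned by a cut into k clusters, for k = 2..max_clusters (assumed statistic)."""
    return {k: heights[-(k - 1)] - heights[-k] for k in range(2, max_clusters + 1)}


def clusteredness_pvalues(points, max_clusters=4, n_perm=199, seed=0):
    """Permutation p-value per candidate number of clusters; needs more than max_clusters objects."""
    rng = random.Random(seed)
    observed = gaps(merge_heights(points), max_clusters)
    exceed = {k: 1 for k in observed}  # the observed data counts as one arrangement
    for _ in range(n_perm):
        permuted = gaps(merge_heights(permute_columns(points, rng)), max_clusters)
        for k in observed:
            if permuted[k] >= observed[k]:
                exceed[k] += 1
    return {k: exceed[k] / (n_perm + 1) for k in observed}
```

Under these assumptions, a small p-value for some k suggests that cutting the tree into k clusters separates groups more sharply than column-shuffled data would. The sketch does not adjust for testing several values of k, and independent column shuffling cannot detect clusters that are separated along a single variable only.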

Suggested Citation

  • Michael Greenacre, 2015. "A Simple Permutation Test for Clusteredness," Working Papers 555, Barcelona School of Economics.
  • Handle: RePEc:bge:wpaper:555

    Download full text from publisher

    File URL: https://bw.bse.eu/wp-content/uploads/2015/09/555-file.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Gordon, A. D., 1994. "Identifying genuine clusters in a classification," Computational Statistics & Data Analysis, Elsevier, vol. 18(5), pages 561-581, December.
    2. Michael Greenacre, 2008. "Correspondence analysis of raw data," Economics Working Papers 1112, Department of Economics and Business, Universitat Pompeu Fabra, revised Jul 2009.

    Citations

    Cited by:

    1. Christian Haedo & Michel Mouchart, 2022. "Two-mode clustering through profiles of regions and sectors," Empirical Economics, Springer, vol. 63(4), pages 1971-1996, October.
    2. Lucie Aulus-Giacosa & Sébastien Ollier & Cleo Bertelsmeier, 2024. "Non-native ants are breaking down biogeographic boundaries and homogenizing community assemblages," Nature Communications, Nature, vol. 15(1), pages 1-11, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jacqueline Meulman, 1996. "Fitting a distance model to homogeneous subsets of variables: Points of view analysis of categorical data," Journal of Classification, Springer;The Classification Society, vol. 13(2), pages 249-266, September.
    2. Hayo, Bernd & Seifert, Wolfgang, 2003. "Subjective economic well-being in Eastern Europe," Journal of Economic Psychology, Elsevier, vol. 24(3), pages 329-348, June.
    3. Eric Beh & Luigi D’Ambra, 2009. "Some Interpretative Tools for Non-Symmetrical Correspondence Analysis," Journal of Classification, Springer;The Classification Society, vol. 26(1), pages 55-76, April.
    4. Mulquin, Marie-Eve & Siaens, Corinne & Wodon, Quentin, 1998. "Les restaurants du coeur : pour qui et pourquoi ? [Food Aid for the Poor or Social Support? Case Study on a Belgian Social Restaurant]," MPRA Paper 10504, University Library of Munich, Germany.
    5. Pilar García Gómez & Ángel López Nicolás, 2005. "Socio-economic inequalities in health in Catalonia," Hacienda Pública Española / Review of Public Economics, IEF, vol. 175(4), pages 103-121, December.
    6. Rosaria Lombardo & Jacqueline Meulman, 2010. "Multiple Correspondence Analysis via Polynomial Transformations of Ordered Categorical Variables," Journal of Classification, Springer;The Classification Society, vol. 27(2), pages 191-210, September.
    7. David Bholat & Stephen Hans & Pedro Santos & Cheryl Schonhardt-Bailey, 2015. "Text mining for central banks," Handbooks, Centre for Central Banking Studies, Bank of England, number 33, April.
    8. Michael Greenacre, 2012. "Fuzzy coding in constrained ordinations," Economics Working Papers 1325, Department of Economics and Business, Universitat Pompeu Fabra.
    9. Shizuhiko Nishisato, 1996. "Reviews," Psychometrika, Springer;The Psychometric Society, vol. 61(2), pages 391-393, June.
    10. Harvey Goldstein, 1987. "The choice of constraints in correspondence analysis," Psychometrika, Springer;The Psychometric Society, vol. 52(2), pages 207-215, June.
    11. Michael J. Greenacre & Patrick J. F. Groenen, 2016. "Weighted Euclidean Biplots," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 442-459, October.
    12. Rémi Bazillier & Nicolas Sirven, 2006. "Les normes fondamentales du travail contribuent-elles à réduire les inégalités ?," Revue Française d'Économie, Programme National Persée, vol. 21(2), pages 111-146.
    13. Jason Owen-Smith & Massimo Riccaboni & Fabio Pammolli & Walter W. Powell, 2002. "A Comparison of U.S. and European University-Industry Relations in the Life Sciences," Management Science, INFORMS, vol. 48(1), pages 24-43, January.
    14. Alfonso Gambardella & Walter Garcia Fontes, 1996. "European research funding and regional technological capabilities: Network composition analysis," Economics Working Papers 174, Department of Economics and Business, Universitat Pompeu Fabra.
    15. Antoine Falguerolles & Said Jmel & Joe Whittaker, 1995. "Correspondence analysis and association models constrained by a conditional independence graph," Psychometrika, Springer;The Psychometric Society, vol. 60(2), pages 161-180, June.
    16. Paul Green & Jonathan Kim & Frank Carmone, 1990. "A preliminary study of optimal variable weighting in k-means clustering," Journal of Classification, Springer;The Classification Society, vol. 7(2), pages 271-285, September.
    17. Ruben Konig, 2010. "Changing social categories in a changing society: studying trends with correspondence analysis," Quality & Quantity: International Journal of Methodology, Springer, vol. 44(3), pages 409-425, April.
    18. Michael Greenacre & Shizuhiko Nishisato, 1996. "Reviews," Psychometrika, Springer;The Psychometric Society, vol. 61(1), pages 177-190, March.
    19. John Lennon & Michael J. Keane, 2006. "Delineating Daily Activity Spaces in Rural Areas," Working Papers 0617, Rural Economy and Development Programme,Teagasc.
    20. Peter Heijden & Jan Leeuw, 1985. "Correspondence analysis used complementary to loglinear analysis," Psychometrika, Springer;The Psychometric Society, vol. 50(4), pages 429-447, December.

    More about this item

    Keywords

    Distance; Hierarchical clustering; Permutation test

    JEL classification:

    • C19 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Other
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
