Printed from https://ideas.repec.org/a/spr/jclass/v39y2022i3d10.1007_s00357-022-09413-z.html

Understanding the Adjusted Rand Index and Other Partition Comparison Indices Based on Counting Object Pairs

Author

Listed:
  • Matthijs J. Warrens

    (University of Groningen)

  • Hanneke Hoef

    (University of Groningen)

Abstract

In unsupervised machine learning, agreement between partitions is commonly assessed with so-called external validity indices. Researchers tend to use and report indices that quantify agreement between two partitions for all clusters simultaneously. Commonly used examples are the Rand index and the adjusted Rand index. Since these overall measures give a general notion of what is going on, their values are usually hard to interpret. The goal of this study is to provide a thorough understanding of the adjusted Rand index as well as many other partition comparison indices based on counting object pairs. It is shown that many overall indices based on the pair-counting approach can be decomposed into indices that reflect the degree of agreement on the level of individual clusters. The decompositions (1) show that the overall indices can be interpreted as summary statistics of the agreement on the cluster level, (2) specify how these overall indices are related to the indices for individual clusters, and (3) show that the overall indices are affected by cluster size imbalance: if cluster sizes are unbalanced these overall measures will primarily reflect the degree of agreement between the partitions on the large clusters, and will provide much less information on the agreement on smaller clusters. Furthermore, the value of Rand-like indices is determined to a large extent by the number of pairs of objects that are not joined in either of the partitions.
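
The pair-counting approach mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the function names and the symbols a, b, c, d are assumptions chosen for illustration. Here a counts the object pairs joined in both partitions, b and c count the pairs joined in only one of them, and d counts the pairs joined in neither. The Rand index is (a + d) / (a + b + c + d), and the Hubert-Arabie adjusted Rand index can be written in terms of the same four counts.

```python
from itertools import combinations


def pair_counts(labels_u, labels_v):
    """Classify every object pair by whether each partition joins it.

    Returns (a, b, c, d): joined in both, joined only in U,
    joined only in V, joined in neither. Names are illustrative.
    """
    if len(labels_u) != len(labels_v):
        raise ValueError("both partitions must label the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_u)), 2):
        same_u = labels_u[i] == labels_u[j]
        same_v = labels_v[i] == labels_v[j]
        if same_u and same_v:
            a += 1
        elif same_u:
            b += 1
        elif same_v:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(a, b, c, d):
    """Proportion of pairs on which the two partitions agree."""
    total = a + b + c + d
    # Assumed convention: with no pairs at all, report full agreement.
    return (a + d) / total if total else 1.0


def adjusted_rand_index(a, b, c, d):
    """Adjusted Rand index expressed through the four pair counts."""
    denom = (a + b) * (b + d) + (a + c) * (c + d)
    # The denominator is zero only when both partitions are the same
    # trivial partition; returning 1.0 there is an assumed convention.
    return 2 * (a * d - b * c) / denom if denom else 1.0


u = [0, 0, 0, 1, 1, 1]  # hypothetical partition with two clusters
v = [0, 0, 1, 1, 2, 2]  # hypothetical partition with three clusters
a, b, c, d = pair_counts(u, v)
print(a, b, c, d)                                 # 2 4 1 8
print(round(rand_index(a, b, c, d), 3))           # 0.667
print(round(adjusted_rand_index(a, b, c, d), 3))  # 0.242
```

In this toy example 8 of the 15 pairs are joined in neither partition, so d contributes most of the Rand index value of 0.667, while the adjusted index is only 0.242.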

Suggested Citation

  • Matthijs J. Warrens & Hanneke Hoef, 2022. "Understanding the Adjusted Rand Index and Other Partition Comparison Indices Based on Counting Object Pairs," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 487-509, November.
  • Handle: RePEc:spr:jclass:v:39:y:2022:i:3:d:10.1007_s00357-022-09413-z
    DOI: 10.1007/s00357-022-09413-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-022-09413-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations

    Cited by:

    1. Lucija Žignić & Stjepan Begušić & Zvonko Kostanjčar, 2024. "Block-diagonal idiosyncratic covariance estimation in high-dimensional factor models for financial time series," Papers 2407.03781, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. José E. Chacón, 2021. "Explicit Agreement Extremes for a 2 × 2 Table with Given Marginals," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 257-263, July.
    2. Matthijs J. Warrens & Alexandra de Raadt, 2015. "Ordering Properties of the First Eigenvector of Certain Similarity Matrices," Journal of Mathematics, Hindawi, vol. 2015, pages 1-5, November.
    3. Ahmed Albatineh & Magdalena Niewiadomska-Bugaj, 2011. "Correcting Jaccard and other similarity indices for chance agreement in cluster analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 5(3), pages 179-200, October.
    4. Stefano Tonellato & Andrea Pastore, 2013. "On the comparison of model-based clustering solutions," Working Papers 2013:05, Department of Economics, University of Venice "Ca' Foscari".
    5. Matthijs J. Warrens, 2014. "New Interpretations of Cohen’s Kappa," Journal of Mathematics, Hindawi, vol. 2014, pages 1-9, September.
    6. Weinand, J.M. & McKenna, R. & Fichtner, W., 2019. "Developing a municipality typology for modelling decentralised energy systems," Utilities Policy, Elsevier, vol. 57(C), pages 75-96.
    7. Jeffrey L. Andrews & Ryan Browne & Chelsey D. Hvingelby, 2022. "On Assessments of Agreement Between Fuzzy Partitions," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 326-342, July.
    8. Jonathon J. O’Brien & Michael T. Lawson & Devin K. Schweppe & Bahjat F. Qaqish, 2020. "Suboptimal Comparison of Partitions," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 435-461, July.
    9. José E. Chacón & Ana I. Rastrojo, 2023. "Minimum adjusted Rand index for two clusterings of a given size," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 125-133, March.
    10. Isabella Morlini & Sergio Zani, 2012. "A New Class of Weighted Similarity Indices Using Polytomous Variables," Journal of Classification, Springer;The Classification Society, vol. 29(2), pages 199-226, July.
    11. Martina Sundqvist & Julien Chiquet & Guillem Rigaill, 2023. "Adjusting the adjusted Rand Index," Computational Statistics, Springer, vol. 38(1), pages 327-347, March.
    12. Satoru Yokoyama & Atsuho Nakayama & Akinori Okada, 2009. "One-mode three-way overlapping cluster analysis," Computational Statistics, Springer, vol. 24(1), pages 165-179, February.
    13. Zhiguang Huo & Li Zhu & Tianzhou Ma & Hongcheng Liu & Song Han & Daiqing Liao & Jinying Zhao & George Tseng, 2020. "Two-Way Horizontal and Vertical Omics Integration for Disease Subtype Discovery," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(1), pages 1-22, April.
    14. Bocci, Laura & Vicari, Donatella & Vichi, Maurizio, 2006. "A mixture model for the classification of three-way proximity data," Computational Statistics & Data Analysis, Elsevier, vol. 50(7), pages 1625-1654, April.
    15. Antonio D’Ambrosio & Sonia Amodio & Carmela Iorio & Giuseppe Pandolfo & Roberta Siciliano, 2021. "Adjusted Concordance Index: an Extension of the Adjusted Rand Index to Fuzzy Partitions," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 112-128, April.
    16. Isabella Morlini & Sergio Zani, 2012. "Dissimilarity and similarity measures for comparing dendrograms and their applications," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(2), pages 85-105, July.
    17. Sun Jiehuan & Warren Joshua L. & Zhao Hongyu, 2017. "A Bayesian semiparametric factor analysis model for subtype identification," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(2), pages 145-158, April.
    18. Kemmawadee Preedalikit & Daniel Fernández & Ivy Liu & Louise McMillan & Marta Nai Ruscone & Roy Costilla, 2024. "Row mixture-based clustering with covariates for ordinal responses," Computational Statistics, Springer, vol. 39(5), pages 2511-2555, July.
    19. Ekaterina Kovaleva & Boris Mirkin, 2015. "Bisecting K-Means and 1D Projection Divisive Clustering: A Unified Framework and Experimental Comparison," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 414-442, October.
    20. Andrzej Młodak, 2021. "k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 313-352, July.
