
Assessing variable importance in clustering: a new method based on unsupervised binary decision trees

Authors

  • Badih Ghattas (Aix Marseille Université, CNRS, Centrale Marseille)
  • Pierre Michel (Aix Marseille Université, CNRS, Centrale Marseille; Aix Marseille Université)
  • Laurent Boyer (Aix Marseille Université)

Abstract

We consider different approaches for assessing variable importance in clustering. We focus on clustering using binary decision trees (CUBT), which is a non-parametric top-down hierarchical clustering method designed for both continuous and nominal data. We suggest a measure of variable importance for this method similar to the one used in Breiman’s classification and regression trees. This score is useful to rank the variables in a dataset, to determine which variables are the most important or to detect the irrelevant ones. We analyze both stability and efficiency of this score on different data simulation models in the presence of noise, and compare it to other classical variable importance measures. Our experiments show that variable importance based on CUBT is much more efficient than other approaches in a large variety of situations.
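To make the idea of a tree-based importance score concrete, here is a minimal Python sketch. It is not the CUBT algorithm or the authors' score: it only looks at a single root split, handles continuous variables only, and assumes the total within-node sum of squares as the heterogeneity measure. The function names (`within_ss`, `best_split_gain`, `root_importance`) and the `min_leaf` parameter are hypothetical, chosen for illustration.

```python
import numpy as np

def within_ss(X):
    """Total within-node sum of squares (assumed heterogeneity measure)."""
    if len(X) == 0:
        return 0.0
    return float(((X - X.mean(axis=0)) ** 2).sum())

def best_split_gain(X, j, min_leaf=5):
    """Largest drop in within_ss from one binary split X[:, j] <= a."""
    order = np.argsort(X[:, j])
    Xs = X[order]
    parent = within_ss(Xs)
    best = 0.0
    for i in range(min_leaf, len(Xs) - min_leaf + 1):
        # No valid threshold lies between two tied values.
        if Xs[i - 1, j] == Xs[i, j]:
            continue
        gain = parent - within_ss(Xs[:i]) - within_ss(Xs[i:])
        best = max(best, gain)
    return best

def root_importance(X, min_leaf=5):
    """Normalized best root-split gain per variable (illustrative score)."""
    gains = np.array([best_split_gain(X, j, min_leaf) for j in range(X.shape[1])])
    total = gains.sum()
    return gains / total if total > 0 else gains

# Simulated data: variable 0 separates two groups, variables 1 and 2 are noise.
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(-3, 1, 50), rng.normal(3, 1, 50)])
noise = rng.normal(0, 1, (100, 2))
X = np.column_stack([signal, noise])

scores = root_importance(X)
print(scores)  # variable 0 is expected to receive the largest score
```

A score in the spirit of Breiman's would accumulate such gains over every node of a fully grown tree rather than the root alone; the sketch keeps to one split so that the ranking logic stays visible.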

Suggested Citation

  • Badih Ghattas & Pierre Michel & Laurent Boyer, 2019. "Assessing variable importance in clustering: a new method based on unsupervised binary decision trees," Computational Statistics, Springer, vol. 34(1), pages 301-321, March.
  • Handle: RePEc:spr:compst:v:34:y:2019:i:1:d:10.1007_s00180-018-0857-0
    DOI: 10.1007/s00180-018-0857-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-018-0857-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. R. Darrell Bock, 1972. "Estimating item parameters and latent ability when responses are scored in two or more nominal categories," Psychometrika, Springer;The Psychometric Society, vol. 37(1), pages 29-51, March.
    2. Ricardo Fraiman & Badih Ghattas & Marcela Svarc, 2013. "Interpretable clustering using unsupervised binary trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(2), pages 125-145, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Luo, Nanyu & Ji, Feng & Han, Yuting & He, Jinbo & Zhang, Xiaoya, 2024. "Fitting item response theory models using deep learning computational frameworks," OSF Preprints tjxab, Center for Open Science.
    2. Michelle M. LaMar, 2018. "Markov Decision Process Measurement Model," Psychometrika, Springer;The Psychometric Society, vol. 83(1), pages 67-88, March.
    3. Bas Hemker & Klaas Sijtsma & Ivo Molenaar & Brian Junker, 1996. "Polytomous IRT models and monotone likelihood ratio of the total score," Psychometrika, Springer;The Psychometric Society, vol. 61(4), pages 679-693, December.
    4. Sijia Huang & Li Cai, 2024. "Cross-Classified Item Response Theory Modeling With an Application to Student Evaluation of Teaching," Journal of Educational and Behavioral Statistics, vol. 49(3), pages 311-341, June.
    5. Björn Andersson & Tao Xin, 2021. "Estimation of Latent Regression Item Response Theory Models Using a Second-Order Laplace Approximation," Journal of Educational and Behavioral Statistics, vol. 46(2), pages 244-265, April.
    6. Golovkine, Steven & Klutchnikoff, Nicolas & Patilea, Valentin, 2022. "Clustering multivariate functional data using unsupervised binary trees," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    7. Jouni Kuha & Myrsini Katsikatsou & Irini Moustaki, 2018. "Latent variable modelling with non‐ignorable item non‐response: multigroup response propensity models for cross‐national analysis," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 1169-1192, October.
    8. Laine Bradshaw & Jonathan Templin, 2014. "Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions," Psychometrika, Springer;The Psychometric Society, vol. 79(3), pages 403-425, July.
    9. Javier Revuelta, 2004. "Analysis of distractor difficulty in multiple-choice items," Psychometrika, Springer;The Psychometric Society, vol. 69(2), pages 217-234, June.
    10. Ulf Böckenholt, 2012. "The Cognitive-Miser Response Model: Testing for Intuitive and Deliberate Reasoning," Psychometrika, Springer;The Psychometric Society, vol. 77(2), pages 388-399, April.
    11. Albert Yu & Jeffrey A. Douglas, 2023. "IRT Models for Learning With Item-Specific Learning Parameters," Journal of Educational and Behavioral Statistics, vol. 48(6), pages 866-888, December.
    12. John Hsu & Tom Leonard & Kam-Wah Tsui, 1991. "Statistical inference for multiple choice tests," Psychometrika, Springer;The Psychometric Society, vol. 56(2), pages 327-348, June.
    13. repec:hal:journl:hal-03533356 is not listed on IDEAS
    14. Zachary F. Fisher & Kenneth A. Bollen, 2020. "An Instrumental Variable Estimator for Mixed Indicators: Analytic Derivatives and Alternative Parameterizations," Psychometrika, Springer;The Psychometric Society, vol. 85(3), pages 660-683, September.
    15. Gunter Maris & Han Maas, 2012. "Speed-Accuracy Response Models: Scoring Rules based on Response Time and Accuracy," Psychometrika, Springer;The Psychometric Society, vol. 77(4), pages 615-633, October.
    16. Irini Moustaki & Martin Knott, 2000. "Generalized latent trait models," Psychometrika, Springer;The Psychometric Society, vol. 65(3), pages 391-411, September.
    17. Yingbin Zhang & Zhaoxi Yang & Yehui Wang, 2022. "The Impact of Extreme Response Style on the Mean Comparison of Two Independent Samples," SAGE Open, vol. 12(2), pages 21582440221, June.
    18. repec:jss:jstsof:35:i12 is not listed on IDEAS
    19. Dylan Molenaar, 2015. "Heteroscedastic Latent Trait Models for Dichotomous Data," Psychometrika, Springer;The Psychometric Society, vol. 80(3), pages 625-644, September.
    20. Yang Liu & Weimeng Wang, 2022. "Semiparametric Factor Analysis for Item-Level Response Time Data," Psychometrika, Springer;The Psychometric Society, vol. 87(2), pages 666-692, June.
    21. Antonio Rodríguez Andrés & Voxi Heinrich S. Amavilah & Abraham Otero, 2021. "Evaluation of technology clubs by clustering: a cautionary note," Applied Economics, Taylor & Francis Journals, vol. 53(52), pages 5989-6001, November.
    22. David Magis, 2015. "A Note on the Equivalence Between Observed and Expected Information Functions With Polytomous IRT Models," Journal of Educational and Behavioral Statistics, vol. 40(1), pages 96-105, February.
