Printed from https://ideas.repec.org/a/eee/csdana/v55y2011i7p2250-2262.html

A polythetic clustering process and cluster validity indexes for histogram-valued objects

Author

Listed:
  • Kim, Jaejik
  • Billard, L.

Abstract

Clustering is an explanatory procedure that helps one understand data with complex structure and multivariate relationships, and it is especially useful for extracting knowledge and information from large datasets. When such datasets are aggregated into categories (as driven by the scientific questions underlying the analysis), the resulting observations will perforce be expressed as so-called symbolic data (though symbolic data can occur "naturally" in datasets of any size). The focus of this work is to provide a divisive polythetic algorithm to establish clusters for p-dimensional histogram-valued data. In addition, two cluster validity indexes for use in establishing the optimal number of clusters are developed. Finally, the proposed procedure is applied to a large forestry cover type dataset.

Suggested Citation

  • Kim, Jaejik & Billard, L., 2011. "A polythetic clustering process and cluster validity indexes for histogram-valued objects," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2250-2262, July.
  • Handle: RePEc:eee:csdana:v:55:y:2011:i:7:p:2250-2262

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(11)00027-2
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Glenn Milligan & Martha Cooper, 1985. "An examination of procedures for determining the number of clusters in a data set," Psychometrika, Springer;The Psychometric Society, vol. 50(2), pages 159-179, June.
    2. Chavent, Marie & Lechevallier, Yves & Briant, Olivier, 2007. "DIVCLUS-T: A monothetic divisive hierarchical clustering method," Computational Statistics & Data Analysis, Elsevier, vol. 52(2), pages 687-701, October.
    3. Struyf, Anja & Hubert, Mia & Rousseeuw, Peter, 1997. "Clustering in an Object-Oriented Environment," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 1(i04).

    Citations


    Cited by:

    1. Kim, Jaejik & Billard, L., 2012. "Dissimilarity measures and divisive clustering for symbolic multimodal-valued data," Computational Statistics & Data Analysis, Elsevier, vol. 56(9), pages 2795-2808.
    2. Francisco de A. T. Carvalho & Antonio Irpino & Rosanna Verde & Antonio Balzanella, 2022. "Batch Self-Organizing Maps for Distributional Data with an Automatic Weighting of Variables and Components," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 343-375, July.
    3. Soroosh Shalileh, 2023. "An Effective Partitional Crisp Clustering Method Using Gradient Descent Approach," Mathematics, MDPI, vol. 11(12), pages 1-23, June.
    4. Nataša Kejžar & Simona Korenjak-Černe & Vladimir Batagelj, 2021. "Clustering of modal-valued symbolic data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(2), pages 513-541, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jörg Weking & Michael Mandalenakis & Andreas Hein & Sebastian Hermes & Markus Böhm & Helmut Krcmar, 2020. "The impact of blockchain technology on business models – a taxonomy and archetypal patterns," Electronic Markets, Springer;IIM University of St. Gallen, vol. 30(2), pages 285-305, June.
    2. Liu, Pei-chen Barry & Hansen, Mark & Mukherjee, Avijit, 2008. "Scenario-based air traffic flow management: From theory to practice," Transportation Research Part B: Methodological, Elsevier, vol. 42(7-8), pages 685-702, August.
    3. Hélène Syed Zwick & S. Ali Shah Syed, 2017. "The polarization impact of the crisis on the Eurozone labour markets: a hierarchical cluster analysis," Applied Economics Letters, Taylor & Francis Journals, vol. 24(7), pages 472-476, April.
    4. Wu, Han-Ming & Tien, Yin-Jing & Chen, Chun-houh, 2010. "GAP: A graphical environment for matrix visualization and cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 767-778, March.
    5. Goethner, Maximilian & Hornuf, Lars & Regner, Tobias, 2021. "Protecting investors in equity crowdfunding: An empirical analysis of the small investor protection act," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    6. Pennings, J.S.J. & van Kranenburg, H.L. & Hagedoorn, J., 2005. "Past, present and future of the telecommunications industry," Research Memorandum 016, Maastricht University, Maastricht Research School of Economics of Technology and Organization (METEOR).
    7. Li-Xuan Qin & Steven G. Self, 2006. "The Clustering of Regression Models Method with Applications in Gene Expression Data," Biometrics, The International Biometric Society, vol. 62(2), pages 526-533, June.
    8. Caroline Méjean & Pauline Macouillard & Sandrine Péneau & Camille Lassale & Serge Hercberg & Katia Castetbon, 2014. "Association of Perception of Front-of-Pack Labels with Dietary, Lifestyle and Health Characteristics," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-11, March.
    9. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    10. Luis García-González & Ángel Abós & Sergio Diloy-Peña & Alexander Gil-Arias & Javier Sevil-Serrano, 2020. "Can a Hybrid Sport Education/Teaching Games for Understanding Volleyball Unit Be More Effective in Less Motivated Students? An Examination into a Set of Motivation-Related Variables," Sustainability, MDPI, vol. 12(15), pages 1-16, July.
    11. Hyeri Choi & Min Jae Park, 2019. "Evaluating the Efficiency of Governmental Excellence for Social Progress: Focusing on Low- and Lower-Middle-Income Countries," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 141(1), pages 111-130, January.
    12. Alessandra Cepparulo & Antonello Zanfei, 2019. "The diffusion of public eServices in European cities," Working Papers 1904, University of Urbino Carlo Bo, Department of Economics, Society & Politics - Scientific Committee - L. Stefanini & G. Travaglini, revised 2019.
    13. Ana Helena Tavares & Jakob Raymaekers & Peter J. Rousseeuw & Paula Brito & Vera Afreixo, 2020. "Clustering genomic words in human DNA using peaks and trends of distributions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(1), pages 57-76, March.
    14. Noelia Caceres & Luis M. Romero & Francisco J. Morales & Antonio Reyes & Francisco G. Benitez, 2018. "Estimating traffic volumes on intercity road locations using roadway attributes, socioeconomic features and other work-related activity characteristics," Transportation, Springer, vol. 45(5), pages 1449-1473, September.
    15. Michele Cincera, 2005. "Firms' productivity growth and R&D spillovers: An analysis of alternative technological proximity measures," Economics of Innovation and New Technology, Taylor & Francis Journals, vol. 14(8), pages 657-682.
    16. André Lucas & Julia Schaumburg & Bernd Schwaab, 2019. "Bank Business Models at Zero Interest Rates," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 37(3), pages 542-555, July.
    17. Wang, Ketong & Porter, Michael D., 2018. "Optimal Bayesian clustering using non-negative matrix factorization," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 395-411.
    18. Kauffmann, Albrecht, 2012. "Delineation of City Regions Based on Commuting Interrelations: The Example of Large Cities in Germany," IWH Discussion Papers 4/2012, Halle Institute for Economic Research (IWH).
    19. Douglas L. Steinley & M. J. Brusco, 2019. "Using an Iterative Reallocation Partitioning Algorithm to Verify Test Multidimensionality," Journal of Classification, Springer;The Classification Society, vol. 36(3), pages 397-413, October.
    20. Lin Chang-Ching & Ng Serena, 2012. "Estimation of Panel Data Models with Parameter Heterogeneity when Group Membership is Unknown," Journal of Econometric Methods, De Gruyter, vol. 1(1), pages 42-55, August.

