Printed from https://ideas.repec.org/a/spr/advdac/v15y2021i2d10.1007_s11634-020-00411-w.html

Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data

Authors

  • Kadri Umbleja (Tokyo Denki University)
  • Manabu Ichino (Tokyo Denki University)
  • Hiroyuki Yaguchi (Tokyo Denki University)

Abstract

Symbolic data are aggregated from larger traditional datasets, both to hide entry-specific details and to make it feasible to analyse volumes of data, such as big data, that would otherwise be intractable. Symbolic data can take many different and complex forms, such as intervals and histograms. Identifying patterns and finding similarities between objects is one of the most fundamental tasks in data mining, and conventional methods are not sufficient to cluster these sophisticated data types accurately. Various approaches have been proposed over the years, but they concentrate mainly on the “macroscopic” similarities between objects. Because distributional data, such as symbolic data, are aggregated from large datasets, even the smallest microscopic differences and similarities become extremely important. This paper proposes a method for clustering distributional data based on these microscopic similarities by using quantile values. Having multiple points of comparison makes it possible to identify similarities within small sections of a distribution while producing more adequate hierarchical concepts. The proposed algorithm, called microscopic hierarchical conceptual clustering, has a monotone property and was found in experiments to produce more adequate conceptual clusters. Furthermore, because it relies on quantiles, the algorithm can compare different types of symbolic data easily, without additional complexity.
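
The idea of putting intervals and histograms on a common footing through quantile values can be illustrated with a small sketch. It is not the authors' algorithm: the function names are hypothetical, values are assumed to be uniformly distributed within an interval and within each histogram bin, and the mean absolute gap between quantile vectors is only an illustrative dissimilarity, not the measure used in the paper.

```python
from bisect import bisect_left


def interval_quantiles(lo, hi, m=4):
    """Quantile values at p = 0, 1/m, ..., 1 for an interval [lo, hi].

    Assumption: values are uniformly distributed over the interval.
    """
    return [lo + (hi - lo) * k / m for k in range(m + 1)]


def histogram_quantiles(edges, weights, m=4):
    """Quantile values at p = 0, 1/m, ..., 1 for a histogram.

    `edges` holds the n + 1 bin boundaries and `weights` the n bin masses.
    Assumption: values are uniformly distributed within each bin.
    """
    total = sum(weights)
    cdf = [0.0]
    for w in weights:
        cdf.append(cdf[-1] + w / total)
    cdf[-1] = 1.0  # guard against accumulated rounding error

    quantiles = []
    for k in range(m + 1):
        p = k / m
        # Index of the first bin whose cumulative mass reaches p.
        i = bisect_left(cdf, p, lo=1) - 1
        span = cdf[i + 1] - cdf[i]
        frac = (p - cdf[i]) / span if span > 0 else 0.0
        quantiles.append(edges[i] + frac * (edges[i + 1] - edges[i]))
    return quantiles


def quantile_gaps(qa, qb):
    """Absolute difference at each quantile point (the 'microscopic' view)."""
    return [abs(a - b) for a, b in zip(qa, qb)]


def quantile_distance(qa, qb):
    """Illustrative dissimilarity: mean absolute gap over all quantile points."""
    gaps = quantile_gaps(qa, qb)
    return sum(gaps) / len(gaps)


# An interval-valued object and a histogram-valued object, both as quartiles.
q_interval = interval_quantiles(0, 30)
q_histogram = histogram_quantiles([0, 10, 20, 30], [1, 2, 1])

print(q_interval)
print(q_histogram)
print(quantile_gaps(q_interval, q_histogram))
print(quantile_distance(q_interval, q_histogram))
```

Once both objects are expressed as vectors of the same length, they can be compared point by point regardless of their original type, and the per-quantile gaps show in which part of the distribution the two objects differ. A larger `m` gives a finer comparison.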

Suggested Citation

  • Kadri Umbleja & Manabu Ichino & Hiroyuki Yaguchi, 2021. "Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(2), pages 407-436, June.
  • Handle: RePEc:spr:advdac:v:15:y:2021:i:2:d:10.1007_s11634-020-00411-w
    DOI: 10.1007/s11634-020-00411-w

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-020-00411-w
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Stephen Johnson, 1967. "Hierarchical clustering schemes," Psychometrika, Springer;The Psychometric Society, vol. 32(3), pages 241-254, September.
    2. Lawrence Hubert, 1972. "Some extensions of Johnson's hierarchical clustering algorithms," Psychometrika, Springer;The Psychometric Society, vol. 37(3), pages 261-274, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. William Day & Herbert Edelsbrunner, 1985. "Investigation of proportional link linkage clustering methods," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 239-254, December.
    2. Claudia Quinteros-Cartaya & Guillermo Solorio-Magaña & Francisco Javier Núñez-Cornú & Felipe de Jesús Escalona-Alcázar & Diana Núñez, 2023. "Microearthquakes in the Guadalajara Metropolitan Zone, Mexico: evidence from buried active faults in Tesistán Valley, Zapopan," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 116(3), pages 2797-2818, April.
    3. Katarzyna Hampel & Paulina Ucieklak-Jez & Agnieszka Bem, 2021. "Health System Responsiveness in the Light of the Euro Health Consumer Index," European Research Studies Journal, European Research Studies Journal, vol. 0(4B), pages 659-667.
    4. Kim, Junyung & Shah, Asad Ullah Amin & Kang, Hyun Gook, 2020. "Dynamic risk assessment with bayesian network and clustering analysis," Reliability Engineering and System Safety, Elsevier, vol. 201(C).
    5. Roberts, Leigh, 2014. "Consistent estimation of breakpoints in time series, with application to wavelet analysis of Citigroup returns," Working Paper Series 18815, Victoria University of Wellington, School of Economics and Finance.
    6. David G Mets & Michael S Brainard, 2018. "An automated approach to the quantitation of vocalizations and vocal learning in the songbird," PLOS Computational Biology, Public Library of Science, vol. 14(8), pages 1-29, August.
    7. Michael Brusco & J Dennis Cradit & Douglas Steinley, 2021. "A comparison of 71 binary similarity coefficients: The effect of base rates," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-19, April.
    8. Noah E. Friedkin, 1984. "Structural Cohesion and Equivalence Explanations of Social Homogeneity," Sociological Methods & Research, vol. 12(3), pages 235-261, February.
    9. David Matesanz Gomez & Guillermo J. Ortega & Benno Torgler, 2011. "Measuring globalization: A hierarchical network approach," CREMA Working Paper Series 2011-11, Center for Research in Economics, Management and the Arts (CREMA).
    10. Balepur, Prashant Narayan, 1998. "Impacts of Computer-Mediated Communication on Travel and Communication Patterns: The Davis Community Network Study," Institute of Transportation Studies, Research Reports, Working Papers, Proceedings qt6cb1f85c, Institute of Transportation Studies, UC Berkeley.
    11. İsmail Güzel & Atabey Kaygun, 2022. "A new non-archimedean metric on persistent homology," Computational Statistics, Springer, vol. 37(4), pages 1963-1983, September.
    12. Lisa Price, 2001. "Demystifying farmers' entomological and pest management knowledge: A methodology for assessing the impacts on knowledge from IPM-FFS and NES interventions," Agriculture and Human Values, Springer;The Agriculture, Food, & Human Values Society (AFHVS), vol. 18(2), pages 153-176, June.
    13. Elisa Frutos-Bernal & Ángel Martín del Rey & Irene Mariñas-Collado & María Teresa Santos-Martín, 2022. "An Analysis of Travel Patterns in Barcelona Metro Using Tucker3 Decomposition," Mathematics, MDPI, vol. 10(7), pages 1-17, March.
    14. Geert Soete & Wayne DeSarbo & J. Carroll, 1985. "Optimal variable weighting for hierarchical clustering: An alternating least-squares algorithm," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 173-192, December.
    15. Silvia Blasi & Edoardo Gobbo & Silvia Rita Sedita, 2022. "Big Data for smart cities and citizen engagement: evidence from Twitter data analysis on Italian municipalities," Working Papers - Business wp2022_01.rdf, Universita' degli Studi di Firenze, Dipartimento di Scienze per l'Economia e l'Impresa.
    16. Teh, Boon Kin & Goo, Yik Wen & Lian, Tong Wei & Ong, Wei Guang & Choi, Wen Ting & Damodaran, Mridula & Cheong, Siew Ann, 2015. "The Chinese Correction of February 2007: How financial hierarchies change in a market crash," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 424(C), pages 225-241.
    17. Dalila B. M. M. Fontes & Seyed Mahdi Homayouni, 2019. "Joint production and transportation scheduling in flexible manufacturing systems," Journal of Global Optimization, Springer, vol. 74(4), pages 879-908, August.
    18. Phipps Arabie & J. Carroll, 1980. "Mapclus: A mathematical programming approach to fitting the adclus model," Psychometrika, Springer;The Psychometric Society, vol. 45(2), pages 211-235, June.
    19. Yoshio Takane & Forrest Young & Jan Leeuw, 1977. "Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features," Psychometrika, Springer;The Psychometric Society, vol. 42(1), pages 7-67, March.
    20. Fernández, D. & Arnold, R. & Pledger, S., 2016. "Mixture-based clustering for the ordered stereotype model," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 46-75.
