
ClustGeo: an R package for hierarchical clustering with spatial constraints

Author

Listed:
  • Marie Chavent

    (Université de Bordeaux)

  • Vanessa Kuentz-Simonet

    (IRSTEA)

  • Amaury Labenne

    (IRSTEA)

  • Jérôme Saracco

    (ENSC - Bordeaux INP)

Abstract

In this paper, we propose a Ward-like hierarchical clustering algorithm that incorporates spatial/geographical constraints. Two dissimilarity matrices $D_0$ and $D_1$ are taken as input, along with a mixing parameter $\alpha \in [0,1]$. The dissimilarities can be non-Euclidean and the weights of the observations can be non-uniform. The first matrix gives the dissimilarities in the “feature space” and the second matrix gives the dissimilarities in the “constraint space”. The criterion minimized at each stage is a convex combination of the homogeneity criterion calculated with $D_0$ and the homogeneity criterion calculated with $D_1$. The idea is then to determine a value of $\alpha$ that increases the spatial contiguity without deteriorating too much the quality of the solution based on the variables of interest, i.e. those of the feature space. This procedure is illustrated on a real dataset using the R package ClustGeo.
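
To make the mixed criterion concrete, here is a minimal Python sketch of a Ward-like agglomeration driven by two dissimilarity matrices. It is not the ClustGeo implementation. It assumes that homogeneity is measured by a weighted pseudo-inertia computed from squared dissimilarities, and that the weight alpha is placed on the constraint-space matrix D1 and 1 - alpha on the feature-space matrix D0. The function names and the toy data are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def pseudo_inertia(cluster, D, w):
    """Weighted pseudo-inertia of one cluster, computed from dissimilarities only."""
    mu = sum(w[i] for i in cluster)
    total = sum(w[i] * w[j] * D[i][j] ** 2 for i in cluster for j in cluster)
    return total / (2.0 * mu)


def mixed_inertia(cluster, D0, D1, w, alpha):
    """Convex combination of the two pseudo-inertias (assumption: alpha weights D1)."""
    return ((1.0 - alpha) * pseudo_inertia(cluster, D0, w)
            + alpha * pseudo_inertia(cluster, D1, w))


def ward_like_partition(D0, D1, alpha, k, w=None):
    """Greedy agglomeration down to k clusters, merging the cheapest pair each stage."""
    n = len(D0)
    w = w if w is not None else [1.0 / n] * n  # uniform weights by default
    clusters = [(i,) for i in range(n)]
    while len(clusters) > k:
        def cost(pair):
            # Increase in mixed within-cluster pseudo-inertia caused by the merge.
            a, b = pair
            return (mixed_inertia(a + b, D0, D1, w, alpha)
                    - mixed_inertia(a, D0, D1, w, alpha)
                    - mixed_inertia(b, D0, D1, w, alpha))
        a, b = min(combinations(clusters, 2), key=cost)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return sorted(clusters)


def explained_share(partition, D, w=None):
    """Share of total pseudo-inertia explained by the partition (1 - within/total)."""
    n = len(D)
    w = w if w is not None else [1.0 / n] * n
    within = sum(pseudo_inertia(c, D, w) for c in partition)
    return 1.0 - within / pseudo_inertia(tuple(range(n)), D, w)


# Toy data (made up): four units with one feature value and one coordinate each.
features = [0.0, 1.0, 0.2, 0.9]
positions = [0.0, 0.1, 0.9, 1.0]
D0 = [[abs(a - b) for b in features] for a in features]    # feature space
D1 = [[abs(a - b) for b in positions] for a in positions]  # constraint space

for alpha in (0.0, 0.5, 1.0):
    part = ward_like_partition(D0, D1, alpha, k=2)
    print(alpha, part,
          round(explained_share(part, D0), 3),
          round(explained_share(part, D1), 3))
```

With alpha = 0 the sketch groups units by feature similarity alone, and with alpha = 1 by spatial proximity alone. Comparing the two explained shares across a grid of alpha values is one way to look for a value that gains spatial cohesion at little cost in feature-space homogeneity. Both toy matrices already have a maximum of 1; on real data the two matrices would usually need to be put on a comparable scale first.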

Suggested Citation

  • Marie Chavent & Vanessa Kuentz-Simonet & Amaury Labenne & Jérôme Saracco, 2018. "ClustGeo: an R package for hierarchical clustering with spatial constraints," Computational Statistics, Springer, vol. 33(4), pages 1799-1822, December.
  • Handle: RePEc:spr:compst:v:33:y:2018:i:4:d:10.1007_s00180-018-0791-1
    DOI: 10.1007/s00180-018-0791-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-018-0791-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Anuška Ferligoj & Vladimir Batagelj, 1982. "Clustering with relational constraint," Psychometrika, Springer;The Psychometric Society, vol. 47(4), pages 413-426, December.
    2. Mónica Bécue-Bertaut & Belchin Kostov & Annie Morin & Guilhem Naro, 2014. "Rhetorical Strategy in Forensic Speeches: Multidimensional Statistics-Based Methodology," Journal of Classification, Springer;The Classification Society, vol. 31(1), pages 85-106, April.
    3. Gordon, A. D., 1996. "A survey of constrained classification," Computational Statistics & Data Analysis, Elsevier, vol. 21(1), pages 17-29, January.
    4. Trudie Strauss & Michael Johan von Maltitz, 2017. "Generalising Ward’s Method for Use with Manhattan Distances," PLOS ONE, Public Library of Science, vol. 12(1), pages 1-21, January.

    Citations

    Cited by:

    1. Pablo Quintana, 2022. "Una metodología de clustering para agrupar series temporales en regiones contiguas," Asociación Argentina de Economía Política: Working Papers 4589, Asociación Argentina de Economía Política.
    2. Mattera, Raffaele & Franses, Philip Hans, 2023. "Are African business cycles synchronized? Evidence from spatio-temporal modeling," Economic Modelling, Elsevier, vol. 128(C).
    3. Dalila Camêlo Aguiar & Ramón Gutiérrez Sánchez & Edwirde Luiz Silva Camêlo, 2020. "Hierarchical Clustering with Spatial Constraints and Standardized Incidence Ratio in Tuberculosis Data," Mathematics, MDPI, vol. 8(9), pages 1-12, September.
    4. Deb, Soudeep & Karmakar, Sayar, 2023. "A novel spatio-temporal clustering algorithm with applications on COVID-19 data from the United States," Computational Statistics & Data Analysis, Elsevier, vol. 188(C).
    5. Facundo Sigal & Jorge Camusso & Ana Inés Navarro, 2022. "Argentine regions based on dynamic criteria," Asociación Argentina de Economía Política: Working Papers 4600, Asociación Argentina de Economía Política.
    6. Meifang Chen & Yongwan Chun & Daniel A. Griffith, 2023. "Delineating Housing Submarkets Using Space–Time House Sales Data: Spatially Constrained Data-Driven Approaches," JRFM, MDPI, vol. 16(6), pages 1-17, June.
    7. Mello, Kaline de & Fendrich, Arthur Nicolaus & Borges-Matos, Clarice & Brites, Alice Dantas & Tavares, Paulo André & da Rocha, Gustavo Casoni & Matsumoto, Marcelo & Rodrigues, Ricardo Ribeiro & Joly, , 2021. "Integrating ecological equivalence for native vegetation compensation: A methodological approach," Land Use Policy, Elsevier, vol. 108(C).
    8. Pablo Aníbal Quintana, 2021. "Métodos de clustering espacialmente restringidos: Un análisis al agrupamiento por nivel de estudio en la provincia de Mendoza," Asociación Argentina de Economía Política: Working Papers 4510, Asociación Argentina de Economía Política.
    9. Nathanaël Randriamihamison & Nathalie Vialaneix & Pierre Neuvial, 2021. "Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 363-389, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rui Fragoso & Conceição Rego & Vladimir Bushenkov, 2016. "Clustering of Territorial Areas: A Multi-Criteria Districting Problem," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 14(2), pages 179-198, December.
    2. Juan Carlos Duque & Raúl Ramos & Jordi Suriñach, 2007. "Supervised Regionalization Methods: A Survey," International Regional Science Review, vol. 30(3), pages 195-220, July.
    3. Juan Carlos Duque & Raúl Ramos, 2004. "Design of homogenous territorial units: a methodological proposal," ERSA conference papers ersa04p6, European Regional Science Association.
    4. Nathanaël Randriamihamison & Nathalie Vialaneix & Pierre Neuvial, 2021. "Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 363-389, July.
    5. G. Damiana Costanzo, 2001. "A constrained k-means clustering algorithm for classifying spatial units," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 10(1), pages 237-256, January.
    6. Abang Zainoren Abang Abdurahman & Syerina Azlin Md Nasir & Wan Fairos Wan Yaacob & Serah Jaya & Suhaili Mokhtar, 2021. "Spatio-Temporal Clustering of Sarawak Malaysia Total Protected Area Visitors," Sustainability, MDPI, vol. 13(21), pages 1-19, October.
    7. Iwona Bąk & Anna Barwińska-Małajowicz & Grażyna Wolska & Paweł Walawender & Paweł Hydzik, 2021. "Is the European Union Making Progress on Energy Decarbonisation While Moving towards Sustainable Development?," Energies, MDPI, vol. 14(13), pages 1-18, June.
    8. repec:jss:jstsof:33:c02 is not listed on IDEAS
    9. Guidi, Lionel & Ibanez, Frédéric & Calcagno, Vincent & Beaugrand, Grégory, 2009. "A new procedure to optimize the selection of groups in a classification tree: Applications for ecological data," Ecological Modelling, Elsevier, vol. 220(4), pages 451-461.
    10. Juan Carlos Duque & Raúl Ramos, 2004. "Spanish unemployment: normative versus analytical regionalisation procedures," ERSA conference papers ersa04p7, European Regional Science Association.
    11. Yongcui Lan & Jinliang Wang & Wenying Hu & Eldar Kurbanov & Janine Cole & Jinming Sha & Yuanmei Jiao & Jingchun Zhou, 2023. "Spatial pattern prediction of forest wildfire susceptibility in Central Yunnan Province, China based on multivariate data," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 116(1), pages 565-586, March.
    12. Renato Coppi & Pierpaolo D’Urso & Paolo Giordani, 2010. "A Fuzzy Clustering Model for Multivariate Spatial Time Series," Journal of Classification, Springer;The Classification Society, vol. 27(1), pages 54-88, March.
    13. Laurin Arnold & Jan Jöhnk & Florian Vogt & Nils Urbach, 2022. "IIoT platforms’ architectural features – a taxonomy and five prevalent archetypes," Electronic Markets, Springer;IIM University of St. Gallen, vol. 32(2), pages 927-944, June.
    14. D’Urso, Pierpaolo & Manca, Germana & Waters, Nigel & Girone, Stefania, 2019. "Visualizing regional clusters of Sardinia's EU supported agriculture: A Spatial Fuzzy Partitioning Around Medoids," Land Use Policy, Elsevier, vol. 83(C), pages 571-580.
    15. Andrzej Młodak, 2021. "k-Means, Ward and Probabilistic Distance-Based Clustering Methods with Contiguity Constraint," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 313-352, July.
    16. Juan C. Duque & Xinyue Ye & David C. Folch, 2015. "spMorph: An exploratory space-time analysis tool for describing processes of spatial redistribution," Papers in Regional Science, Wiley Blackwell, vol. 94(3), pages 629-651, August.
    17. Maria da Conceição Rego & Rui Fragoso & Vladimir Bushenkov, 2014. "Clustering of Territorial Areas: A Multi-Criteria Districting Problem," ERSA conference papers ersa14p218, European Regional Science Association.
    18. Dongyoung Kim & Sungwon Jung & Yongwook Jeong, 2021. "Theft Prediction Model Based on Spatial Clustering to Reflect Spatial Characteristics of Adjacent Lands," Sustainability, MDPI, vol. 13(14), pages 1-14, July.
    19. Recchia, Anthony, 2010. "Contiguity-Constrained Hierarchical Agglomerative Clustering Using SAS," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(c02).
    20. Dalila Camêlo Aguiar & Ramón Gutiérrez Sánchez & Edwirde Luiz Silva Camêlo, 2020. "Hierarchical Clustering with Spatial Constraints and Standardized Incidence Ratio in Tuberculosis Data," Mathematics, MDPI, vol. 8(9), pages 1-12, September.
    21. Giuseppe Giordano & Maria Vitale, 2011. "On the use of external information in social network analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 5(2), pages 95-112, July.
