IDEAS home Printed from https://ideas.repec.org/a/eee/ecomod/v220y2009i4p451-461.html

A new procedure to optimize the selection of groups in a classification tree: Applications for ecological data

Author

Listed:
  • Guidi, Lionel
  • Ibanez, Frédéric
  • Calcagno, Vincent
  • Beaugrand, Grégory

Abstract

Agglomerative cluster analysis encompasses many techniques that are widely used across scientific fields. In biology, and specifically in ecology, datasets are generally highly variable and may contain outliers, which makes it harder to identify the number of clusters. Here we present a new criterion to determine statistically the optimal level of partition in a classification tree. The robustness of the criterion is tested against perturbed data (outliers) using an observation or variable with randomly generated values. The technique, called the Random Simulation Test (RST), is tested on (1) the well-known Iris dataset [Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Ann. Eugenic. 7, 179–188], (2) simulated data with predetermined numbers of clusters, following Milligan and Cooper [Milligan, G.W., Cooper, M.C., 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159–179], and (3) real copepod community data previously analyzed by Beaugrand et al. [Beaugrand, G., Ibanez, F., Lindley, J.A., Reid, P.C., 2002. Diversity of calanoid copepods in the North Atlantic and adjacent seas: species associations and biogeography. Mar. Ecol. Prog. Ser. 232, 179–195]. The technique is compared with several standard techniques. RST generally performed better than existing algorithms on simulated data and proved especially efficient with highly variable datasets.
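
The kind of robustness check the abstract describes can be illustrated with a minimal sketch. It is not the RST criterion, whose definition is not given in this record. As a stand-in, it picks the number of clusters from the largest gap between successive fusion levels of a complete-linkage tree, then repeats the choice after appending a randomly generated variable. The function names, the synthetic three-cluster data, and the noise range are all assumptions made for illustration.

```python
import math
import random


def complete_linkage_heights(points):
    """Return the fusion level of every merge, in merge order.

    Naive O(n^3) agglomerative clustering with complete linkage;
    adequate for a few dozen points, not for real datasets.
    """
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return heights


def k_from_largest_gap(heights):
    """Cut the tree at the largest jump between successive fusion levels.

    Stand-in heuristic (an assumption, not the RST criterion).
    """
    n = len(heights) + 1  # number of observations
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # After merge i (0-indexed) there are n - 1 - i clusters left.
    return n - 1 - i


# Hypothetical data: three tight groups around equidistant centres.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
data = [
    (cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
    for cx, cy in centers
    for _ in range(10)
]

# Perturbation: append one variable with randomly generated values.
noisy = [p + (rng.uniform(-0.5, 0.5),) for p in data]

k_clean = k_from_largest_gap(complete_linkage_heights(data))
k_noisy = k_from_largest_gap(complete_linkage_heights(noisy))
print(k_clean, k_noisy)
```

In this toy setting the groups are far apart relative to their spread, so the selected number of clusters is the same before and after the perturbation. A criterion that changed its answer here would be considered fragile. The abstract's claim concerns the harder case of highly variable data, which this sketch does not attempt to reproduce.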

Suggested Citation

  • Guidi, Lionel & Ibanez, Frédéric & Calcagno, Vincent & Beaugrand, Grégory, 2009. "A new procedure to optimize the selection of groups in a classification tree: Applications for ecological data," Ecological Modelling, Elsevier, vol. 220(4), pages 451-461.
  • Handle: RePEc:eee:ecomod:v:220:y:2009:i:4:p:451-461
    DOI: 10.1016/j.ecolmodel.2008.11.006

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0304380008005437
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Gordon, A. D., 1996. "A survey of constrained classification," Computational Statistics & Data Analysis, Elsevier, vol. 21(1), pages 17-29, January.
    2. Glenn Milligan & Martha Cooper, 1985. "An examination of procedures for determining the number of clusters in a data set," Psychometrika, Springer;The Psychometric Society, vol. 50(2), pages 159-179, June.
    3. Bertrand, P. & Bel Mufti, G., 2006. "Loevinger's measures of rule quality for assessing cluster stability," Computational Statistics & Data Analysis, Elsevier, vol. 50(4), pages 992-1015, February.
    4. Glenn Milligan, 1981. "A monte carlo study of thirty internal criterion measures for cluster analysis," Psychometrika, Springer;The Psychometric Society, vol. 46(2), pages 187-199, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Henner Gimpel & Daniel Rau & Maximilian Röglinger, 2018. "Understanding FinTech start-ups – a taxonomy of consumer-oriented service offerings," Electronic Markets, Springer;IIM University of St. Gallen, vol. 28(3), pages 245-264, August.
    2. Massimiliano Agovino & Maria Ferrara & Antonio Garofalo, 2017. "The driving factors of separate waste collection in Italy: a multidimensional analysis at provincial level," Environment, Development and Sustainability: A Multidisciplinary Approach to the Theory and Practice of Sustainable Development, Springer, vol. 19(6), pages 2297-2316, December.
    3. Weinand, J.M. & McKenna, R. & Fichtner, W., 2019. "Developing a municipality typology for modelling decentralised energy systems," Utilities Policy, Elsevier, vol. 57(C), pages 75-96.
    4. Ertl, Antal & Horn, Dániel & Kiss, Hubert János, 2024. "Economic Preferences across Generations and Family Clusters: A Comment," I4R Discussion Paper Series 105, The Institute for Replication (I4R).
    5. Bauer, Hans H. & Fischer, Marc, 2000. "Product life cycle patterns for pharmaceuticals and their impact on R&D profitability of late mover products," International Business Review, Elsevier, vol. 9(6), pages 703-725, December.
    6. Dario Bruzzese & Domenico Vistocco, 2015. "DESPOTA: DEndrogram Slicing through a PemutatiOn Test Approach," Journal of Classification, Springer;The Classification Society, vol. 32(2), pages 285-304, July.
    7. G. Damiana Costanzo, 2001. "A constrained k-means clustering algorithm for classifying spatial units," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 10(1), pages 237-256, January.
    8. Mark Chiang & Boris Mirkin, 2010. "Intelligent Choice of the Number of Clusters in K-Means Clustering: An Experimental Study with Different Cluster Spreads," Journal of Classification, Springer;The Classification Society, vol. 27(1), pages 3-40, March.
    9. Liu, Pei-chen Barry & Hansen, Mark & Mukherjee, Avijit, 2008. "Scenario-based air traffic flow management: From theory to practice," Transportation Research Part B: Methodological, Elsevier, vol. 42(7-8), pages 685-702, August.
    10. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    11. Alessandra Cepparulo & Antonello Zanfei, 2019. "The diffusion of public eServices in European cities," Working Papers 1904, University of Urbino Carlo Bo, Department of Economics, Society & Politics - Scientific Committee - L. Stefanini & G. Travaglini, revised 2019.
    12. Noelia Caceres & Luis M. Romero & Francisco J. Morales & Antonio Reyes & Francisco G. Benitez, 2018. "Estimating traffic volumes on intercity road locations using roadway attributes, socioeconomic features and other work-related activity characteristics," Transportation, Springer, vol. 45(5), pages 1449-1473, September.
    13. Michele Cincera, 2005. "Firms' productivity growth and R&D spillovers: An analysis of alternative technological proximity measures," Economics of Innovation and New Technology, Taylor & Francis Journals, vol. 14(8), pages 657-682.
    14. Douglas L. Steinley & M. J. Brusco, 2019. "Using an Iterative Reallocation Partitioning Algorithm to Verify Test Multidimensionality," Journal of Classification, Springer;The Classification Society, vol. 36(3), pages 397-413, October.
    15. Javier Sevil-Serrano & Alberto Aibar-Solana & Ángel Abós & José Antonio Julián & Luis García-González, 2019. "Healthy or Unhealthy? The Cocktail of Health-Related Behavior Profiles in Spanish Adolescents," IJERPH, MDPI, vol. 16(17), pages 1-14, August.
    16. Jacques-Antoine Gauthier & Eric D. Widmer & Philipp Bucher & Cédric Notredame, 2009. "How Much Does It Cost?," Sociological Methods & Research, vol. 38(1), pages 197-231, August.
    17. Jack DeWaard & Keuntae Kim & James Raymer, 2012. "Migration Systems in Europe: Evidence From Harmonized Flow Data," Demography, Springer;Population Association of America (PAA), vol. 49(4), pages 1307-1333, November.
    18. Rui Fragoso & Conceição Rego & Vladimir Bushenkov, 2016. "Clustering of Territorial Areas: A Multi-Criteria Districting Problem," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 14(2), pages 179-198, December.
    19. Vicente Rodríguez Montequín & Joaquín Villanueva Balsera & Sonia María Cousillas Fernández & Francisco Ortega Fernández, 2018. "Exploring Project Complexity through Project Failure Factors: Analysis of Cluster Patterns Using Self-Organizing Maps," Complexity, Hindawi, vol. 2018, pages 1-17, May.
    20. Goethner, Maximilian & Hornuf, Lars & Regner, Tobias, 2021. "Protecting investors in equity crowdfunding: An empirical analysis of the small investor protection act," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
