
Improved metaheuristics for the quartet method of hierarchical clustering

Authors listed:
  • Sergio Consoli

    (Philips Research)

  • Jan Korst

    (Philips Research)

  • Steffen Pauws

    (Philips Research; Tilburg University)

  • Gijs Geleijnse

    (Netherlands Comprehensive Cancer Organisation (IKNL))

Abstract

The quartet method is a novel hierarchical clustering approach in which, given a set of n data objects and their pairwise dissimilarities, the aim is to construct an optimal tree from the total number of possible combinations of quartet topologies on the n objects, where optimality means that the sum of the dissimilarities of the embedded (or consistent) quartet topologies is minimal. This corresponds to an NP-hard combinatorial optimization problem, also referred to as the minimum quartet tree cost (MQTC) problem. We provide details and a formulation of this challenging problem, and propose a basic greedy heuristic built on several insights for speeding up and simplifying solution generation and evaluation: the use of adjacency-like matrices to represent the topology structures of candidate solutions; fast calculation of the coefficients and weights of the solution matrices; shortcuts in the enumeration of all solution permutations for a given configuration; and an iterative distance matrix reduction procedure, which greedily merges highly connected objects that may yield lower values of the quartet cost function in a given partial solution. It will be shown that this basic greedy heuristic consistently improves the performance of popular quartet clustering algorithms in the literature, namely a reduced variable neighbourhood search and a simulated annealing metaheuristic, producing novel efficient solution approaches to the MQTC problem.
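To make the MQTC objective concrete, here is a minimal brute-force sketch. It assumes the standard quartet-method conventions: the cost of the quartet topology uv|wx is d(u,v) + d(w,x), and a quartet is embedded in an unrooted binary tree when its pairing has the smallest sum of tree path lengths (the four-point condition with unit edge lengths). All function names are hypothetical. This is exhaustive search over every tree, not the greedy heuristic or the metaheuristics described in the abstract, and it is only feasible for very small n.

```python
from collections import deque
from itertools import combinations
from typing import Iterator

# Unrooted binary tree as an adjacency list: leaves are 0..n-1, internal nodes are >= n.
Tree = dict[int, set[int]]


def leaf_distances(tree: Tree, n: int) -> list[list[int]]:
    """Edge-count path lengths between all pairs of leaves, via BFS from each leaf."""
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in tree[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for t in range(n):
            dist[s][t] = seen[t]
    return dist


def quartet_tree_cost(tree: Tree, D: list[list[float]]) -> float:
    """Sum of the costs d(a,b) + d(c,d) of the quartet topologies ab|cd embedded in tree."""
    n = len(D)
    dT = leaf_distances(tree, n)
    total = 0.0
    for u, v, w, x in combinations(range(n), 4):
        # The three possible topologies of {u, v, w, x}: uv|wx, uw|vx, ux|vw.
        pairings = [((u, v), (w, x)), ((u, w), (v, x)), ((u, x), (v, w))]
        # In a binary tree, the embedded topology is the unique pairing
        # with the smallest sum of path lengths.
        (a, b), (c, d) = min(
            pairings, key=lambda p: dT[p[0][0]][p[0][1]] + dT[p[1][0]][p[1][1]]
        )
        total += D[a][b] + D[c][d]
    return total


def all_binary_trees(n: int) -> Iterator[Tree]:
    """Yield every unrooted binary tree on leaves 0..n-1 (n >= 3) by stepwise leaf insertion."""
    # Start: leaves 0, 1, 2 joined to a single internal node with id n.
    start: Tree = {0: {n}, 1: {n}, 2: {n}, n: {0, 1, 2}}

    def grow(tree: Tree, k: int, next_id: int) -> Iterator[Tree]:
        if k == n:
            yield tree
            return
        edges = [(a, b) for a in tree for b in tree[a] if a < b]
        for a, b in edges:
            # Copy the tree, subdivide edge (a, b) with a new internal node m,
            # and attach leaf k to m.
            t = {node: set(nbrs) for node, nbrs in tree.items()}
            m = next_id
            t[a].remove(b)
            t[b].remove(a)
            t[a].add(m)
            t[b].add(m)
            t[m] = {a, b, k}
            t[k] = {m}
            yield from grow(t, k + 1, next_id + 1)

    yield from grow(start, 3, n + 1)


def brute_force_mqtc(D: list[list[float]]) -> Tree:
    """Return a tree of minimum quartet tree cost by checking every binary tree."""
    return min(all_binary_trees(len(D)), key=lambda t: quartet_tree_cost(t, D))
```

For example, with D = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]], the search returns the tree that pairs object 0 with 1 and object 2 with 3, with cost 2.0. The number of unrooted binary trees on n labeled leaves is (2n-5)!! (3 for n = 4, 15 for n = 5, 105 for n = 6), which is why exhaustive search quickly becomes impractical and heuristics are needed.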

Suggested Citation

  • Sergio Consoli & Jan Korst & Steffen Pauws & Gijs Geleijnse, 2020. "Improved metaheuristics for the quartet method of hierarchical clustering," Journal of Global Optimization, Springer, vol. 78(2), pages 241-270, October.
  • Handle: RePEc:spr:jglopt:v:78:y:2020:i:2:d:10.1007_s10898-019-00871-1
    DOI: 10.1007/s10898-019-00871-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10898-019-00871-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. George Furnas, 1984. "The generation of random, binary unordered trees," Journal of Classification, Springer;The Classification Society, vol. 1(1), pages 187-233, December.
    2. Sergio Consoli & Jan Korst & Gijs Geleijnse & Steffen Pauws, 2019. "An exact algorithm for the minimum quartet tree cost problem," 4OR, Springer, vol. 17(4), pages 401-425, December.
    3. Michael Steel, 1992. "The complexity of reconstructing trees from qualitative characters and subtrees," Journal of Classification, Springer;The Classification Society, vol. 9(1), pages 91-116, January.
    4. Antonis Rokas & Barry L. Williams & Nicole King & Sean B. Carroll, 2003. "Genome-scale approaches to resolving incongruence in molecular phylogenies," Nature, Nature, vol. 425(6960), pages 798-804, October.
    5. Jun Pei & Zorica Dražić & Milan Dražić & Nenad Mladenović & Panos M. Pardalos, 2019. "Continuous Variable Neighborhood Search (C-VNS) for Solving Systems of Nonlinear Equations," INFORMS Journal on Computing, INFORMS, vol. 31(2), pages 235-250, April.

    Citations


    Cited by:

    1. Angelo Sifaleras & Nenad Mladenović & Panos M. Pardalos, 2020. "Preface to the special issue “ICVNS 2018”," Journal of Global Optimization, Springer, vol. 78(2), pages 239-240, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sergio Consoli & Jan Korst & Gijs Geleijnse & Steffen Pauws, 2019. "An exact algorithm for the minimum quartet tree cost problem," 4OR, Springer, vol. 17(4), pages 401-425, December.
    2. Paul Bastide & Mahendra Mariadassou & Stéphane Robin, 2017. "Detection of adaptive shifts on phylogenies by using shifted stochastic processes on a tree," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 1067-1093, September.
    3. Martín Espariz & Federico A Zuljan & Luis Esteban & Christian Magni, 2016. "Taxonomic Identity Resolution of Highly Phylogenetically Related Strains and Selection of Phylogenetic Markers by Using Genome-Scale Methods: The Bacillus pumilus Group Case," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-17, September.
    4. Roch, Sebastien & Steel, Mike, 2015. "Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent," Theoretical Population Biology, Elsevier, vol. 100(C), pages 56-62.
    5. DeGiorgio, Michael & Rosenberg, Noah A., 2016. "Consistency and inconsistency of consensus methods for inferring species trees from gene trees in the presence of ancestral population structure," Theoretical Population Biology, Elsevier, vol. 110(C), pages 12-24.
    6. David Peris & Emily J. Ubbelohde & Meihua Christina Kuang & Jacek Kominek & Quinn K. Langdon & Marie Adams & Justin A. Koshalek & Amanda Beth Hulfachor & Dana A. Opulente & David J. Hall & Katie Hyma, 2023. "Macroevolutionary diversity of traits and genomes in the model yeast genus Saccharomyces," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    7. François-Joseph Lapointe & Pierre Legendre, 1995. "Comparison tests for dendrograms: A comparative evaluation," Journal of Classification, Springer;The Classification Society, vol. 12(2), pages 265-282, September.
    8. Haque Md Rejuan & Kubatko Laura, 2024. "A global test of hybrid ancestry from genome-scale data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 23(1), pages 1-18, January.
    9. Bock, Hans H., 1996. "Probabilistic models in cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 23(1), pages 5-28, November.
    10. François-Joseph Lapointe & Pierre Legendre, 1991. "The generation of random ultrametric matrices representing dendrograms," Journal of Classification, Springer;The Classification Society, vol. 8(2), pages 177-200, December.
    11. Bang Ye Wu, 2004. "Constructing the Maximum Consensus Tree from Rooted Triples," Journal of Combinatorial Optimization, Springer, vol. 8(1), pages 29-39, March.
    12. Wang Yuancheng & Degnan James H, 2011. "Performance of Matrix Representation with Parsimony for Inferring Species from Gene Trees," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-39, May.
    13. Berry, V & Gascuel, O & Caraux, G, 2000. "Choosing the tree which actually best explains the data: another look at the bootstrap in phylogenetic reconstruction," Computational Statistics & Data Analysis, Elsevier, vol. 32(3-4), pages 273-283, January.
    14. Adolfo Quiroz, 1989. "Fast random generation of binary, t-ary and other types of trees," Journal of Classification, Springer;The Classification Society, vol. 6(1), pages 223-231, December.
    15. Vesna Radonjić Đogatović & Marko Đogatović & Milorad Stanojević & Nenad Mladenović, 2020. "Revenue maximization of Internet of things provider using variable neighbourhood search," Journal of Global Optimization, Springer, vol. 78(2), pages 375-396, October.
    16. Tudor Ionescu & Géraldine Polaillon & Frédéric Boulanger, 2010. "Minimum Tree Cost Quartet Puzzling," Journal of Classification, Springer;The Classification Society, vol. 27(2), pages 136-157, September.
    17. Rahul Siddharthan & Eric D Siggia & Erik van Nimwegen, 2005. "PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny," PLOS Computational Biology, Public Library of Science, vol. 1(7), pages 1-23, December.
    18. Shaowen Lan & Wenjuan Fan & Kaining Shao & Shanlin Yang & Panos M. Pardalos, 2022. "A column-generation-based approach for an integrated service planning and physician scheduling problem considering re-consultation," Journal of Combinatorial Optimization, Springer, vol. 44(5), pages 3446-3476, December.
    19. Angelo Sifaleras, 2023. "In memory of Professor Nenad Mladenović (1951–2022)," SN Operations Research Forum, Springer, vol. 4(1), pages 1-18, March.
    20. Alexei J Drummond & Simon Y W Ho & Matthew J Phillips & Andrew Rambaut, 2006. "Relaxed Phylogenetics and Dating with Confidence," PLOS Biology, Public Library of Science, vol. 4(5), pages 1-1, March.
