IDEAS home Printed from https://ideas.repec.org/a/spr/annopr/v309y2022i1d10.1007_s10479-021-04352-1.html

Reflections on kernelizing and computing unrooted agreement forests

Authors
  • Rim Wersch

    (Maastricht University)

  • Steven Kelk

    (Maastricht University)

  • Simone Linz

    (University of Auckland)

  • Georgios Stamoulis

    (Maastricht University)

Abstract

Phylogenetic trees are leaf-labelled trees used to model the evolution of species. Here we explore the practical impact of kernelization (i.e. data reduction) on the NP-hard problem of computing the TBR distance between two unrooted binary phylogenetic trees. This problem is better known in the literature as the maximum agreement forest problem, where the goal is to partition the two trees into a minimum number of common, non-overlapping subtrees. We have implemented two well-known reduction rules, the subtree and chain reductions, and five more recent, theoretically stronger reduction rules, and we compare the reduction achieved with and without the stronger rules. We find that the new rules yield smaller reduced instances and thus have clear practical added value. In many cases they also cause the TBR distance to decrease in a controlled fashion, which can further facilitate solving the problem in practice. Next, we compare the achieved reduction to the known worst-case theoretical bounds of $15k-9$ and $11k-9$, respectively, on the number of leaves of the two reduced trees, where $k$ is the TBR distance, observing in both cases a far larger reduction in practice. As a by-product of our experimental framework we obtain a number of new insights into the actual computation of TBR distance. We find, for example, that very strong lower bounds on TBR distance can be obtained efficiently by randomly sampling certain carefully constructed partitions of the leaf labels, and we identify instances which seem particularly challenging to solve exactly. The reduction rules have been implemented within our new solver Tubro, which combines kernelization with an Integer Linear Programming (ILP) approach. Tubro also incorporates a number of additional features, such as a cluster reduction and a practical upper-bounding heuristic, and it can leverage combinatorial insights emerging from the proofs of correctness of the reduction rules to simplify the ILP.
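To give a sense of scale, the two worst-case bounds can be evaluated for a concrete distance. This is only a worked illustration. The value $k = 10$ is chosen arbitrarily. Reading $15k-9$ as the bound for the subtree and chain reductions alone and $11k-9$ as the bound with all seven rules is an assumption, based on the order in which the abstract lists the rules and on the earlier kernelization literature.

```latex
\begin{align*}
  n_{\text{subtree+chain}} &\le 15k - 9 = 15 \cdot 10 - 9 = 141,\\
  n_{\text{all rules}}     &\le 11k - 9 = 11 \cdot 10 - 9 = 101,
\end{align*}
```

Here $n$ denotes the number of leaves of each reduced tree. The abstract reports that reduced instances observed in practice are far smaller than these worst-case values.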

Suggested Citation

  • Rim Wersch & Steven Kelk & Simone Linz & Georgios Stamoulis, 2022. "Reflections on kernelizing and computing unrooted agreement forests," Annals of Operations Research, Springer, vol. 309(1), pages 425-451, February.
  • Handle: RePEc:spr:annopr:v:309:y:2022:i:1:d:10.1007_s10479-021-04352-1
    DOI: 10.1007/s10479-021-04352-1

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10479-021-04352-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10479-021-04352-1?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. V. Chvatal, 1979. "A Greedy Heuristic for the Set-Covering Problem," Mathematics of Operations Research, INFORMS, vol. 4(3), pages 233-235, August.
    2. Jochen Alber & Nadja Betzler & Rolf Niedermeier, 2006. "Experiments on data reduction for optimal domination in networks," Annals of Operations Research, Springer, vol. 146(1), pages 105-117, September.
    3. Ruriko Yoshida & Kenji Fukumizu & Chrysafis Vogiatzis, 2019. "Multilocus phylogenetic analysis with gene tree clustering," Annals of Operations Research, Springer, vol. 276(1), pages 293-313, May.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marjan Marzban & Qian-Ping Gu & Xiaohua Jia, 2016. "New analysis and computational study for the planar connected dominating set problem," Journal of Combinatorial Optimization, Springer, vol. 32(1), pages 198-225, July.
    2. Davidov, Sreten & Pantoš, Miloš, 2017. "Planning of electric vehicle infrastructure based on charging reliability and quality of service," Energy, Elsevier, vol. 118(C), pages 1156-1167.
    3. Song, Zhe & Kusiak, Andrew, 2010. "Mining Pareto-optimal modules for delayed product differentiation," European Journal of Operational Research, Elsevier, vol. 201(1), pages 123-128, February.
    4. Seona Lee & Sang-Ho Lee & HyungJune Lee, 2020. "Timely directional data delivery to multiple destinations through relay population control in vehicular ad hoc network," International Journal of Distributed Sensor Networks, , vol. 16(5), pages 15501477209, May.
    5. Zhuang, Yanling & Zhou, Yun & Yuan, Yufei & Hu, Xiangpei & Hassini, Elkafi, 2022. "Order picking optimization with rack-moving mobile robots and multiple workstations," European Journal of Operational Research, Elsevier, vol. 300(2), pages 527-544.
    6. Menghong Li & Yingli Ran & Zhao Zhang, 2022. "A primal-dual algorithm for the minimum power partial cover problem," Journal of Combinatorial Optimization, Springer, vol. 44(3), pages 1913-1923, October.
    7. Wang, Yiyuan & Pan, Shiwei & Al-Shihabi, Sameh & Zhou, Junping & Yang, Nan & Yin, Minghao, 2021. "An improved configuration checking-based algorithm for the unicost set covering problem," European Journal of Operational Research, Elsevier, vol. 294(2), pages 476-491.
    8. C Guéret & N Jussien & O Lhomme & C Pavageau & C Prins, 2003. "Loading aircraft for military operations," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 54(5), pages 458-465, May.
    9. Keisuke Murakami, 2018. "Iterative Column Generation Algorithm for Generalized Multi-Vehicle Covering Tour Problem," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 35(04), pages 1-22, August.
    10. Dongyue Liang & Zhao Zhang & Xianliang Liu & Wei Wang & Yaolin Jiang, 2016. "Approximation algorithms for minimum weight partial connected set cover problem," Journal of Combinatorial Optimization, Springer, vol. 31(2), pages 696-712, February.
    11. Abdullah Alshehri & Mahmoud Owais & Jayadev Gyani & Mishal H. Aljarbou & Saleh Alsulamy, 2023. "Residual Neural Networks for Origin–Destination Trip Matrix Estimation from Traffic Sensor Information," Sustainability, MDPI, vol. 15(13), pages 1-21, June.
    12. Wedelin, Dag, 1995. "The design of a 0-1 integer optimizer and its application in the Carmen system," European Journal of Operational Research, Elsevier, vol. 87(3), pages 722-730, December.
    13. Victor Reyes & Ignacio Araya, 2021. "A GRASP-based scheme for the set covering problem," Operational Research, Springer, vol. 21(4), pages 2391-2408, December.
    14. Owais, Mahmoud & Moussa, Ghada S. & Hussain, Khaled F., 2019. "Sensor location model for O/D estimation: Multi-criteria meta-heuristics approach," Operations Research Perspectives, Elsevier, vol. 6(C).
    15. Weiyi Ding & Xiaoxian Tang, 2021. "Projections of Tropical Fermat-Weber Points," Mathematics, MDPI, vol. 9(23), pages 1-23, December.
    16. Dan Garber, 2021. "Efficient Online Linear Optimization with Approximation Algorithms," Mathematics of Operations Research, INFORMS, vol. 46(1), pages 204-220, February.
    17. Manki Min & Oleg Prokopyev & Panos M. Pardalos, 2006. "Optimal solutions to minimum total energy broadcasting problem in wireless ad hoc networks," Journal of Combinatorial Optimization, Springer, vol. 11(1), pages 59-69, February.
    18. Manki Min & Panos M. Pardalos, 2007. "Total energy optimal multicasting in wireless ad hoc networks," Journal of Combinatorial Optimization, Springer, vol. 13(4), pages 365-378, May.
    19. Davidov, Sreten & Pantoš, Miloš, 2017. "Stochastic expansion planning of the electric-drive vehicle charging infrastructure," Energy, Elsevier, vol. 141(C), pages 189-201.
    20. Vazifeh, Mohammad M. & Zhang, Hongmou & Santi, Paolo & Ratti, Carlo, 2019. "Optimizing the deployment of electric vehicle charging stations using pervasive mobility data," Transportation Research Part A: Policy and Practice, Elsevier, vol. 121(C), pages 75-91.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:annopr:v:309:y:2022:i:1:d:10.1007_s10479-021-04352-1. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.