IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1008561.html

Sampling bias and model choice in continuous phylogeography: Getting lost on a random walk

Author

Listed:
  • Antanas Kalkauskas
  • Umberto Perron
  • Yuxuan Sun
  • Nick Goldman
  • Guy Baele
  • Stephane Guindon
  • Nicola De Maio

Abstract

Phylogeographic inference allows reconstruction of the past geographical spread of pathogens or living organisms by integrating genetic and geographic data. A popular model in continuous phylogeography, where location data are provided as latitude and longitude coordinates, describes spread as a Brownian motion (Brownian Motion Phylogeography, BMP) in continuous space and time, akin to similar models of continuous trait evolution. Here, we show that reconstructions using this model can be strongly affected by sampling biases, such as the lack of sampling from certain areas. As an attempt to reduce the effects of sampling bias on BMP, we consider the addition of sequence-free samples from under-sampled areas. While this approach alleviates the effects of sampling bias, in most scenarios this will not be a viable option due to the need for prior knowledge of an outbreak’s spatial distribution. We therefore consider an alternative model, the spatial Λ-Fleming-Viot process (ΛFV), which has recently gained popularity in population genetics. Despite the ΛFV’s robustness to sampling biases, we find that the different assumptions of the ΛFV and BMP models make them suited to different scenarios, with the ΛFV being more appropriate for scenarios of endemic spread, and BMP being more appropriate for recent outbreaks or colonizations.

Author summary: Phylogeography studies past location and migration using information from current geographic locations of genetic sequences. For example, phylogeography can be used to reconstruct the history of geographical spread of an outbreak using the genetic sequences of the pathogen collected at different times and locations. Here, we investigate the effects of different model assumptions on phylogeographic inference. In particular, we examine the effects of the strategy used to collect samples. We show that sample collection biases can have a strong impact on the quality of phylogeographic reconstruction: a geographically biased sampling scheme can be very detrimental for popular continuous phylogeography models. We consider different ways to counter these effects, from utilising alternative phylogeographic models to the inclusion of partially informative samples (known cases without genetic sequences). While these strategies do alleviate the effects of sampling biases, they also lead to considerable additional computational burden. We also investigate the intrinsic differences between phylogeographic models, and their effects on reconstructed patterns in different scenarios.
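
The sampling-bias effect described above can be illustrated with a toy simulation. The sketch below is not the authors' analysis. It assumes a star-shaped tree, in which every sample descends independently from a single root over the same time t; under that assumption the maximum-likelihood root location under Brownian motion is simply the mean of the sampled locations. The function names, the values of sigma and t, and the "east only" sampling filter are all illustrative assumptions.

```python
import random
import statistics


def simulate_tips(n_tips, sigma, t, root=(0.0, 0.0), seed=1):
    """Simulate tip locations on a star tree under 2D Brownian motion.

    Each coordinate of each tip is the root coordinate plus a Gaussian
    displacement with variance sigma**2 * t (hypothetical toy setting).
    """
    rng = random.Random(seed)
    sd = sigma * t ** 0.5
    return [
        (root[0] + rng.gauss(0.0, sd), root[1] + rng.gauss(0.0, sd))
        for _ in range(n_tips)
    ]


def root_estimate(tips):
    """Root estimate for a star tree with equal branch lengths:
    the arithmetic mean of the tip coordinates."""
    return (
        statistics.fmean(x for x, _ in tips),
        statistics.fmean(y for _, y in tips),
    )


# True root is at (0, 0).
tips = simulate_tips(n_tips=2000, sigma=1.0, t=4.0)

# Sampling everywhere: the estimate stays close to the true root.
unbiased = root_estimate(tips)

# Biased sampling: only locations east of the root (x > 0) are sampled.
east_only = [p for p in tips if p[0] > 0.0]
biased = root_estimate(east_only)

print("estimate from all samples:      ", unbiased)
print("estimate from east-only samples:", biased)
```

In this toy setting the estimate from the east-only samples is pulled well away from the true root along the x axis, while the estimate from all samples is not. Real analyses work on a full phylogeny with uncertain branch lengths, so this only conveys the direction of the effect.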

Suggested Citation

  • Antanas Kalkauskas & Umberto Perron & Yuxuan Sun & Nick Goldman & Guy Baele & Stephane Guindon & Nicola De Maio, 2021. "Sampling bias and model choice in continuous phylogeography: Getting lost on a random walk," PLOS Computational Biology, Public Library of Science, vol. 17(1), pages 1-27, January.
  • Handle: RePEc:plo:pcbi00:1008561
    DOI: 10.1371/journal.pcbi.1008561

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008561
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008561&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1008561?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations

    Cited by:

    1. Santiago Justo Arevalo & Carmen Sofia Uribe Calampa & Cinthy Jimenez Silva & Mauro Quiñones Aguilar & Remco Bouckaert & Joao Renato Rebello Pinho, 2023. "Phylodynamic of SARS-CoV-2 during the second wave of COVID-19 in Peru," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    2. Idrissa Nonmon Sanogo & Claire Guinat & Simon Dellicour & Mohamed Adama Diakité & Mamadou Niang & Ousmane A Koita & Christelle Camus & Mariette F. Ducatez & Mariette Ducatez, 2024. "Genetic insights of H9N2 avian influenza viruses circulating in Mali and phylogeographic patterns in Northern and Western Africa," Post-Print hal-04498485, HAL.

