
Bayes-optimal estimation of overlap between populations of fixed size

Author

Listed:
  • Daniel B Larremore

Abstract

Measuring the overlap between two populations is, in principle, straightforward. Upon fully sampling both populations, the number of shared objects (species, taxonomical units, or gene variants, depending on the context) can be directly counted. In practice, however, only a fraction of each population’s objects are likely to be sampled due to stochastic data collection or sequencing techniques. Although methods exist for quantifying population overlap under subsampled conditions, their bias is well documented and the uncertainty of their estimates cannot be quantified. Here we derive and validate a method to rigorously estimate the population overlap from incomplete samples when the total number of objects, species, or genes in each population is known, a special case of the more general β-diversity problem that is particularly relevant in the ecology and genomic epidemiology of malaria. By solving a Bayesian inference problem, this method takes into account the rates of subsampling and produces unbiased and Bayes-optimal estimates of overlap. In addition, it provides a natural framework for computing the uncertainty of its estimates, and can be used prospectively in study planning by quantifying the tradeoff between sampling effort and uncertainty.

Author summary: Understanding when two populations are composed of similar species is important for ecologists, epidemiologists, and population geneticists, and in principle it is easy: just sample the two populations, compare the sets of species identified in each, and count how many appear in both populations. In practice, however, this is difficult because sampling methods typically produce only a random subset of the total population, leaving current population overlap estimates biased. Knowing only the number of shared members between two of these partial population samples, this paper shows how we can nevertheless estimate the true overlap between the full populations, when those full populations’ sizes are known. Using Bayesian statistics, we can also compute credible intervals to produce error bars. We show that using this unbiased approach has a dramatic impact on the conclusions one might draw from previously published studies in the malaria literature, which used simple but biased methods. Because the method in this paper quantifies the tradeoff between sampling effort and uncertainty, we also show how to compute the number of samples required to ensure high-confidence results, which may be useful for planning future studies or budgeting lab reagents and time.
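
The kind of inference the abstract describes can be illustrated with a minimal Python sketch. It is not the paper's implementation: the generative model (two hypergeometric sampling stages), the uniform prior over the true overlap, and all function and parameter names below are assumptions made for illustration. Given known population sizes, the two sample sizes, and the number of shared objects observed between the samples, it computes a posterior over the true overlap, its mean, and a central credible interval.

```python
from math import comb


def hypergeom_pmf(k, total, successes, draws):
    """P(k marked items among `draws` draws without replacement from `total`
    items, of which `successes` are marked)."""
    if k < 0 or k > successes or draws - k < 0 or draws - k > total - successes:
        return 0.0
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def overlap_posterior(n_shared, size_a, size_b, sample_a, sample_b):
    """Posterior over the true overlap s, assuming a uniform prior on
    s = 0..min(size_a, size_b) (an assumption of this sketch).

    Assumed model: sample A captures k of the s shared objects
    (hypergeometric), and sample B then recaptures n_shared of those k
    (hypergeometric). The unobserved k is summed out.
    """
    weights = []
    for s in range(min(size_a, size_b) + 1):
        likelihood = sum(
            hypergeom_pmf(k, size_a, s, sample_a)
            * hypergeom_pmf(n_shared, size_b, k, sample_b)
            for k in range(n_shared, min(s, sample_a) + 1)
        )
        weights.append(likelihood)
    total = sum(weights)
    if total == 0:
        raise ValueError("observation is impossible under the given sizes")
    return [w / total for w in weights]


def posterior_mean(posterior):
    """Posterior mean of the overlap (minimizes expected squared error)."""
    return sum(s * p for s, p in enumerate(posterior))


def credible_interval(posterior, mass=0.95):
    """Central credible interval (lower, upper) containing roughly `mass`."""
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower = None
    upper = len(posterior) - 1
    for s, p in enumerate(posterior):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = s
        if cumulative >= 1.0 - tail:
            upper = s
            break
    return lower, upper


# Hypothetical numbers: two populations of 60 objects each, 30 sampled
# from each, and 5 objects seen in both samples.
post = overlap_posterior(n_shared=5, size_a=60, size_b=60, sample_a=30, sample_b=30)
print("posterior mean overlap:", posterior_mean(post))
print("95% credible interval:", credible_interval(post))
```

For study planning, the same functions could be rerun across candidate sample sizes to see how the credible interval narrows as sampling effort grows. When both populations are fully sampled, the posterior collapses onto the observed count, matching the "in principle, straightforward" case described in the abstract.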

Suggested Citation

  • Daniel B Larremore, 2019. "Bayes-optimal estimation of overlap between populations of fixed size," PLOS Computational Biology, Public Library of Science, vol. 15(3), pages 1-17, March.
  • Handle: RePEc:plo:pcbi00:1006898
    DOI: 10.1371/journal.pcbi.1006898

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006898
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006898&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stephen Carley & Alan L. Porter, 2012. "A forward diversity index," Scientometrics, Springer;Akadémiai Kiadó, vol. 90(2), pages 407-427, February.
    2. Jian Xu & Yi Bu & Ying Ding & Sinan Yang & Hongli Zhang & Chen Yu & Lin Sun, 2018. "Understanding the formation of interdisciplinary research from the perspective of keyword evolution: a case study on joint attention," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(2), pages 973-995, November.
    3. Lin Zhang & Ronald Rousseau & Wolfgang Glänzel, 2016. "Diversity of references as an indicator of the interdisciplinarity of journals: Taking similarity between subject fields into account," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(5), pages 1257-1265, May.
    4. Diego Chavarro & Puay Tang & Ismael Rafols, 2014. "Interdisciplinarity and research on local issues: evidence from a developing country," Research Evaluation, Oxford University Press, vol. 23(3), pages 195-209.
    5. Ronald Rousseau, 2018. "The repeat rate: from Hirschman to Stirling," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(1), pages 645-653, July.
    6. Su, Hsin-Ning & Moaniba, Igam M., 2017. "Investigating the dynamics of interdisciplinary evolution in technology developments," Technological Forecasting and Social Change, Elsevier, vol. 122(C), pages 12-23.
    7. Loet Leydesdorff & Dieter Franz Kogler & Bowen Yan, 2017. "Mapping patent classifications: portfolio and statistical analysis, and the comparison of strengths and weaknesses," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(3), pages 1573-1591, September.
    8. Filippo Corsini & Rafael Laurenti & Franziska Meinherz & Francesco Paolo Appio & Luca Mora, 2019. "The Advent of Practice Theories in Research on Sustainable Consumption: Past, Current and Future Directions of the Field," Sustainability, MDPI, vol. 11(2), pages 1-19, January.
    9. Lorenz, Steffi, 2015. "Diversität und Verbundenheit der unternehmerischen Wissensbasis: Ein neuartiger Messansatz mit Indikatoren aus Innovationsprojekten," Discussion Papers on Strategy and Innovation 15-01, Philipps-University Marburg, Department of Technology and Innovation Management (TIM).
    10. Battke, Benedikt & Schmidt, Tobias S. & Stollenwerk, Stephan & Hoffmann, Volker H., 2016. "Internal or external spillovers—Which kind of knowledge is more likely to flow within or across technologies," Research Policy, Elsevier, vol. 45(1), pages 27-41.
    11. Timo Boppart & Kevin E. Staub, 2012. "Online accessibility of academic articles and the diversity of economics," ECON - Working Papers 075, Department of Economics - University of Zurich.
    12. Hackett, Edward J. & Leahey, Erin & Parker, John N. & Rafols, Ismael & Hampton, Stephanie E. & Corte, Ugo & Chavarro, Diego & Drake, John M. & Penders, Bart & Sheble, Laura & Vermeulen, Niki & Vision,, 2021. "Do synthesis centers synthesize? A semantic analysis of topical diversity in research," Research Policy, Elsevier, vol. 50(1).
    13. Dejing Kong & Jianzhong Yang & Lingfeng Li, 2020. "Early identification of technological convergence in numerical control machine tool: a deep learning approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 1983-2009, December.
    14. Xian Li & Ronald Rousseau & Liming Liang & Fangjie Xi & Yushuang Lü & Yifan Yuan & Xiaojun Hu, 2022. "Is low interdisciplinarity of references an unexpected characteristic of Nobel Prize winning research?," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 2105-2122, April.
    15. Rafols, Ismael & Leydesdorff, Loet & O’Hare, Alice & Nightingale, Paul & Stirling, Andy, 2012. "How journal rankings can suppress interdisciplinary research: A comparison between Innovation Studies and Business & Management," Research Policy, Elsevier, vol. 41(7), pages 1262-1282.
    16. van den Bergh, Jeroen C.J.M., 2008. "Optimal diversity: Increasing returns versus recombinant innovation," Journal of Economic Behavior & Organization, Elsevier, vol. 68(3-4), pages 565-580, December.
    17. Kerai, Anita & Sharma, Sunil, 2015. "Innovation in Business Group Firms: Influence of Network Diversity," IIMA Working Papers WP2015-03-26, Indian Institute of Management Ahmedabad, Research and Publication Department.
    18. Sándor Soós & Zsófia Vida & András Schubert, 2018. "Long-term trends in the multidisciplinarity of some typical natural and social sciences, and its implications on the SSH versus STM distinction," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(3), pages 795-822, March.
    19. Yury Dranev & Maxim Kotsemir & Boris Syomin, 2018. "Diversity of research publications: relation to agricultural productivity and possible implications for STI policy," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(3), pages 1565-1587, September.
    20. Lina Xu & Steven Dellaportas & Zhiqiang Yang & Jin Wang, 2023. "More on the relationship between interdisciplinary accounting research and citation impact," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 63(4), pages 4779-4803, December.
