IDEAS home Printed from https://ideas.repec.org/a/plo/pgen00/1004234.html

A General Approach for Haplotype Phasing across the Full Spectrum of Relatedness

Author

Listed:
  • Jared O'Connell
  • Deepti Gurdasani
  • Olivier Delaneau
  • Nicola Pirastu
  • Sheila Ulivi
  • Massimiliano Cocca
  • Michela Traglia
  • Jie Huang
  • Jennifer E Huffman
  • Igor Rudan
  • Ruth McQuillan
  • Ross M Fraser
  • Harry Campbell
  • Ozren Polasek
  • Gershim Asiki
  • Kenneth Ekoru
  • Caroline Hayward
  • Alan F Wright
  • Veronique Vitart
  • Pau Navarro
  • Jean-Francois Zagury
  • James F Wilson
  • Daniela Toniolo
  • Paolo Gasparini
  • Nicole Soranzo
  • Manjinder S Sandhu
  • Jonathan Marchini

Abstract

Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally ‘unrelated’ individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors and the detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.

Author Summary

Every individual carries two copies of each chromosome (haplotypes), one from each of their parents, that consist of a long sequence of alleles. Modern genotyping technologies do not measure haplotypes directly, but the combined sum (or genotype) of alleles at each site. Statistical methods are needed to infer (or phase) the haplotypes from the observed genotypes. Haplotype estimation is a key first step of many disease and population genetic studies. Much recent work in this area has focused on phasing in cohorts of nominally unrelated individuals. So-called ‘long range phasing’ is a relatively recent concept for phasing individuals with intermediate levels of relatedness, such as cohorts taken from population isolates. Methods also exist for phasing genotypes for individuals within explicit pedigrees. Whilst high quality phasing techniques are available for each of these demographic scenarios, to date, no single method is applicable to all three. In this paper, we present a general approach for phasing cohorts that contain any level of relatedness between the study individuals. We demonstrate high levels of accuracy in all demographic scenarios, as well as the ability to detect (Mendelian-consistent) genotyping errors and recombination events in duos and trios, the first method with such a capability.
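
The abstract compares methods by switch error rate, and the summary describes a genotype as the sum of the alleles on the two haplotypes. The sketch below illustrates both ideas on toy data. It is not taken from the paper or from SHAPEIT2: the function names, the 0/1 coding of biallelic alleles, and the choice of denominator (number of heterozygous sites minus one, a common convention) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# A pair of haplotypes, each a sequence of alleles coded 0 or 1 (assumed coding).
Haplotypes = Tuple[Sequence[int], Sequence[int]]


def genotype(haps: Haplotypes) -> List[int]:
    """Return the unphased genotype: the per-site sum of the two alleles."""
    first, second = haps
    return [a + b for a, b in zip(first, second)]


def switch_error_rate(truth: Haplotypes, inferred: Haplotypes) -> float:
    """Fraction of adjacent heterozygous-site pairs whose phase is flipped.

    Only heterozygous sites (genotype == 1) carry phase information, so
    homozygous sites are skipped.
    """
    if genotype(truth) != genotype(inferred):
        raise ValueError("haplotype pairs imply different genotypes")

    het_sites = [i for i, g in enumerate(genotype(truth)) if g == 1]
    if len(het_sites) < 2:
        return 0.0  # no adjacent heterozygous pair, so no switch is possible

    # True where the first inferred haplotype carries the allele of the
    # first true haplotype at that heterozygous site.
    orientation = [truth[0][i] == inferred[0][i] for i in het_sites]

    # A switch occurs wherever the orientation changes between neighbours.
    switches = sum(a != b for a, b in zip(orientation, orientation[1:]))
    return switches / (len(het_sites) - 1)


if __name__ == "__main__":
    truth = ([0, 1, 0, 1, 1, 0], [1, 0, 0, 0, 1, 1])
    # Same genotypes, but the phase flips between the 2nd and 3rd het sites.
    inferred = ([0, 1, 0, 0, 1, 1], [1, 0, 0, 1, 1, 0])
    print(genotype(truth))
    print(switch_error_rate(truth, inferred))
```

Because only changes in orientation are counted, the result does not depend on which inferred haplotype is listed first, which matches the fact that the labelling of the two haplotypes is arbitrary.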

Suggested Citation

  • Jared O'Connell & Deepti Gurdasani & Olivier Delaneau & Nicola Pirastu & Sheila Ulivi & Massimiliano Cocca & Michela Traglia & Jie Huang & Jennifer E Huffman & Igor Rudan & Ruth McQuillan & Ross M Fra, 2014. "A General Approach for Haplotype Phasing across the Full Spectrum of Relatedness," PLOS Genetics, Public Library of Science, vol. 10(4), pages 1-21, April.
  • Handle: RePEc:plo:pgen00:1004234
    DOI: 10.1371/journal.pgen.1004234

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004234
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1004234&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pgen.1004234?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Citations



    Cited by:

    1. Craig, Sarah J.C. & Kenney, Ana M. & Lin, Junli & Paul, Ian M. & Birch, Leann L. & Savage, Jennifer S. & Marini, Michele E. & Chiaromonte, Francesca & Reimherr, Matthew L. & Makova, Kateryna D., 2023. "Constructing a polygenic risk score for childhood obesity using functional data analysis," Econometrics and Statistics, Elsevier, vol. 25(C), pages 66-86.
    2. Masao Ueki, 2024. "Data-Adaptive Multivariate Test for Genomic Studies Using Fused Lasso," Mathematics, MDPI, vol. 12(10), pages 1-16, May.
    3. L Bottolo & S Richardson, 2019. "Discussion of ‘Gene hunting with hidden Markov model knockoffs’," Biometrika, Biometrika Trust, vol. 106(1), pages 19-22.
    4. Andrew D Bretherick & Oriol Canela-Xandri & Peter K Joshi & David W Clark & Konrad Rawlik & Thibaud S Boutin & Yanni Zeng & Carmen Amador & Pau Navarro & Igor Rudan & Alan F Wright & Harry Campbell & , 2020. "Linking protein to phenotype with Mendelian Randomization detects 38 proteins with causal roles in human diseases and traits," PLOS Genetics, Public Library of Science, vol. 16(7), pages 1-24, July.
