Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation

Author

Listed:
  • Cameron Palmer
  • Itsik Pe’er

Abstract

Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods, and in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data.

Author Summary: Genetic research has focused on the analysis of data points that are assumed to be deterministically known. However, the majority of current high-throughput data are only probabilistically known, and proper methods for handling such uncertain genotypes are limited. Here, we build on existing theory from the field of statistics to introduce a general framework for handling probabilistic genotype data obtained through genotype imputation. This framework, called Multiple Imputation, matches or improves upon existing methods for handling uncertainty in basic analysis of genetic association. Unlike such methods, our work also extends to more advanced analyses, such as mixed-effects models, with no additional complication. Importantly, it generates posterior probabilities of association that are intrinsically weighted by the certainty of the underlying data, a feature unmatched by other existing methods. Multiple Imputation is also fully compatible with meta-analysis. Finally, our analysis of probabilistic genotype data brings into focus the accuracy and unreliability of imputation’s estimated probabilities. Taken together, these results substantially increase the utility of imputed genotypes in statistical genetics, and may have strong implications for analysis of sequencing data moving forward.
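
The Multiple Imputation paradigm named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the textbook form of MI: draw several hard genotype sets from the per-individual genotype probabilities, run an ordinary association test on each draw, and pool the results with Rubin's rules. The function names, the simple linear regression used as the association test, and the toy data are all hypothetical choices made for illustration.

```python
import random
from statistics import mean, variance


def sample_genotypes(probs, rng):
    """Draw one hard genotype (0, 1 or 2 alternate alleles) per individual
    from that individual's genotype probabilities (P(0), P(1), P(2))."""
    return [rng.choices((0, 1, 2), weights=p)[0] for p in probs]


def ols_slope(x, y):
    """Simple linear regression of y on x.
    Returns (slope estimate, squared standard error of the slope)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("genotype has no variation in this draw")
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return beta, rss / (n - 2) / sxx


def rubin_combine(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m imputations.
    Total variance = within-imputation + (1 + 1/m) * between-imputation."""
    m = len(estimates)
    within = mean(variances)
    between = variance(estimates) if m > 1 else 0.0
    return mean(estimates), within + (1 + 1 / m) * between


def mi_association(probs, phenotype, m=20, seed=0):
    """Run the association test on m sampled genotype sets and pool them."""
    rng = random.Random(seed)
    fits = [ols_slope(sample_genotypes(probs, rng), phenotype) for _ in range(m)]
    betas, variances = zip(*fits)
    return rubin_combine(betas, variances)


if __name__ == "__main__":
    # Hypothetical toy data: two confidently called individuals and four
    # individuals whose genotypes are uncertain.
    probs = [
        (1.0, 0.0, 0.0), (0.0, 0.0, 1.0),
        (0.7, 0.3, 0.0), (0.1, 0.8, 0.1),
        (0.0, 0.4, 0.6), (0.5, 0.5, 0.0),
    ]
    phenotype = [0.1, 2.0, 0.3, 1.1, 1.8, 0.4]
    beta, total_var = mi_association(probs, phenotype)
    print(f"pooled effect = {beta:.3f}, pooled SE = {total_var ** 0.5:.3f}")
```

In this sketch the between-imputation term is what carries genotype uncertainty into the final standard error: when every genotype is known with certainty, all draws agree and the pooled result reduces to the ordinary regression fit. Any per-draw analysis that returns an estimate and its variance could be substituted for `ols_slope`.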

Suggested Citation

  • Cameron Palmer & Itsik Pe’er, 2016. "Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation," PLOS Genetics, Public Library of Science, vol. 12(6), pages 1-17, June.
  • Handle: RePEc:plo:pgen00:1006091
    DOI: 10.1371/journal.pgen.1006091

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006091
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1006091&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chung-Feng Kao & Jia-Rou Liu & Hung Hung & Po-Hsiu Kuo, 2015. "A Robust GWSS Method to Simultaneously Detect Rare and Common Variants for Complex Disease," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-14, April.
    2. Elodie Persyn & Richard Redon & Lise Bellanger & Christian Dina, 2018. "The impact of a fine-scale population stratification on rare variant association test results," PLOS ONE, Public Library of Science, vol. 13(12), pages 1-17, December.
    3. Wenjing Qi & Andrew S Allen & Yi-Ju Li, 2019. "Family-based association tests for rare variants with censored traits," PLOS ONE, Public Library of Science, vol. 14(1), pages 1-17, January.
    4. Nanye Long & Samuel P Dickson & Jessica M Maia & Hee Shin Kim & Qianqian Zhu & Andrew S Allen, 2013. "Leveraging Prior Information to Detect Causal Variants via Multi-Variant Regression," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-11, June.
    5. Xinge Jessie Jeng & Zhongyin John Daye & Wenbin Lu & Jung-Ying Tzeng, 2016. "Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level," PLOS Computational Biology, Public Library of Science, vol. 12(6), pages 1-23, June.
    6. Wan-Yu Lin, 2014. "Adaptive Combination of P-Values for Family-Based Association Testing with Sequence Data," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-16, December.
    7. Yukinori Okada & Dorothee Diogo & Jeffrey D Greenberg & Faten Mouassess & Walid A L Achkar & Robert S Fulton & Joshua C Denny & Namrata Gupta & Daniel Mirel & Stacy Gabriel & Gang Li & Joel M Kremer &, 2014. "Integration of Sequence Data from a Consanguineous Family with Genetic Data from an Outbred Population Identifies PLB1 as a Candidate Rheumatoid Arthritis Risk Gene," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-12, February.
    8. Ren-Hua Chung & Wei-Yun Tsai & Eden R Martin, 2014. "Family-Based Association Test Using Both Common and Rare Variants and Accounting for Directions of Effects for Sequencing Data," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-7, September.
    9. Boyang Fu & Ali Pazokitoroudi & Mukund Sudarshan & Zhengtong Liu & Lakshminarayanan Subramanian & Sriram Sankararaman, 2023. "Fast kernel-based association testing of non-linear genetic effects for biobank-scale data," Nature Communications, Nature, vol. 14(1), pages 1-8, December.
    10. Zhenchuan Wang & Qiuying Sha & Shuanglin Zhang, 2016. "Joint Analysis of Multiple Traits Using "Optimal" Maximum Heritability Test," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-12, March.
    11. Daniel D Kinnamon & Ray E Hershberger & Eden R Martin, 2012. "Reconsidering Association Testing Methods Using Single-Variant Test Statistics as Alternatives to Pooling Tests for Sequence Data with Rare Variants," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-15, February.
    12. Mischan Vali-Pour & Solip Park & Jose Espinosa-Carrasco & Daniel Ortiz-Martínez & Ben Lehner & Fran Supek, 2022. "The impact of rare germline variants on human somatic mutation processes," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
