
Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes

Author

Listed:
  • Yu Jiang
  • Sai Chen
  • Daniel McGuire
  • Fang Chen
  • Mengzhen Liu
  • William G Iacono
  • John K Hewitt
  • John E Hokanson
  • Kenneth Krauter
  • Markku Laakso
  • Kevin W Li
  • Sharon M Lutz
  • Matthew McGue
  • Anita Pandit
  • Gregory J M Zajac
  • Michael Boehnke
  • Goncalo R Abecasis
  • Scott I Vrieze
  • Xiaowei Zhan
  • Bibo Jiang
  • Dajiang J Liu

Abstract

Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. 
Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.

Author summary: It is of great interest to estimate the joint effects of multiple variants from large-scale meta-analyses in order to fine-map causal variants and understand the genetic architecture of complex traits. The summary association statistics from participating studies in a meta-analysis often contain missing values at some variant sites, because imputation methods may not work well and variants with low imputation quality are filtered out. Missingness is especially likely when the underlying genetic variant is rare or when the participating studies use a targeted genotyping array that is not suitable for imputation. Existing methods for conditional meta-analysis do not properly handle missing data and can incorrectly estimate correlations between score statistics. As a result, they can produce highly inflated type-I errors in conditional analysis, which leads to overestimated phenotypic variance explained and incorrect identification of causal variants. We systematically evaluated this bias and proposed a novel partial correlation based score statistic. The new statistic has valid type-I errors for conditional analysis and much higher power than existing methods, even when the contributed summary statistics contain a large fraction of missing values. We expect this method to be highly useful in the sequencing age for complex trait genetics.
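
For readers unfamiliar with conditional analysis on summary statistics, the following is a minimal sketch of the textbook conditional score test for complete data. It is not the PCBS statistic described in the abstract and it does not handle missing entries. The function name `conditional_score_test`, its arguments, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import math
import numpy as np

def conditional_score_test(U, V, target, conditioned):
    """Textbook conditional score test from pooled summary statistics.

    U: length-m vector of score statistics (hypothetical input).
    V: m x m covariance matrix of U (hypothetical input).
    Tests variant `target` after adjusting for the variants in `conditioned`.
    Assumes every study reported every variant (no missing data).
    """
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    c = list(conditioned)
    V_tc = V[target, c]               # covariance of target score with conditioned scores
    V_cc = V[np.ix_(c, c)]            # covariance among conditioned scores
    w = np.linalg.solve(V_cc, V_tc)   # V_cc^{-1} V_ct (V_cc is symmetric)
    u_cond = U[target] - w @ U[c]     # score with the conditioned signal projected out
    v_cond = V[target, target] - w @ V_tc
    z = u_cond / math.sqrt(v_cond)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Toy example: two correlated variants, each with a marginal z-score of 2 or more.
U = [10.0, 12.0]
V = [[25.0, 20.0],
     [20.0, 25.0]]
z, p = conditional_score_test(U, V, target=0, conditioned=[1])
print(f"conditional z = {z:.3f}, p = {p:.3f}")
```

In this toy example the first variant has a marginal z-score of 10 / 5 = 2, but its conditional z-score is close to zero once the second variant is adjusted for, which is the behaviour conditional analysis is meant to reveal. The calculation relies on V being the correct covariance of the pooled scores; the abstract's point is that this assumption fails when studies contribute incomplete summary statistics.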

Suggested Citation

  • Yu Jiang & Sai Chen & Daniel McGuire & Fang Chen & Mengzhen Liu & William G Iacono & John K Hewitt & John E Hokanson & Kenneth Krauter & Markku Laakso & Kevin W Li & Sharon M Lutz & Matthew McGue & An, 2018. "Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes," PLOS Genetics, Public Library of Science, vol. 14(7), pages 1-19, July.
  • Handle: RePEc:plo:pgen00:1007452
    DOI: 10.1371/journal.pgen.1007452

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007452
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1007452&type=printable
    Download Restriction: no


    Citations



    Cited by:

    1. Alamoodi, A.H. & Zaidan, B.B. & Zaidan, A.A. & Albahri, O.S. & Chen, Juliana & Chyad, M.A. & Garfan, Salem & Aleesa, A.M., 2021. "Machine learning-based imputation soft computing approach for large missing scale and non-reference data imputation," Chaos, Solitons & Fractals, Elsevier, vol. 151(C).
