
A practical approach to adjusting for population stratification in genome-wide association studies: principal components and propensity scores (PCAPS)

Author

Listed:
  • Zhao Huaqing

    (Department of Clinical Sciences, Temple University School of Medicine, 3440 N. Broad Street, Kresge Hall East, Room 218, Philadelphia, PA 19140, USA, Phone: 215-707-6139, Fax: 215-707-3160)

  • Mitra Nandita

    (Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA 19104, USA)

  • Kanetsky Peter A.

    (Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA)

  • Nathanson Katherine L.

    (Department of Medicine, University of Pennsylvania, South Pavilion, Perelman Center for Advanced Medicine, Philadelphia, PA 19104, USA)

  • Rebbeck Timothy R.

    (Division of Population Sciences, Dana Farber Cancer Institute and Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA)

Abstract

Genome-wide association studies (GWAS) are susceptible to bias due to population stratification (PS). The most widely used method to correct bias due to PS is principal components (PCs) analysis (PCA), but there is no objective method to guide which PCs to include as covariates. Often, the ten PCs with the highest eigenvalues are included to adjust for PS. This selection is arbitrary, and patterns of local linkage disequilibrium may affect PCA corrections. To address these limitations, we estimate genomic propensity scores based on all statistically significant PCs selected by the Tracy-Widom (TW) statistic. We compare a principal components and propensity scores (PCAPS) approach to PCA and EMMAX using simulated GWAS data under no, moderate, and severe PS. PCAPS reduced spurious genetic associations regardless of the degree of PS, resulting in odds ratio (OR) estimates closer to the true OR. We illustrate our PCAPS method using GWAS data from a study of testicular germ cell tumors. PCAPS provided a more conservative adjustment than PCA. Advantages of the PCAPS approach include reduction of bias compared to PCA, consistent selection of propensity scores to adjust for PS, the potential ability to handle outliers, and ease of implementation using existing software packages.
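The workflow in the abstract (compute PCs, keep a subset, summarize them in a propensity score, adjust the association test for that score) can be sketched on simulated data. The code below is only an illustration under stated assumptions, not the authors' implementation. The Tracy-Widom selection step is replaced by a fixed, hypothetical number of PCs (`n_selected_pcs`). The score is taken to be the fitted value from a linear regression of the candidate genotype on those PCs, which is one plausible reading of "genomic propensity score". All sample sizes, allele frequencies, and disease rates are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulated stratified sample (all numbers are illustrative assumptions) ---
n_per_pop, n_markers = 1000, 200
pop = np.repeat([0, 1], n_per_pop)                     # hidden subpopulation label
freq0 = rng.uniform(0.1, 0.9, n_markers)               # allele frequencies, population 0
freq1 = np.clip(freq0 + rng.normal(0, 0.15, n_markers), 0.05, 0.95)
freqs = np.where(pop[:, None] == 0, freq0, freq1)      # per-individual frequencies
markers = rng.binomial(2, freqs).astype(float)         # genotypes used for PCA

# Candidate SNP and disease both differ by subpopulation, but the SNP has
# no causal effect, so the true log odds ratio is 0.
candidate = rng.binomial(2, np.where(pop == 0, 0.2, 0.6)).astype(float)
disease = (rng.random(pop.size) < np.where(pop == 0, 0.2, 0.6)).astype(float)

# --- Step 1: principal components of the standardized genotype matrix ---
z = (markers - markers.mean(axis=0)) / markers.std(axis=0)
u, s, _ = np.linalg.svd(z, full_matrices=False)

# --- Step 2: keep a subset of PCs ---
# ASSUMPTION: a fixed count stands in for the Tracy-Widom significance test.
n_selected_pcs = 2
pcs = u[:, :n_selected_pcs] * s[:n_selected_pcs]

# --- Step 3: propensity score = fitted candidate genotype given the PCs ---
# ASSUMPTION: a linear model is used here for simplicity.
design = np.column_stack([np.ones(pop.size), pcs])
coef, *_ = np.linalg.lstsq(design, candidate, rcond=None)
score = design @ coef


def fit_logistic(x, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson and return the coefficients."""
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(x @ beta)))
        weights = p * (1.0 - p)
        gradient = x.T @ (y - p)
        hessian = x.T @ (x * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


# --- Step 4: association test without and with the score as a covariate ---
intercept = np.ones(pop.size)
log_or_unadj = fit_logistic(np.column_stack([intercept, candidate]), disease)[1]
log_or_adj = fit_logistic(np.column_stack([intercept, candidate, score]), disease)[1]

print(f"unadjusted log OR: {log_or_unadj:.3f}")
print(f"score-adjusted log OR: {log_or_adj:.3f}")
```

Because the candidate SNP has no effect in this simulation, a useful adjustment should pull the estimated log odds ratio toward 0 relative to the unadjusted fit. Collapsing the selected PCs into a single score keeps the outcome model small however many PCs are retained, which is the practical appeal the abstract describes.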

Suggested Citation

  • Zhao Huaqing & Mitra Nandita & Kanetsky Peter A. & Nathanson Katherine L. & Rebbeck Timothy R., 2018. "A practical approach to adjusting for population stratification in genome-wide association studies: principal components and propensity scores (PCAPS)," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 17(6), pages 1-12, December.
  • Handle: RePEc:bpj:sagmbi:v:17:y:2018:i:6:p:12:n:1
    DOI: 10.1515/sagmb-2017-0054

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2017-0054
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhao Huaqing & Rebbeck Timothy R. & Mitra Nandita, 2012. "Analyzing Genetic Association Studies with an Extended Propensity Score Approach," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-24, October.
    2. Ilja M Nolte & Chris Wallace & Stephen J Newhouse & Daryl Waggott & Jingyuan Fu & Nicole Soranzo & Rhian Gwilliam & Panos Deloukas & Irina Savelieva & Dongling Zheng & Chrysoula Dalageorgou & Martin F, 2009. "Common Genetic Variation Near the Phospholamban Gene Is Associated with Cardiac Repolarisation: Meta-Analysis of Three Genome-Wide Association Studies," PLOS ONE, Public Library of Science, vol. 4(7), pages 1-10, July.
    3. Peristera Paschou & Petros Drineas & Jamey Lewis & Caroline M Nievergelt & Deborah A Nickerson & Joshua D Smith & Paul M Ridker & Daniel I Chasman & Ronald M Krauss & Elad Ziv, 2008. "Tracing Sub-Structure in the European American Population with PCA-Informative Markers," PLOS Genetics, Public Library of Science, vol. 4(7), pages 1-13, July.
    4. Kai Yu & Zhaoming Wang & Qizhai Li & Sholom Wacholder & David J Hunter & Robert N Hoover & Stephen Chanock & Gilles Thomas, 2008. "Population Substructure and Control Selection in Genome-Wide Association Studies," PLOS ONE, Public Library of Science, vol. 3(7), pages 1-14, July.
    5. Marie-Claude Babron & Marie de Tayrac & Douglas N Rutledge & Eleftheria Zeggini & Emmanuelle Génin, 2012. "Rare and Low Frequency Variant Stratification in the UK Population: Description and Impact on Association Tests," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-9, October.
    6. Ning Jiang & Minghui Wang & Tianye Jia & Lin Wang & Lindsey Leach & Christine Hackett & David Marshall & Zewei Luo, 2011. "A Robust Statistical Method for Association-Based eQTL Analysis," PLOS ONE, Public Library of Science, vol. 6(8), pages 1-11, August.
    7. André X C N Valente & Joseph Zischkau & Joo Heon Shin & Yuan Gao & Abhijit Sarkar, 2012. "Genome-Wide Association Study Heterogeneous Cohort Homogenization via Subject Weight Knock-Down," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-10, October.
    8. Thomas Charlon & Manuel Martínez-Bueno & Lara Bossini-Castillo & F David Carmona & Alessandro Di Cara & Jérôme Wojcik & Sviatoslav Voloshynovskiy & Javier Martín & Marta E Alarcón-Riquelme, 2016. "Single Nucleotide Polymorphism Clustering in Systemic Autoimmune Diseases," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-10, August.
    9. Li, Zhaohai & Zhang, Hong & Zheng, Gang & Gastwirth, Joseph L. & Gail, Mitchell H., 2009. "Excess false positive rate caused by population stratification and disease rate heterogeneity in case-control association studies," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1767-1781, March.
    10. Lei Zhang & Yu-Fang Pei & Jian Li & Christopher J Papasian & Hong-Wen Deng, 2009. "Univariate/Multivariate Genome-Wide Association Scans Using Data from Families and Unrelated Samples," PLOS ONE, Public Library of Science, vol. 4(8), pages 1-12, August.
    11. van de Walle, Dominique & Mu, Ren, 2007. "Fungibility and the flypaper effect of project aid: Micro-evidence for Vietnam," Journal of Development Economics, Elsevier, vol. 84(2), pages 667-685, November.
    12. Lechner, Michael, 2018. "Modified Causal Forests for Estimating Heterogeneous Causal Effects," IZA Discussion Papers 12040, Institute of Labor Economics (IZA).
    13. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    14. Turner, Alex J. & Fichera, Eleonora & Sutton, Matt, 2021. "The effects of in-utero exposure to influenza on mental health and mortality risk throughout the life-course," Economics & Human Biology, Elsevier, vol. 43(C).
    15. Dominic Holland & Oleksandr Frei & Rahul Desikan & Chun-Chieh Fan & Alexey A Shadrin & Olav B Smeland & V S Sundar & Paul Thompson & Ole A Andreassen & Anders M Dale, 2020. "Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model," PLOS Genetics, Public Library of Science, vol. 16(5), pages 1-30, May.
    16. Roxana Elena Manea, 2021. "School Feeding Programmes, Education and Food Security in Rural Malawi," CIES Research Paper series 63-2020, Centre for International Environmental Studies, The Graduate Institute.
    17. Dettmann, E. & Becker, C. & Schmeißer, C., 2011. "Distance functions for matching in small samples," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1942-1960, May.
    18. José de Sousa & Guillaume Hollard, 2021. "From Micro to Macro Gender Differences: Evidence from Field Tournaments," Post-Print hal-03389151, HAL.
    19. repec:ags:jrapmc:122316 is not listed on IDEAS
    20. Gunther Bensch & Jörg Peters, 2013. "Alleviating Deforestation Pressures? Impacts of Improved Stove Dissemination on Charcoal Consumption in Urban Senegal," Land Economics, University of Wisconsin Press, vol. 89(4), pages 676-698.
    21. G. Miller & Yuriy Pylypchuk, 2014. "Marital Status, Spousal Characteristics, and the Use of Preventive Care," Journal of Family and Economic Issues, Springer, vol. 35(3), pages 323-338, September.
