IDEAS home Printed from https://ideas.repec.org/a/plo/pgen00/1008927.html

On the cross-population generalizability of gene expression prediction models

Author

Listed:
  • Kevin L Keys
  • Angel C Y Mak
  • Marquitta J White
  • Walter L Eckalbar
  • Andrew W Dahl
  • Joel Mefford
  • Anna V Mikhaylova
  • María G Contreras
  • Jennifer R Elhawary
  • Celeste Eng
  • Donglei Hu
  • Scott Huntsman
  • Sam S Oh
  • Sandra Salazar
  • Michael A Lenoir
  • Jimmie C Ye
  • Timothy A Thornton
  • Noah Zaitlen
  • Esteban G Burchard
  • Christopher R Gignoux

Abstract

The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.

Author summary

Advances in RNA sequencing technology have reduced the cost of measuring gene expression at a genome-wide level. However, sequencing enough human RNA samples for adequately-powered disease association studies remains prohibitively costly. To this end, modern transcriptome-wide association analysis tools leverage existing paired genotype-expression datasets by creating models to predict gene expression using genotypes. These predictive models enable researchers to perform cost-effective association tests with gene expression in independently genotyped samples. However, most of these models use European reference data, and the extent to which gene expression prediction models work across populations is not fully resolved. We observe that these models predict gene expression worse than expected in a dataset of African-Americans when derived from European-descent individuals. Using simulations, we show that gene expression predictive model performance depends on both the proportion of genetic variants shared between population-specific prediction models as well as the genetic relatedness between populations. Our findings suggest a need to carefully select reference populations for prediction and point to a pressing need for more genetically diverse genotype-expression datasets.
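
The simulation finding summarized above can be illustrated with a small toy example. The sketch below is not the authors' simulation framework: it assumes independent SNPs (no linkage disequilibrium), equal effect sizes at every causal SNP, and a simple per-SNP least-squares fit standing in for the penalized regression that real prediction tools use. All function names, parameter values, and the noise level are hypothetical choices made for illustration. It trains a prediction model in one simulated population and measures the squared correlation between predicted and simulated expression in a second population whose eQTLs are either fully shared with the first or fully disjoint from it.

```python
import random


def simulate_genotypes(rng, n_people, allele_freqs):
    """Draw 0/1/2 allele counts at independent SNPs (assumption: no LD)."""
    return [
        [int(rng.random() < f) + int(rng.random() < f) for f in allele_freqs]
        for _ in range(n_people)
    ]


def simulate_expression(rng, genotypes, effects, noise_sd):
    """Expression = additive genetic component + Gaussian noise."""
    return [
        sum(b * g for b, g in zip(effects, row)) + rng.gauss(0.0, noise_sd)
        for row in genotypes
    ]


def fit_marginal_weights(genotypes, expression):
    """Fit one least-squares slope per SNP (a stand-in, not a penalized model)."""
    n = len(expression)
    y_bar = sum(expression) / n
    weights = []
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        g_bar = sum(column) / n
        sxx = sum((g - g_bar) ** 2 for g in column)
        sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(column, expression))
        weights.append(sxy / sxx if sxx > 0 else 0.0)
    return weights


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def cross_population_r2(shared_fraction, seed=1, n_people=400,
                        n_snps=20, n_causal=5, noise_sd=1.4):
    """Train in population A, test in population B; return test-set R^2.

    shared_fraction is the fraction of causal SNPs (eQTLs) that B shares
    with A. Requires n_snps >= 2 * n_causal so disjoint sets exist.
    """
    rng = random.Random(seed)
    # Hypothetical allele frequencies: B's are perturbed versions of A's.
    freqs_a = [rng.uniform(0.1, 0.5) for _ in range(n_snps)]
    freqs_b = [max(0.05, f + rng.uniform(-0.1, 0.1)) for f in freqs_a]

    n_shared = round(shared_fraction * n_causal)
    causal_a = set(range(n_causal))
    causal_b = set(range(n_shared)) | set(range(n_causal, 2 * n_causal - n_shared))
    effects_a = [1.0 if j in causal_a else 0.0 for j in range(n_snps)]
    effects_b = [1.0 if j in causal_b else 0.0 for j in range(n_snps)]

    geno_a = simulate_genotypes(rng, n_people, freqs_a)
    geno_b = simulate_genotypes(rng, n_people, freqs_b)
    expr_a = simulate_expression(rng, geno_a, effects_a, noise_sd)
    expr_b = simulate_expression(rng, geno_b, effects_b, noise_sd)

    weights = fit_marginal_weights(geno_a, expr_a)
    predicted_b = [sum(w * g for w, g in zip(weights, row)) for row in geno_b]
    return squared_correlation(predicted_b, expr_b)


if __name__ == "__main__":
    for fraction in (1.0, 0.0):
        r2 = cross_population_r2(fraction)
        print(f"shared eQTL fraction {fraction:.1f}: test R^2 = {r2:.3f}")
```

In this toy setting the model transfers well only when the causal variants coincide across the two populations, which is the qualitative pattern the abstract describes. It does not model the second factor named in the author summary, genetic relatedness between populations, beyond a small perturbation of allele frequencies.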

Suggested Citation

  • Kevin L Keys & Angel C Y Mak & Marquitta J White & Walter L Eckalbar & Andrew W Dahl & Joel Mefford & Anna V Mikhaylova & María G Contreras & Jennifer R Elhawary & Celeste Eng & Donglei Hu & Scott Hun, 2020. "On the cross-population generalizability of gene expression prediction models," PLOS Genetics, Public Library of Science, vol. 16(8), pages 1-28, August.
  • Handle: RePEc:plo:pgen00:1008927
    DOI: 10.1371/journal.pgen.1008927

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008927
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1008927&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Carlos D. Bustamante & Francisco M. De La Vega & Esteban G. Burchard, 2011. "Genomics for the world," Nature, Nature, vol. 475(7355), pages 163-165, July.
    2. Alvaro N. Barbeira & Scott P. Dickinson & Rodrigo Bonazzola & Jiamao Zheng & Heather E. Wheeler & Jason M. Torres & Eric S. Torstenson & Kaanan P. Shah & Tzintzuni Garcia & Todd L. Edwards & Eli A. St, 2018. "Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics," Nature Communications, Nature, vol. 9(1), pages 1-20, December.
    3. Lauren S Mogil & Angela Andaleon & Alexa Badalamenti & Scott P Dickinson & Xiuqing Guo & Jerome I Rotter & W Craig Johnson & Hae Kyung Im & Yongmei Liu & Heather E Wheeler, 2018. "Genetic architecture of gene expression traits across diverse populations," PLOS Genetics, Public Library of Science, vol. 14(8), pages 1-21, August.
    4. Alice B. Popejoy & Stephanie M. Fullerton, 2016. "Genomics is failing on diversity," Nature, Nature, vol. 538(7624), pages 161-164, October.
    5. Heather E Wheeler & Kaanan P Shah & Jonathon Brenner & Tzintzuni Garcia & Keston Aquino-Michaels & GTEx Consortium & Nancy J Cox & Dan L Nicolae & Hae Kyung Im, 2016. "Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues," PLOS Genetics, Public Library of Science, vol. 12(11), pages 1-23, November.
    6. Sébastien Thériault & Nathalie Gaudreault & Maxime Lamontagne & Mickael Rosa & Marie-Chloé Boulanger & David Messika-Zeitoun & Marie-Annick Clavel & Romain Capoulade & François Dagenais & Philippe Pib, 2018. "A transcriptome-wide association study identifies PALMD as a susceptibility gene for calcific aortic valve stenosis," Nature Communications, Nature, vol. 9(1), pages 1-8, December.

    Citations


    Cited by:

    1. Nava Ehsan & Bence M. Kotis & Stephane E. Castel & Eric J. Song & Nicholas Mancuso & Pejman Mohammadi, 2024. "Haplotype-aware modeling of cis-regulatory effects highlights the gaps remaining in eQTL data," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    2. Qile Dai & Geyu Zhou & Hongyu Zhao & Urmo Võsa & Lude Franke & Alexis Battle & Alexander Teumer & Terho Lehtimäki & Olli T. Raitakari & Tõnu Esko & Michael P. Epstein & Jingjing Yang, 2023. "OTTERS: a powerful TWAS framework leveraging summary-level reference data," Nature Communications, Nature, vol. 14(1), pages 1-13, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Angela Andaleon & Lauren S Mogil & Heather E Wheeler, 2019. "Genetically regulated gene expression underlies lipid traits in Hispanic cohorts," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-21, August.
    2. Nadine R. Caron & Wilf Adam & Kate Anderson & Brooke T. Boswell & Meck Chongo & Viktor Deineko & Alexanne Dick & Shannon E. Hall & Jessica T. Hatcher & Patricia Howard & Megan Hunt & Kevin Linn & Ashl, 2023. "Partnering with First Nations in Northern British Columbia Canada to Reduce Inequity in Access to Genomic Research," IJERPH, MDPI, vol. 20(10), pages 1-31, May.
    3. Julian R Homburger & Andrés Moreno-Estrada & Christopher R Gignoux & Dominic Nelson & Elena Sanchez & Patricia Ortiz-Tello & Bernardo A Pons-Estel & Eduardo Acevedo-Vasquez & Pedro Miranda & Carl D La, 2015. "Genomic Insights into the Ancestry and Demographic History of South America," PLOS Genetics, Public Library of Science, vol. 11(12), pages 1-26, December.
    4. Rohini Chakravarthy & Sarah C Stallings & Michael Williams & Megan Hollister & Mario Davidson & Juan Canedo & Consuelo H Wilkins, 2020. "Factors influencing precision medicine knowledge and attitudes," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
    5. Michael G. Levin & Noah L. Tsao & Pankhuri Singhal & Chang Liu & Ha My T. Vy & Ishan Paranjpe & Joshua D. Backman & Tiffany R. Bellomo & William P. Bone & Kiran J. Biddinger & Qin Hui & Ozan Dikilitas, 2022. "Genome-wide association and multi-trait analyses characterize the common genetic architecture of heart failure," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    6. Michel S. Naslavsky & Marilia O. Scliar & Guilherme L. Yamamoto & Jaqueline Yu Ting Wang & Stepanka Zverinova & Tatiana Karp & Kelly Nunes & José Ricardo Magliocco Ceroni & Diego Lima Carvalho & Carlo, 2022. "Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    7. Randy L. Parrish & Aron S. Buchman & Shinya Tasaki & Yanling Wang & Denis Avey & Jishu Xu & Philip L. De Jager & David A. Bennett & Michael P. Epstein & Jingjing Yang, 2024. "SR-TWAS: leveraging multiple reference panels to improve transcriptome-wide association study power by ensemble machine learning," Nature Communications, Nature, vol. 15(1), pages 1-16, December.
    8. Pei-Kuan Cong & Wei-Yang Bai & Jin-Chen Li & Meng-Yuan Yang & Saber Khederzadeh & Si-Rui Gai & Nan Li & Yu-Heng Liu & Shi-Hui Yu & Wei-Wei Zhao & Jun-Quan Liu & Yi Sun & Xiao-Wei Zhu & Pian-Pian Zhao , 2022. "Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    9. Jingning Zhang & Jianan Zhan & Jin Jin & Cheng Ma & Ruzhang Zhao & Jared O’Connell & Yunxuan Jiang & Bertram L. Koelsch & Haoyu Zhang & Nilanjan Chatterjee, 2024. "An ensemble penalized regression method for multi-ancestry polygenic risk prediction," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    10. Jonathon P. Schuldt & Adam R. Pearson & Neil A. Lewis jr. & Ashley Jardina & Peter K. Enns, 2022. "Inequality and Misperceptions of Group Concerns Threaten the Integrity and Societal Impact of Science," The ANNALS of the American Academy of Political and Social Science, vol. 700(1), pages 195-207, March.
    11. Brenton R Swenson & Tin Louie & Henry J Lin & Raúl Méndez-Giráldez & Jennifer E Below & Cathy C Laurie & Kathleen F Kerr & Heather Highland & Timothy A Thornton & Kelli K Ryckman & Charles Kooperberg , 2019. "GWAS of QRS duration identifies new loci specific to Hispanic/Latino populations," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-15, June.
    12. Wei Fu & Shin-Yi Chou & Li-San Wang, 2022. "NIH Grant Expansion, Ancestral Diversity and Scientific Discovery in Genomics Research," NBER Working Papers 30155, National Bureau of Economic Research, Inc.
    13. Md. Moksedul Momin & Jisu Shin & Soohyun Lee & Buu Truong & Beben Benyamin & S. Hong Lee, 2023. "A method for an unbiased estimate of cross-ancestry genetic correlation using individual-level data," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    14. Corbin Quick & Xiaoquan Wen & Gonçalo Abecasis & Michael Boehnke & Hyun Min Kang, 2020. "Integrating comprehensive functional annotations to boost power and accuracy in gene-based association analysis," PLOS Genetics, Public Library of Science, vol. 16(12), pages 1-23, December.
    15. Xena Marie Mapel & Naveen Kumar Kadri & Alexander S. Leonard & Qiongyu He & Audald Lloret-Villas & Meenu Bhati & Maya Hiltpold & Hubert Pausch, 2024. "Molecular quantitative trait loci in reproductive tissues impact male fertility in cattle," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    16. Alesha A. Hatton & Fei-Fei Cheng & Tian Lin & Ren-Juan Shen & Jie Chen & Zhili Zheng & Jia Qu & Fan Lyu & Sarah E. Harris & Simon R. Cox & Zi-Bing Jin & Nicholas G. Martin & Dongsheng Fan & Grant W. M, 2024. "Genetic control of DNA methylation is largely shared across European and East Asian populations," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    17. William J. Young & Jeffrey Haessler & Jan-Walter Benjamins & Linda Repetto & Jie Yao & Aaron Isaacs & Andrew R. Harper & Julia Ramirez & Sophie Garnier & Stefan Duijvenboden & Antoine R. Baldassari & , 2023. "Genetic architecture of spatial electrical biomarkers for cardiac arrhythmia and relationship with cardiovascular disease," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    18. Han Zhang & Lu Deng & William Wheeler & Jing Qin & Kai Yu, 2022. "Integrative analysis of multiple case‐control studies," Biometrics, The International Biometric Society, vol. 78(3), pages 1080-1091, September.
    19. Yaohua Yang & Yaxin Chen & Shuai Xu & Xingyi Guo & Guochong Jia & Jie Ping & Xiang Shu & Tianying Zhao & Fangcheng Yuan & Gang Wang & Yufang Xie & Hang Ci & Hongmo Liu & Yawen Qi & Yongjun Liu & Dan L, 2024. "Integrating muti-omics data to identify tissue-specific DNA methylation biomarkers for cancer risk," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
    20. Xinyuan Dong & Yu-Ru Su & Richard Barfield & Stephanie A Bien & Qianchuan He & Tabitha A Harrison & Jeroen R Huyghe & Temitope O Keku & Noralane M Lindor & Clemens Schafmayer & Andrew T Chan & Stephen, 2020. "A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study," PLOS Genetics, Public Library of Science, vol. 16(8), pages 1-21, August.
