Printed from https://ideas.repec.org/a/plo/pgen00/1008927.html

On the cross-population generalizability of gene expression prediction models

Author

Listed:
  • Kevin L Keys
  • Angel C Y Mak
  • Marquitta J White
  • Walter L Eckalbar
  • Andrew W Dahl
  • Joel Mefford
  • Anna V Mikhaylova
  • María G Contreras
  • Jennifer R Elhawary
  • Celeste Eng
  • Donglei Hu
  • Scott Huntsman
  • Sam S Oh
  • Sandra Salazar
  • Michael A Lenoir
  • Jimmie C Ye
  • Timothy A Thornton
  • Noah Zaitlen
  • Esteban G Burchard
  • Christopher R Gignoux

Abstract

The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.

Author summary

Advances in RNA sequencing technology have reduced the cost of measuring gene expression at a genome-wide level. However, sequencing enough human RNA samples for adequately powered disease association studies remains prohibitively costly. To this end, modern transcriptome-wide association analysis tools leverage existing paired genotype-expression datasets by creating models to predict gene expression using genotypes. These predictive models enable researchers to perform cost-effective association tests with gene expression in independently genotyped samples. However, most of these models use European reference data, and the extent to which gene expression prediction models work across populations is not fully resolved. We observe that these models, when derived from European-descent individuals, predict gene expression worse than expected in a dataset of African Americans. Using simulations, we show that the performance of gene expression prediction models depends on both the proportion of genetic variants shared between population-specific prediction models and the genetic relatedness between populations. Our findings suggest a need to carefully select reference populations for prediction and point to a pressing need for more genetically diverse genotype-expression datasets.
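
The simulation result described in the abstract can be illustrated with a toy sketch. This is not the authors' simulation pipeline: it assumes independent SNPs (no linkage disequilibrium), uses ridge regression as a stand-in for the penalized models commonly used in transcriptome prediction, and every name and parameter in it (`simulate_cross_population_r2`, `shared_fraction`, the sample sizes, the heritability of 0.5) is hypothetical. It trains a predictor of one gene's expression in population A, then measures prediction accuracy in held-out samples from A and in a population B whose causal eQTLs overlap those of A by a chosen fraction.

```python
import numpy as np


def squared_correlation(predicted, observed):
    """Squared Pearson correlation between predicted and observed expression."""
    return float(np.corrcoef(predicted, observed)[0, 1] ** 2)


def simulate_cross_population_r2(shared_fraction, n_train=1000, n_test=1000,
                                 n_snps=50, n_causal=10, h2=0.5,
                                 ridge_penalty=1.0, seed=0):
    """Return (within-population R^2, cross-population R^2) for one toy gene.

    All settings are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # Population-specific allele frequencies, drawn independently per SNP.
    freq_a = rng.uniform(0.1, 0.9, n_snps)
    freq_b = rng.uniform(0.1, 0.9, n_snps)

    # Population B keeps the first n_shared causal SNPs of population A;
    # its remaining causal SNPs are ones with no effect in population A.
    n_shared = int(round(shared_fraction * n_causal))
    causal_a = np.arange(n_causal)
    causal_b = np.concatenate(
        [causal_a[:n_shared], np.arange(n_causal, 2 * n_causal - n_shared)]
    )
    effects = rng.normal(0.0, 1.0, n_causal)
    beta_a = np.zeros(n_snps)
    beta_a[causal_a] = effects
    beta_b = np.zeros(n_snps)
    beta_b[causal_b] = effects

    def draw(freq, beta, n):
        # Genotypes are 0/1/2 allele counts; noise is scaled so that the
        # genetic component explains a fraction h2 of expression variance.
        genotypes = rng.binomial(2, freq, size=(n, n_snps)).astype(float)
        genetic = genotypes @ beta
        noise_sd = np.sqrt(genetic.var() * (1.0 - h2) / h2)
        expression = genetic + rng.normal(0.0, noise_sd, n)
        return genotypes, expression

    x_train, y_train = draw(freq_a, beta_a, n_train)
    x_test_a, y_test_a = draw(freq_a, beta_a, n_test)
    x_test_b, y_test_b = draw(freq_b, beta_b, n_test)

    # Closed-form ridge regression fitted in population A only.
    x_centered = x_train - x_train.mean(axis=0)
    y_centered = y_train - y_train.mean()
    weights = np.linalg.solve(
        x_centered.T @ x_centered + ridge_penalty * np.eye(n_snps),
        x_centered.T @ y_centered,
    )

    return (squared_correlation(x_test_a @ weights, y_test_a),
            squared_correlation(x_test_b @ weights, y_test_b))


if __name__ == "__main__":
    for fraction in (1.0, 0.5, 0.0):
        within, cross = simulate_cross_population_r2(fraction)
        print(f"shared eQTLs {fraction:.1f}: "
              f"within R^2 = {within:.2f}, cross R^2 = {cross:.2f}")
```

In this simplified setting the cross-population accuracy tracks the overlap in causal variants, while within-population accuracy stays near the simulated heritability. Because linkage disequilibrium and realistic allele frequency spectra are left out, the sketch conveys only the qualitative pattern, not the magnitudes reported in the article.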

Suggested Citation

  • Kevin L Keys & Angel C Y Mak & Marquitta J White & Walter L Eckalbar & Andrew W Dahl & Joel Mefford & Anna V Mikhaylova & María G Contreras & Jennifer R Elhawary & Celeste Eng & Donglei Hu & Scott Hun, 2020. "On the cross-population generalizability of gene expression prediction models," PLOS Genetics, Public Library of Science, vol. 16(8), pages 1-28, August.
  • Handle: RePEc:plo:pgen00:1008927
    DOI: 10.1371/journal.pgen.1008927

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008927
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1008927&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Alvaro N. Barbeira & Scott P. Dickinson & Rodrigo Bonazzola & Jiamao Zheng & Heather E. Wheeler & Jason M. Torres & Eric S. Torstenson & Kaanan P. Shah & Tzintzuni Garcia & Todd L. Edwards & Eli A. St, 2018. "Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics," Nature Communications, Nature, vol. 9(1), pages 1-20, December.
    2. Lauren S Mogil & Angela Andaleon & Alexa Badalamenti & Scott P Dickinson & Xiuqing Guo & Jerome I Rotter & W Craig Johnson & Hae Kyung Im & Yongmei Liu & Heather E Wheeler, 2018. "Genetic architecture of gene expression traits across diverse populations," PLOS Genetics, Public Library of Science, vol. 14(8), pages 1-21, August.
    3. Carlos D. Bustamante & Francisco M. De La Vega & Esteban G. Burchard, 2011. "Genomics for the world," Nature, Nature, vol. 475(7355), pages 163-165, July.
    4. Alice B. Popejoy & Stephanie M. Fullerton, 2016. "Genomics is failing on diversity," Nature, Nature, vol. 538(7624), pages 161-164, October.
    5. Heather E Wheeler & Kaanan P Shah & Jonathon Brenner & Tzintzuni Garcia & Keston Aquino-Michaels & GTEx Consortium & Nancy J Cox & Dan L Nicolae & Hae Kyung Im, 2016. "Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues," PLOS Genetics, Public Library of Science, vol. 12(11), pages 1-23, November.
    6. Sébastien Thériault & Nathalie Gaudreault & Maxime Lamontagne & Mickael Rosa & Marie-Chloé Boulanger & David Messika-Zeitoun & Marie-Annick Clavel & Romain Capoulade & François Dagenais & Philippe Pib, 2018. "A transcriptome-wide association study identifies PALMD as a susceptibility gene for calcific aortic valve stenosis," Nature Communications, Nature, vol. 9(1), pages 1-8, December.

    Citations



    Cited by:

    1. Nava Ehsan & Bence M. Kotis & Stephane E. Castel & Eric J. Song & Nicholas Mancuso & Pejman Mohammadi, 2024. "Haplotype-aware modeling of cis-regulatory effects highlights the gaps remaining in eQTL data," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    2. Qile Dai & Geyu Zhou & Hongyu Zhao & Urmo Võsa & Lude Franke & Alexis Battle & Alexander Teumer & Terho Lehtimäki & Olli T. Raitakari & Tõnu Esko & Michael P. Epstein & Jingjing Yang, 2023. "OTTERS: a powerful TWAS framework leveraging summary-level reference data," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    3. Jonas Meisner & Michael Eriksen Benros & Simon Rasmussen, 2025. "Leveraging haplotype information in heritability estimation and polygenic prediction," Nature Communications, Nature, vol. 16(1), pages 1-12, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Angela Andaleon & Lauren S Mogil & Heather E Wheeler, 2019. "Genetically regulated gene expression underlies lipid traits in Hispanic cohorts," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-21, August.
    2. Nadine R. Caron & Wilf Adam & Kate Anderson & Brooke T. Boswell & Meck Chongo & Viktor Deineko & Alexanne Dick & Shannon E. Hall & Jessica T. Hatcher & Patricia Howard & Megan Hunt & Kevin Linn & Ashl, 2023. "Partnering with First Nations in Northern British Columbia Canada to Reduce Inequity in Access to Genomic Research," IJERPH, MDPI, vol. 20(10), pages 1-31, May.
    3. Ido Amit & Kristin Ardlie & Fabiana Arzuaga & Gordon Awandare & Gary Bader & Alexander Bernier & Piero Carninci & Stacey Donnelly & Roland Eils & Alistair R. R. Forrest & Henry T. Greely & Roderic Gui, 2024. "The commitment of the human cell atlas to humanity," Nature Communications, Nature, vol. 15(1), pages 1-7, December.
    4. Michael G. Levin & Noah L. Tsao & Pankhuri Singhal & Chang Liu & Ha My T. Vy & Ishan Paranjpe & Joshua D. Backman & Tiffany R. Bellomo & William P. Bone & Kiran J. Biddinger & Qin Hui & Ozan Dikilitas, 2022. "Genome-wide association and multi-trait analyses characterize the common genetic architecture of heart failure," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    5. Michel S. Naslavsky & Marilia O. Scliar & Guilherme L. Yamamoto & Jaqueline Yu Ting Wang & Stepanka Zverinova & Tatiana Karp & Kelly Nunes & José Ricardo Magliocco Ceroni & Diego Lima Carvalho & Carlo, 2022. "Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    6. Wei Fu & Shin-Yi Chou & Li-San Wang, 2022. "NIH Grant Expansion, Ancestral Diversity and Scientific Discovery in Genomics Research," NBER Working Papers 30155, National Bureau of Economic Research, Inc.
    7. Baier, Tina & Lyngstad, Torkild Hovde, 2024. "Social Background Effects on Educational Outcomes - New Insights from Modern Genetic Science," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 76(3), pages 525-545.
    8. Richard Burns & William J. Young & Nay Aung & Luis R. Lopes & Perry M. Elliott & Petros Syrris & Roberto Barriales-Villa & Catrin Sohrabi & Steffen E. Petersen & Julia Ramírez & Alistair Young & Patri, 2024. "Genetic basis of right and left ventricular heart shape," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    9. Corbin Quick & Xiaoquan Wen & Gonçalo Abecasis & Michael Boehnke & Hyun Min Kang, 2020. "Integrating comprehensive functional annotations to boost power and accuracy in gene-based association analysis," PLOS Genetics, Public Library of Science, vol. 16(12), pages 1-23, December.
    10. Han Zhang & Lu Deng & William Wheeler & Jing Qin & Kai Yu, 2022. "Integrative analysis of multiple case‐control studies," Biometrics, The International Biometric Society, vol. 78(3), pages 1080-1091, September.
    11. Xinyuan Dong & Yu-Ru Su & Richard Barfield & Stephanie A Bien & Qianchuan He & Tabitha A Harrison & Jeroen R Huyghe & Temitope O Keku & Noralane M Lindor & Clemens Schafmayer & Andrew T Chan & Stephen, 2020. "A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study," PLOS Genetics, Public Library of Science, vol. 16(8), pages 1-21, August.
    12. Shim, Janet K. & Bentz, Michael & Vasquez, Emily & Jeske, Melanie & Saperstein, Aliya & Fullerton, Stephanie M. & Foti, Nicole & McMahon, Caitlin & Lee, Sandra Soo-Jin, 2022. "Strategies of inclusion: The tradeoffs of pursuing “baked in” diversity through place-based recruitment," Social Science & Medicine, Elsevier, vol. 306(C).
    13. Surina Singh & Ananyo Choudhury & Scott Hazelhurst & Nigel J. Crowther & Palwendé R. Boua & Hermann Sorgho & Godfred Agongo & Engelbert A. Nonterah & Lisa K. Micklesfield & Shane A. Norris & Isaac Kis, 2023. "Genome-wide association study meta-analysis of blood pressure traits and hypertension in sub-Saharan African populations: an AWI-Gen study," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    14. Max Lam & Chia-Yen Chen & W. David Hill & Charley Xia & Ruoyu Tian & Daniel F. Levey & Joel Gelernter & Murray B. Stein & Alexander S. Hatoum & Hailiang Huang & Anil K. Malhotra & Heiko Runz & Tian Ge, 2022. "Collective genomic segments with differential pleiotropic patterns between cognitive dimensions and psychopathology," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    15. Ananyo Choudhury & Jean-Tristan Brandenburg & Tinashe Chikowore & Dhriti Sengupta & Palwende Romuald Boua & Nigel J. Crowther & Godfred Agongo & Gershim Asiki & F. Xavier Gómez-Olivé & Isaac Kisiangan, 2022. "Meta-analysis of sub-Saharan African studies provides insights into genetic architecture of lipid traits," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    16. Naser Ansari-Pour & Yonglan Zheng & Toshio F. Yoshimatsu & Ayodele Sanni & Mustapha Ajani & Jean-Baptiste Reynier & Avraam Tapinos & Jason J. Pitt & Stefan Dentro & Anna Woodard & Padma Sheila Rajagop, 2021. "Whole-genome analysis of Nigerian patients with breast cancer reveals ethnic-driven somatic evolution and distinct genomic subtypes," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    17. Sébastien Thériault & Zhonglin Li & Erik Abner & Jian’an Luan & Hasanga D. Manikpurage & Ursula Houessou & Pardis Zamani & Mewen Briend & Dominique K. Boudreau & Nathalie Gaudreault & Lily Frenette & , 2024. "Integrative genomic analyses identify candidate causal genes for calcific aortic valve stenosis involving tissue-specific regulation," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    18. Jie Ping & Guochong Jia & Qiuyin Cai & Xingyi Guo & Ran Tao & Christine Ambrosone & Dezheng Huo & Stefan Ambs & Mollie E. Barnard & Yu Chen & Montserrat Garcia-Closas & Jian Gu & Jennifer J. Hu & Esth, 2024. "Using genome and transcriptome data from African-ancestry female participants to identify putative breast cancer susceptibility genes," Nature Communications, Nature, vol. 15(1), pages 1-8, December.
    19. Lulu Shang & Wei Zhao & Yi Zhe Wang & Zheng Li & Jerome J. Choi & Minjung Kho & Thomas H. Mosley & Sharon L. R. Kardia & Jennifer A. Smith & Xiang Zhou, 2023. "meQTL mapping in the GENOA study reveals genetic determinants of DNA methylation in African Americans," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    20. Ghislain Rocheleau & Shoa L. Clarke & Gaëlle Auguste & Natalie R. Hasbani & Alanna C. Morrison & Adam S. Heath & Lawrence F. Bielak & Kruthika R. Iyer & Erica P. Young & Nathan O. Stitziel & Goo Jun &, 2024. "Rare variant contribution to the heritability of coronary artery disease," Nature Communications, Nature, vol. 15(1), pages 1-13, December.
