Author
Listed:
- Alexander L. Han (Baylor College of Medicine; Texas Children’s Hospital)
- Chloe F. Sands (Baylor College of Medicine; Texas Children’s Hospital)
- Dorota Matelska (AstraZeneca)
- Jessica C. Butts (Rice University)
- Vida Ravanmehr (Baylor College of Medicine; Texas Children’s Hospital)
- Fengyuan Hu (AstraZeneca)
- Esmeralda Villavicencio Gonzalez (Texas Children’s Hospital; Baylor College of Medicine)
- Nicholas Katsanis (Galatea Bio, Inc)
- Carlos D. Bustamante (Galatea Bio, Inc)
- Quanli Wang (AstraZeneca)
- Slavé Petrovski (AstraZeneca; University of Melbourne)
- Dimitrios Vitsios (AstraZeneca)
- Ryan S. Dhindsa (Baylor College of Medicine; Texas Children’s Hospital)
Abstract
The unprecedented scale of genomic databases has revolutionized our ability to identify regions in the human genome intolerant to variation—regions often implicated in disease. However, these datasets remain constrained by limited ancestral diversity. Here, we analyze whole-exome sequencing data from 460,551 UK Biobank and 125,748 Genome Aggregation Database (gnomAD) participants across multiple ancestries to test several key intolerance metrics, including the Residual Variance Intolerance Score (RVIS), Missense Tolerance Ratio (MTR), and Loss-of-Function Observed/Expected ratio (LOF O/E). We demonstrate that increasing ancestral representation, rather than sample size alone, critically drives their performance. Scores trained on variation observed in African and Admixed American ancestral groups show higher resolution in detecting haploinsufficient and neurodevelopmental disease risk genes compared to scores trained on European ancestry groups. Most strikingly, MTR trained on 43,000 multi-ancestry exomes demonstrates greater predictive power than when trained on a nearly 10-fold larger dataset of 440,000 non-Finnish European exomes. We further find that European ancestry group-based scores are likely approaching saturation. These findings highlight the need for enhanced population representation in genomic resources to fully realize the potential of precision medicine and drug discovery. Ancestry group-specific scores are publicly available through an interactive portal: http://intolerance.public.cgr.astrazeneca.com/.
Suggested Citation
Alexander L. Han & Chloe F. Sands & Dorota Matelska & Jessica C. Butts & Vida Ravanmehr & Fengyuan Hu & Esmeralda Villavicencio Gonzalez & Nicholas Katsanis & Carlos D. Bustamante & Quanli Wang & Slav, 2025.
"Diverse ancestral representation improves genetic intolerance metrics,"
Nature Communications, Nature, vol. 16(1), pages 1-9, December.
Handle:
RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-57885-5
DOI: 10.1038/s41467-025-57885-5