Printed from https://ideas.repec.org/a/nat/nature/v631y2024i8021d10.1038_s41586-024-07556-0.html

A deep catalogue of protein-coding variation in 983,578 individuals

Author

Listed:
  • Kathie Y. Sun

    (Regeneron Genetics Center)

  • Xiaodong Bai

    (Regeneron Genetics Center)

  • Siying Chen

    (Regeneron Genetics Center)

  • Suying Bao

    (Regeneron Genetics Center)

  • Chuanyi Zhang

    (Regeneron Genetics Center)

  • Manav Kapoor

    (Regeneron Genetics Center)

  • Joshua Backman

    (Regeneron Genetics Center)

  • Tyler Joseph

    (Regeneron Genetics Center)

  • Evan Maxwell

    (Regeneron Genetics Center)

  • George Mitra

    (Regeneron Genetics Center)

  • Alexander Gorovits

    (Regeneron Genetics Center)

  • Adam Mansfield

    (Regeneron Genetics Center)

  • Boris Boutkov

    (Regeneron Genetics Center)

  • Sujit Gokhale

    (Regeneron Genetics Center)

  • Lukas Habegger

    (Regeneron Genetics Center)

  • Anthony Marcketta

    (Regeneron Genetics Center)

  • Adam E. Locke

    (Regeneron Genetics Center)

  • Liron Ganel

    (Regeneron Genetics Center)

  • Alicia Hawes

    (Regeneron Genetics Center)

  • Michael D. Kessler

    (Regeneron Genetics Center)

  • Deepika Sharma

    (Regeneron Genetics Center)

  • Jeffrey Staples

    (Regeneron Genetics Center)

  • Jonas Bovijn

    (Regeneron Genetics Center)

  • Sahar Gelfman

    (Regeneron Genetics Center)

  • Alessandro Gioia

    (Regeneron Genetics Center)

  • Veera M. Rajagopal

    (Regeneron Genetics Center)

  • Alexander Lopez

    (Regeneron Genetics Center)

  • Jennifer Rico Varela

    (Regeneron Genetics Center)

  • Jesús Alegre-Díaz

    (National Autonomous University of Mexico (UNAM))

  • Jaime Berumen

    (National Autonomous University of Mexico (UNAM))

  • Roberto Tapia-Conyer

    (National Autonomous University of Mexico (UNAM))

  • Pablo Kuri-Morales

    (National Autonomous University of Mexico (UNAM)
    Instituto Tecnológico y de Estudios Superiores de Monterrey)

  • Jason Torres

    (University of Oxford)

  • Jonathan Emberson

    (University of Oxford)

  • Rory Collins

    (University of Oxford)

  • Michael Cantor

    (Regeneron Genetics Center)

  • Timothy Thornton

    (Regeneron Genetics Center)

  • Hyun Min Kang

    (Regeneron Genetics Center)

  • John D. Overton

    (Regeneron Genetics Center)

  • Alan R. Shuldiner

    (Regeneron Genetics Center)

  • M. Laura Cremona

    (Regeneron Genetics Center)

  • Mona Nafde

    (Regeneron Genetics Center)

  • Aris Baras

    (Regeneron Genetics Center)

  • Gonçalo Abecasis

    (Regeneron Genetics Center)

  • Jonathan Marchini

    (Regeneron Genetics Center)

  • Jeffrey G. Reid

    (Regeneron Genetics Center)

  • William Salerno

    (Regeneron Genetics Center)

  • Suganthi Balasubramanian

    (Regeneron Genetics Center)

Abstract

Rare coding variants that substantially affect function provide insights into the biology of a gene [1–3]. However, ascertaining the frequency of such variants requires large sample sizes [4–8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.

Suggested Citation

  • Kathie Y. Sun & Xiaodong Bai & Siying Chen & Suying Bao & Chuanyi Zhang & Manav Kapoor & Joshua Backman & Tyler Joseph & Evan Maxwell & George Mitra & Alexander Gorovits & Adam Mansfield & Boris Boutk, 2024. "A deep catalogue of protein-coding variation in 983,578 individuals," Nature, Nature, vol. 631(8021), pages 583-592, July.
  • Handle: RePEc:nat:nature:v:631:y:2024:i:8021:d:10.1038_s41586-024-07556-0
    DOI: 10.1038/s41586-024-07556-0

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41586-024-07556-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1038/s41586-024-07556-0?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
