Authors Listed:
- James Zou
(Stanford University)
- Gregory Valiant
(Stanford University)
- Paul Valiant
(Brown University)
- Konrad Karczewski
(Analytic and Translational Genetics Unit, Massachusetts General Hospital; Broad Institute of MIT and Harvard)
- Siu On Chan
(Computer Science and Engineering, Chinese University of Hong Kong)
- Kaitlin Samocha
(Analytic and Translational Genetics Unit, Massachusetts General Hospital; Broad Institute of MIT and Harvard)
- Monkol Lek
(Analytic and Translational Genetics Unit, Massachusetts General Hospital; Broad Institute of MIT and Harvard)
- Shamil Sunyaev
(Broad Institute of MIT and Harvard; Brigham and Women’s Hospital, Harvard Medical School)
- Mark Daly
(Analytic and Translational Genetics Unit, Massachusetts General Hospital; Broad Institute of MIT and Harvard; Harvard Medical School)
- Daniel G. MacArthur
(Analytic and Translational Genetics Unit, Massachusetts General Hospital; Broad Institute of MIT and Harvard; Harvard Medical School)
Abstract
As new proposals aim to sequence ever-larger collections of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not been observed yet in the current cohorts. Our results quantified the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we find that we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have loss-of-function frequency of
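The projection step described in the abstract can be illustrated under simple assumptions. Once a frequency histogram h(p) (the estimated number of variants at population allele frequency p, seen or unseen) is available, the expected number of distinct variants observed at least once in n diploid individuals is sum over p of h(p) * (1 - (1 - p)^(2n)). The Python sketch below is a minimal illustration only: it assumes 2n independently sampled chromosomes and perfect variant calling, uses hypothetical function names, and uses a made-up toy histogram rather than any estimate from the paper. It does not reproduce how UnseenEst estimates the histogram itself.

```python
import math
from typing import Mapping


def expected_discovered(histogram: Mapping[float, float], n_individuals: int) -> float:
    """Expected number of distinct variants observed at least once.

    Hypothetical helper. `histogram` maps a population allele frequency p
    (0 <= p < 1) to the estimated number of variants at that frequency.
    Assumes diploid samples, i.e. 2 * n_individuals independent chromosomes.
    """
    n_chrom = 2 * n_individuals
    total = 0.0
    for p, count in histogram.items():
        # 1 - (1 - p)^(2n), computed via log1p/expm1 to stay accurate for tiny p
        p_seen = -math.expm1(n_chrom * math.log1p(-p))
        total += count * p_seen
    return total


def fraction_discovered(histogram: Mapping[float, float], n_individuals: int) -> float:
    """Share of all variants in the histogram expected to be seen in the cohort."""
    return expected_discovered(histogram, n_individuals) / sum(histogram.values())


# Hypothetical toy histogram for illustration only, NOT the paper's estimate:
# many ultra-rare variants plus fewer rare ones.
toy_histogram = {1e-7: 5_000_000.0, 1e-6: 500_000.0, 1e-4: 20_000.0}

for n in (60_706, 500_000):
    frac = fraction_discovered(toy_histogram, n)
    print(f"{n:>7} individuals: {frac:.1%} of toy variants expected")
```

Because almost all variants in such a histogram sit at frequencies far below 1/(2n), the discovered fraction grows slowly with cohort size. This is the qualitative behaviour behind the modest capture percentages quoted above, though the real figures depend on the estimated histogram, not on this toy one.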
Suggested Citation
James Zou & Gregory Valiant & Paul Valiant & Konrad Karczewski & Siu On Chan & Kaitlin Samocha & Monkol Lek & Shamil Sunyaev & Mark Daly & Daniel G. MacArthur, 2016.
"Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects,"
Nature Communications, Nature, vol. 7(1), pages 1-5, December.
Handle:
RePEc:nat:natcom:v:7:y:2016:i:1:d:10.1038_ncomms13293
DOI: 10.1038/ncomms13293