Author
Listed:
- Zhi Wei
- Kai Wang
- Hui-Qi Qu
- Haitao Zhang
- Jonathan Bradfield
- Cecilia Kim
- Edward Frackleton
- Cuiping Hou
- Joseph T Glessner
- Rosetta Chiavacci
- Charles Stanley
- Dimitri Monos
- Struan F A Grant
- Constantin Polychronakos
- Hakon Hakonarson
Abstract
Genome-wide association studies (GWAS) have been fruitful in identifying disease susceptibility loci for common and complex diseases. A remaining question is whether we can quantify individual disease risk based on genotype data, in order to facilitate personalized prevention and treatment for complex diseases. Previous studies have typically failed to achieve satisfactory performance, primarily due to the use of only a limited number of confirmed susceptibility loci. Here we propose that sophisticated machine-learning approaches with a large ensemble of markers may improve the performance of disease risk assessment. We applied a Support Vector Machine (SVM) algorithm on a GWAS dataset generated on the Affymetrix genotyping platform for type 1 diabetes (T1D) and optimized a risk assessment model with hundreds of markers. We subsequently tested this model on an independent Illumina-genotyped dataset with imputed genotypes (1,008 cases and 1,000 controls), as well as a separate Affymetrix-genotyped dataset (1,529 cases and 1,458 controls), resulting in area under ROC curve (AUC) of ∼0.84 in both datasets. In contrast, poor performance was achieved when limited to dozens of known susceptibility loci in the SVM model or logistic regression model. Our study suggests that improved disease risk assessment can be achieved by using algorithms that take into account interactions between a large ensemble of markers. We are optimistic that genotype-based disease risk assessment may be feasible for diseases where a notable proportion of the risk has already been captured by SNP arrays.
Author Summary
An often touted utility of genome-wide association studies (GWAS) is that the resulting discoveries can facilitate implementation of personalized medicine, in which preventive and therapeutic interventions for complex diseases can be tailored to individual genetic profiles. However, recent studies using whole-genome SNP genotype data for disease risk assessment have generally failed to achieve satisfactory results, leading to a pessimistic view of the utility of genotype data for such purposes. Here we propose that sophisticated machine-learning approaches on a large ensemble of markers, which contain both confirmed and as yet unconfirmed disease susceptibility variants, may improve the performance of disease risk assessment. We tested an algorithm called Support Vector Machine (SVM) on three large-scale datasets for type 1 diabetes and demonstrated that risk assessment can be highly accurate for the disease. Our results suggest that individualized disease risk assessment using whole-genome data may be more successful for some diseases (such as T1D) than other diseases. However, the predictive accuracy will be dependent on the heritability of the disease under study, the proportion of the genetic risk that is known, and that the right set of markers and right algorithms are being used.
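The AUC figure quoted in the abstract can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below illustrates that definition using the rank-sum (Mann-Whitney) identity. It is not the authors' code: the function name `roc_auc`, the 0/1 label coding, and the toy scores are assumptions made for illustration only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0 (control) or 1 (case)  -- hypothetical coding
    scores: iterable of risk scores, higher meaning more case-like
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, giving tied scores their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")

    # U statistic for the cases, normalised by the number of case-control pairs.
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy data (made up): two cases, two controls.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

In the toy data, three of the four case-control pairs are ordered correctly, so the function returns 0.75. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.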
Suggested Citation
Zhi Wei & Kai Wang & Hui-Qi Qu & Haitao Zhang & Jonathan Bradfield & Cecilia Kim & Edward Frackleton & Cuiping Hou & Joseph T Glessner & Rosetta Chiavacci & Charles Stanley & Dimitri Monos & Struan F A Grant & Constantin Polychronakos & Hakon Hakonarson, 2009.
"From Disease Association to Risk Assessment: An Optimistic View from Genome-Wide Association Studies on Type 1 Diabetes,"
PLOS Genetics, Public Library of Science, vol. 5(10), pages 1-11, October.
Handle:
RePEc:plo:pgen00:1000678
DOI: 10.1371/journal.pgen.1000678
Citations
Cited by:
- Ohad Manor & Eran Segal, 2013.
"Predicting Disease Risk Using Bootstrap Ranking and Classification Algorithms,"
PLOS Computational Biology, Public Library of Science, vol. 9(8), pages 1-10, August.
- Tianle Chen & Yuanjia Wang & Huaihou Chen & Karen Marder & Donglin Zeng, 2014.
"Targeted Local Support Vector Machine for Age-Dependent Classification,"
Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1174-1187, September.
- Sebastian Okser & Tapio Pahikkala & Antti Airola & Tapio Salakoski & Samuli Ripatti & Tero Aittokallio, 2014.
"Regularized Machine Learning in the Genetic Prediction of Complex Traits,"
PLOS Genetics, Public Library of Science, vol. 10(11), pages 1-9, November.