Printed from https://ideas.repec.org/a/plo/pcbi00/1003200.html

Predicting Disease Risk Using Bootstrap Ranking and Classification Algorithms

Authors
  • Ohad Manor
  • Eran Segal

Abstract

Genome-wide association studies (GWAS) are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could be used either as a “black box” to promote lifestyle changes and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction typically rank single nucleotide polymorphisms (SNPs) by the p-value of their association with the disease and use the top-associated SNPs as input to a classification algorithm. However, the predictive power of such methods is relatively poor. To improve the predictive power, we devised BootRank, which uses bootstrapping to obtain a robust prioritization of SNPs for use in predictive models. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC) data, and that it yields a more robust set of SNPs and a larger number of enriched pathways associated with the different diseases. Finally, we show that combining BootRank with seven different classification algorithms improves performance compared to previous studies that used the WTCCC data. Notably, the diseases for which BootRank yields the largest improvements were recently shown to have more heritability than previously thought, likely due to contributions from variants with low minor allele frequency (MAF). This suggests that BootRank can be beneficial in cases where SNPs affecting the disease are poorly tagged or have low MAF. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.

Author Summary

Genome-wide association studies are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could be used either as a “black box” to promote lifestyle changes and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction perform relatively poorly, possibly because they rely on a noisy ranking of the genetic variants given to them as input. To improve the predictive power, we devised BootRank, a ranking method that is less sensitive to noise. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC) data, and that combining BootRank with different classification algorithms improves performance compared to previous studies that used these data. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.
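The abstract contrasts the usual pipeline (rank SNPs by association p-value, keep the top ones, pass them to a classifier) with BootRank, which derives its ranking from bootstrap resamples. The Python sketch below illustrates that contrast under stated assumptions and is not the authors' code. The allelic chi-square test, the stratified resampling of cases and controls, the mean-rank aggregation, and the function names `allelic_chi2`, `pvalue_ranking` and `bootstrap_ranking` are all hypothetical choices made for illustration; the paper itself describes how BootRank actually scores and aggregates SNPs. The data are synthetic.

```python
import numpy as np


def allelic_chi2(genotypes, labels):
    """Per-SNP 1-df allelic chi-square statistic for a case/control sample.

    genotypes: (n_individuals, n_snps) minor-allele counts coded 0, 1 or 2.
    labels:    (n_individuals,) with 1 = case and 0 = control.
    Every SNP uses the same 1-df test, so a larger statistic always means a
    smaller p-value: sorting by the statistic reproduces the p-value order.
    """
    g = np.asarray(genotypes, dtype=float)
    is_case = np.asarray(labels).astype(bool)
    n_case, n_ctrl = is_case.sum(), (~is_case).sum()
    # 2x2 allele-count table per SNP (rows: case/control, cols: minor/major)
    a = g[is_case].sum(axis=0)   # minor alleles in cases
    b = 2 * n_case - a           # major alleles in cases
    c = g[~is_case].sum(axis=0)  # minor alleles in controls
    d = 2 * n_ctrl - c           # major alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    with np.errstate(divide="ignore", invalid="ignore"):
        stat = n * (a * d - b * c) ** 2 / denom
    # A SNP with an empty margin (e.g. monomorphic in this sample) gets 0
    return np.where(denom > 0, stat, 0.0)


def pvalue_ranking(genotypes, labels):
    """Baseline: SNP indices ordered from most to least associated on the full data."""
    return np.argsort(-allelic_chi2(genotypes, labels), kind="stable")


def bootstrap_ranking(genotypes, labels, n_boot=100, seed=0):
    """Hypothetical bootstrap ranking in the spirit of BootRank.

    Each resample draws cases and controls separately with replacement
    (an assumption), ranks every SNP by its statistic, and SNPs are finally
    ordered by their mean rank across resamples (also an assumption).
    """
    genotypes = np.asarray(genotypes)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    case_idx = np.flatnonzero(labels == 1)
    ctrl_idx = np.flatnonzero(labels == 0)
    n_snps = genotypes.shape[1]
    rank_sum = np.zeros(n_snps)
    for _ in range(n_boot):
        idx = np.concatenate([
            rng.choice(case_idx, size=case_idx.size, replace=True),
            rng.choice(ctrl_idx, size=ctrl_idx.size, replace=True),
        ])
        order = np.argsort(-allelic_chi2(genotypes[idx], labels[idx]), kind="stable")
        ranks = np.empty(n_snps)
        ranks[order] = np.arange(n_snps)  # rank 0 = strongest association
        rank_sum += ranks
    return np.argsort(rank_sum / n_boot, kind="stable")


# Synthetic example (hypothetical): 400 individuals, 50 SNPs, SNP 7 is causal
rng = np.random.default_rng(1)
genotypes = rng.binomial(2, 0.3, size=(400, 50))
risk = 1.0 / (1.0 + np.exp(-(-1.0 + 1.2 * genotypes[:, 7])))
labels = (rng.random(400) < risk).astype(int)

top_k = 5
print("p-value ranking, top SNPs:  ", pvalue_ranking(genotypes, labels)[:top_k])
print("bootstrap ranking, top SNPs:", bootstrap_ranking(genotypes, labels)[:top_k])
```

In either pipeline, the first `top_k` indices would be the SNP subset handed to a classification algorithm. Averaging ranks over resamples is one plausible way to damp the noise that a single ranking computed on one sample carries; other aggregation rules are possible.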

Suggested Citation

  • Ohad Manor & Eran Segal, 2013. "Predicting Disease Risk Using Bootstrap Ranking and Classification Algorithms," PLOS Computational Biology, Public Library of Science, vol. 9(8), pages 1-10, August.
  • Handle: RePEc:plo:pcbi00:1003200
    DOI: 10.1371/journal.pcbi.1003200

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003200
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003200&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Zhi Wei & Kai Wang & Hui-Qi Qu & Haitao Zhang & Jonathan Bradfield & Cecilia Kim & Edward Frackleton & Cuiping Hou & Joseph T Glessner & Rosetta Chiavacci & Charles Stanley & Dimitri Monos & Struan F , 2009. "From Disease Association to Risk Assessment: An Optimistic View from Genome-Wide Association Studies on Type 1 Diabetes," PLOS Genetics, Public Library of Science, vol. 5(10), pages 1-11, October.
    2. Eric S. Lander, 2011. "Initial impact of the sequencing of the human genome," Nature, Nature, vol. 470(7333), pages 187-197, February.
    3. Hariklia Eleftherohorinou & Victoria Wright & Clive Hoggart & Anna-Liisa Hartikainen & Marjo-Riitta Jarvelin & David Balding & Lachlan Coin & Michael Levin, 2009. "Pathway Analysis of GWAS Provides New Insights into Genetic Susceptibility to 3 Inflammatory Diseases," PLOS ONE, Public Library of Science, vol. 4(11), pages 1-11, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hannes Rothe & Katharina Barbara Lauer & Callum Talbot-Cooper & Daniel Juan Sivizaca Conde, 2023. "Digital entrepreneurship from cellular data: How omics afford the emergence of a new wave of digital ventures in health," Electronic Markets, Springer;IIM University of St. Gallen, vol. 33(1), pages 1-17, December.
    2. Ren-Hua Chung & Ying-Erh Chen, 2012. "A Two-Stage Random Forest-Based Pathway Analysis Method," PLOS ONE, Public Library of Science, vol. 7(5), pages 1-6, May.
    3. Tianle Chen & Yuanjia Wang & Huaihou Chen & Karen Marder & Donglin Zeng, 2014. "Targeted Local Support Vector Machine for Age-Dependent Classification," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1174-1187, September.
    4. Manuel Hermosilla & Jorge Lemus, 2018. "Therapeutic Translation of Genomic Science: Opportunities and Limitations of GWAS," NBER Chapters, in: Economic Dimensions of Personalized and Precision Medicine, pages 21-52, National Bureau of Economic Research, Inc.
    5. Silver Matt & Montana Giovanni & Alzheimer's Disease Neuroimaging Initiative, 2012. "Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(1), pages 1-43, January.
    6. Nagel, Mats, 2020. "Changing perspectives: Towards detailed phenotyping in genetics," Thesis Commons a4nz2, Center for Open Science.
    7. Yvonne J K Edwards & Gary W Beecham & William K Scott & Sawsan Khuri & Guney Bademci & Demet Tekin & Eden R Martin & Zhijie Jiang & Deborah C Mash & Jarlath ffrench-Mullen & Margaret A Pericak-Vance &, 2011. "Identifying Consensus Disease Pathways in Parkinson's Disease Using an Integrative Systems Biology Approach," PLOS ONE, Public Library of Science, vol. 6(2), pages 1-11, February.
    8. Iyn-Hyang Lee & Hye-Young Kang & Hae Sun Suh & Sukhyang Lee & Eun Sil Oh & Hotcherl Jeong, 2018. "Awareness and attitude of the public toward personalized medicine in Korea," PLOS ONE, Public Library of Science, vol. 13(2), pages 1-14, February.
    9. Sebastian Okser & Tapio Pahikkala & Antti Airola & Tapio Salakoski & Samuli Ripatti & Tero Aittokallio, 2014. "Regularized Machine Learning in the Genetic Prediction of Complex Traits," PLOS Genetics, Public Library of Science, vol. 10(11), pages 1-9, November.
    10. Zhang Tian-Xiao & Beaty Terri H. & Ruczinski Ingo, 2012. "Candidate Pathway Based Analysis for Cleft Lip with or without Cleft Palate," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-21, January.
    11. Xianxian Yang & Bin Tan & Xipeng Zhou & Jian Xue & Xian Zhang & Peng Wang & Chuang Shao & Yingli Li & Chaorui Li & Huiming Xia & Jingfu Qiu, 2015. "Interferon-Inducible Transmembrane Protein 3 Genetic Variant rs12252 and Influenza Susceptibility and Severity: A Meta-Analysis," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-14, May.
    12. Ayellet V Segrè & DIAGRAM Consortium & MAGIC investigators & Leif Groop & Vamsi K Mootha & Mark J Daly & David Altshuler, 2010. "Common Inherited Variation in Mitochondrial Genes Is Not Enriched for Associations with Type 2 Diabetes or Related Glycemic Traits," PLOS Genetics, Public Library of Science, vol. 6(8), pages 1-19, August.
    13. Nerea Bartolomé & Sergi Segarra & Marta Artieda & Olga Francino & Elisenda Sánchez & Magdalena Szczypiorska & Joaquim Casellas & Diego Tejedor & Joaquín Cerdeira & Antonio Martínez & Alfonso Velasco &, 2015. "A Genetic Predictive Model for Canine Hip Dysplasia: Integration of Genome Wide Association Study (GWAS) and Candidate Gene Approaches," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-13, April.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.