IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0192853.html

Improving stability of prediction models based on correlated omics data by using network approaches

Author

Listed:
  • Renaud Tissier
  • Jeanine Houwing-Duistermaat
  • Mar Rodríguez-Girondo

Abstract

Building prediction models based on complex omics datasets such as transcriptomics, proteomics, and metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, because the features in these datasets are correlated, it is difficult to select the best model, and applying these methods yields unstable results. We propose a novel strategy for model selection in which the selected models also perform well in terms of overall predictability. Several three-step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Groups of features are identified by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results, we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally, we illustrate the advantages of our approach by applying the methodology to two problems: prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM), and prediction of the response of breast cancer cell lines to treatment with specific drugs using a breast cancer cell line pharmacogenomics dataset.
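
The three steps described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the simulated data, the function names (simulate, find_modules, group_ridge), the soft-threshold power beta=6, the average-linkage clustering, and the use of a ridge penalty that differs per module are all choices made here for the example. The abstract itself names only weighted correlation networks, hierarchical clustering, and group-specific penalization.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def simulate(n=120, block_sizes=(10, 10, 10), rho=0.8, seed=0):
    """Hypothetical toy data: features in correlated blocks; only block 0 drives y."""
    rng = np.random.default_rng(seed)
    blocks = []
    for size in block_sizes:
        latent = rng.normal(size=(n, 1))      # shared signal within a block
        noise = rng.normal(size=(n, size))    # feature-specific noise
        blocks.append(np.sqrt(rho) * latent + np.sqrt(1 - rho) * noise)
    X = np.hstack(blocks)
    y = X[:, :block_sizes[0]].mean(axis=1) + 0.5 * rng.normal(size=n)
    return X, y


def find_modules(X, n_modules, beta=6):
    """Steps 1 and 2: weighted correlation network, then hierarchical clustering."""
    # Step 1: soft-thresholded absolute correlation as edge weights
    # (beta is an assumed value, not taken from the paper).
    corr = np.corrcoef(X, rowvar=False)
    adjacency = np.abs(corr) ** beta
    # Step 2: cluster features on the dissimilarity 1 - adjacency.
    dissim = 1.0 - adjacency
    np.fill_diagonal(dissim, 0.0)
    tree = linkage(squareform(dissim, checks=False), method="average")
    return fcluster(tree, t=n_modules, criterion="maxclust")


def group_ridge(X, y, modules, penalties):
    """Step 3: ridge regression with one penalty per module (assumed variant)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Expand the per-module penalties to one penalty per feature.
    lam = np.array([penalties[m] for m in modules], dtype=float)
    # Solve (X'X + diag(lam)) b = X'y.
    return np.linalg.solve(Xc.T @ Xc + np.diag(lam), Xc.T @ yc)


if __name__ == "__main__":
    X, y = simulate()
    modules = find_modules(X, n_modules=3)
    penalties = {m: 1.0 for m in np.unique(modules)}
    coef = group_ridge(X, y, modules, penalties)
    print("module labels:", modules)
    print("coefficient norm per module:",
          {int(m): float(np.linalg.norm(coef[modules == m])) for m in np.unique(modules)})
```

In this sketch, raising the penalty for a module shrinks all coefficients in that module together, which is one simple way grouping information can enter the prediction model. Group-based variable selection, the other option named in the abstract, would need a different estimator and is not shown.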

Suggested Citation

  • Renaud Tissier & Jeanine Houwing-Duistermaat & Mar Rodríguez-Girondo, 2018. "Improving stability of prediction models based on correlated omics data by using network approaches," PLOS ONE, Public Library of Science, vol. 13(2), pages 1-23, February.
  • Handle: RePEc:plo:pone00:0192853
    DOI: 10.1371/journal.pone.0192853

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0192853
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0192853&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0192853?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Zhang Bin & Horvath Steve, 2005. "A General Framework for Weighted Gene Co-Expression Network Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 4(1), pages 1-45, August.

    Citations



    Cited by:

    1. Antonella Iuliano & Annalisa Occhipinti & Claudia Angelini & Italia De Feis & Pietro Liò, 2021. "COSMONET: An R Package for Survival Analysis Using Screening-Network Methods," Mathematics, MDPI, vol. 9(24), pages 1-25, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yixuan Qiu & Jing Lei & Kathryn Roeder, 2023. "Gradient-based sparse principal component analysis with extensions to online learning," Biometrika, Biometrika Trust, vol. 110(2), pages 339-360.
    2. Ruiz Vargas, E. & Mitchell, D.G.V. & Greening, S.G. & Wahl, L.M., 2014. "Topology of whole-brain functional MRI networks: Improving the truncated scale-free model," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 405(C), pages 151-158.
    3. Yan Guo & Hui Yu & Haocan Song & Jiapeng He & Olufunmilola Oyebamiji & Huining Kang & Jie Ping & Scott Ness & Yu Shyr & Fei Ye, 2021. "MetaGSCA: A tool for meta-analysis of gene set differential coexpression," PLOS Computational Biology, Public Library of Science, vol. 17(5), pages 1-15, May.
    4. Xue Jiang & Han Zhang & Xiongwen Quan & Zhandong Liu & Yanbin Yin, 2017. "Disease-related gene module detection based on a multi-label propagation clustering algorithm," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-17, May.
    5. Mandel, Antoine & Landini, Simone & Gallegati, Mauro & Gintis, Herbert, 2015. "Price dynamics, financial fragility and aggregate volatility," Journal of Economic Dynamics and Control, Elsevier, vol. 51(C), pages 257-277.
    6. Peter Langfelder & Rui Luo & Michael C Oldham & Steve Horvath, 2011. "Is My Network Module Preserved and Reproducible?," PLOS Computational Biology, Public Library of Science, vol. 7(1), pages 1-29, January.
    7. Elva María Novoa-del-Toro & Efrén Mezura-Montes & Matthieu Vignes & Morgane Térézol & Frédérique Magdinier & Laurent Tichit & Anaïs Baudot, 2021. "A multi-objective genetic algorithm to find active modules in multiplex biological networks," PLOS Computational Biology, Public Library of Science, vol. 17(8), pages 1-24, August.
    8. Matias Nehuen Iglesias, 2021. "The Overlooked Insights from Correlation Structures in Economic Geography," Papers in Evolutionary Economic Geography (PEEG) 2105, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Jan 2021.
    9. Lingxue Zhang & Seyoung Kim, 2014. "Learning Gene Networks under SNP Perturbations Using eQTL Datasets," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-20, February.
    10. Benjamin A Samuels & E David Leonardo & Alex Dranovsky & Amanda Williams & Erik Wong & Addie May I Nesbitt & Richard D McCurdy & Rene Hen & Mark Alter, 2014. "Global State Measures of the Dentate Gyrus Gene Expression System Predict Antidepressant-Sensitive Behaviors," PLOS ONE, Public Library of Science, vol. 9(1), pages 1-10, January.
    11. Tingting Bo & Jie Li & Ganlu Hu & Ge Zhang & Wei Wang & Qian Lv & Shaoling Zhao & Junjie Ma & Meng Qin & Xiaohui Yao & Meiyun Wang & Guang-Zhong Wang & Zheng Wang, 2023. "Brain-wide and cell-specific transcriptomic insights into MRI-derived cortical morphology in macaque monkeys," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    12. Chang Su & Zichun Xu & Xinning Shan & Biao Cai & Hongyu Zhao & Jingfei Zhang, 2023. "Cell-type-specific co-expression inference from single cell RNA-sequencing data," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    13. Sahra Uygun & Cheng Peng & Melissa D Lehti-Shiu & Robert L Last & Shin-Han Shiu, 2016. "Utility and Limitations of Using Gene Expression Data to Identify Functional Associations," PLOS Computational Biology, Public Library of Science, vol. 12(12), pages 1-27, December.
    14. Li, Jie & Wang, Lidan & Zhou, Zhong-Qiang & Zhang, Yongjie, 2021. "Monitoring or tunneling? Information interaction among large shareholders and the crash risk of the stock price," Pacific-Basin Finance Journal, Elsevier, vol. 65(C).
    15. Khang Tsung Fei & Yap Von Bing, 2010. "The Apportionment of Total Genetic Variation by Categorical Analysis of Variance," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-34, January.
    16. Shaoshuo Li & Baixing Chen & Hao Chen & Zhen Hua & Yang Shao & Heng Yin & Jianwei Wang, 2021. "Analysis of potential genetic biomarkers and molecular mechanism of smoking-related postmenopausal osteoporosis using weighted gene co-expression network analysis and machine learning," PLOS ONE, Public Library of Science, vol. 16(9), pages 1-18, September.
    17. Peter Langfelder & Fuying Gao & Nan Wang & David Howland & Seung Kwak & Thomas F Vogt & Jeffrey S Aaronson & Jim Rosinski & Giovanni Coppola & Steve Horvath & X William Yang, 2018. "MicroRNA signatures of endogenous Huntingtin CAG repeat expansion in mice," PLOS ONE, Public Library of Science, vol. 13(1), pages 1-20, January.
    18. Shujuan Zhao & Kedous Y. Mekbib & Martijn A. Ent & Garrett Allington & Andrew Prendergast & Jocelyn E. Chau & Hannah Smith & John Shohfi & Jack Ocken & Daniel Duran & Charuta G. Furey & Le Thi Hao & P, 2023. "Mutation of key signaling regulators of cerebrovascular development in vein of Galen malformations," Nature Communications, Nature, vol. 14(1), pages 1-23, December.
    19. Wang, Tao & Xiao, Shiying & Yan, Jun & Zhang, Panpan, 2021. "Regional and sectoral structures of the Chinese economy: A network perspective from multi-regional input–output tables," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 581(C).
    20. Belik, Ivan & Knudsen, Eirik Sjåholm, 2023. "Link on, Link off: Data-driven management of organizational networks for ambidexterity," Journal of Business Research, Elsevier, vol. 157(C).
