Author
Listed:
- Silvia Pineda
- Francisco X Real
- Manolis Kogevinas
- Alfredo Carrato
- Stephen J Chanock
- Núria Malats
- Kristel Van Steen
Abstract
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and interpretation and validation of results due to their complexity and lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct by multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected per each gene probe within 1Mb window upstream and downstream the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CPG, and Global models, the latter integrating SNPs and CPGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CPGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA) and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose allows reducing computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease conditions.Author Summary: At present, it is already possible to generate different type of omics–high throughput–data in the same individuals. However, we lack methodology to adequately combine them. Many challenges arise while the amount of data increases and we need to find the way to identify and understand the complex relationships when integrating data. In this regard, new statistical approaches are needed, such as the ones we propose and apply here to integrate three types of omics data (genomics, epigenomics, and transcriptomics) generated using bladder cancer tumor samples. These innovative approaches (LASSO and ENET combined with a permutation-based MaxT method) allowed us to find 48 genes whose expression levels were significantly associated with genomics and epigenomics markers. The adequacy of this approach was confirmed by the use of an independent data set from The Cancer Genome Atlas Consortium: 75% of the genes were replicated. Previous sound biological evidences further support the results obtained.
Suggested Citation
Silvia Pineda & Francisco X Real & Manolis Kogevinas & Alfredo Carrato & Stephen J Chanock & Núria Malats & Kristel Van Steen, 2015.
"Integration Analysis of Three Omics Data Using Penalized Regression Methods: An Application to Bladder Cancer,"
PLOS Genetics, Public Library of Science, vol. 11(12), pages 1-22, December.
Handle:
RePEc:plo:pgen00:1005689
DOI: 10.1371/journal.pgen.1005689
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pgen00:1005689. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosgenetics (email available below). General contact details of provider: https://journals.plos.org/plosgenetics/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.