Printed from https://ideas.repec.org/a/bpj/sagmbi/v7y2008i1n35.html

A Sparse PLS for Variable Selection when Integrating Omics Data

Author

Listed:
  • Lê Cao Kim-Anh

    (INRA UR 631 and Université de Toulouse)

  • Rossouw Debra

    (University of Stellenbosch)

  • Robert-Granié Christèle

    (INRA UR 631)

  • Besse Philippe

    (Université de Toulouse)

Abstract

Recent biotechnology advances allow multiple types of omics data, such as transcriptomic, proteomic or metabolomic data sets, to be integrated. The problem of feature selection has been addressed several times in the context of classification, but needs to be handled in a specific manner when integrating data. In this study, we focus on the integration of two-block data that are measured on the same samples. Our goal is to combine integration and simultaneous variable selection of the two data sets in a one-step procedure using a Partial Least Squares regression (PLS) variant to facilitate the biologists' interpretation. A novel computational methodology called "sparse PLS" is introduced for a predictive analysis to deal with these newly arisen problems. The sparsity of our approach is achieved with a Lasso penalization of the PLS loading vectors when computing the Singular Value Decomposition. Sparse PLS is shown to be effective and biologically meaningful. Comparisons with classical PLS are performed on a simulated data set and on real data sets. On one data set, a thorough biological interpretation of the obtained results is provided. We show that sparse PLS provides a valuable variable selection tool for high-dimensional data sets.
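The abstract describes sparsity as coming from a Lasso penalty on the PLS loading vectors during the Singular Value Decomposition. The following is a minimal sketch of that idea for a single component, not the authors' implementation: it alternates soft-thresholded updates of the two loading vectors of the cross-product matrix X^T Y. The function names, the penalty parameters `lam_x` and `lam_y`, the flat starting vector and the toy data are all assumptions made for illustration, and the deflation step needed for further components is omitted.

```python
import math

def soft_threshold(vec, lam):
    """Lasso soft-thresholding: sign(a) * max(|a| - lam, 0), elementwise."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in vec]

def normalize(vec):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 0 else list(vec)

def cross_product(X, Y):
    """M = X^T Y for X (n x p) and Y (n x q), both given as lists of rows."""
    p, q = len(X[0]), len(Y[0])
    return [[sum(x[j] * y[k] for x, y in zip(X, Y)) for k in range(q)]
            for j in range(p)]

def mat_vec(M, v):
    """M v."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def mat_t_vec(M, u):
    """M^T u."""
    return [sum(M[j][k] * u[j] for j in range(len(M)))
            for k in range(len(M[0]))]

def sparse_pls_component(X, Y, lam_x, lam_y, n_iter=100, tol=1e-10):
    """One pair of sparse loading vectors (u for X, v for Y).

    Hypothetical helper: alternates penalized power-iteration steps on
    M = X^T Y until the loadings stop changing.
    """
    M = cross_product(X, Y)
    v = normalize([1.0] * len(M[0]))  # flat start, an arbitrary choice here
    u = normalize(mat_vec(M, v))
    for _ in range(n_iter):
        u_new = normalize(soft_threshold(mat_vec(M, v), lam_x))
        v_new = normalize(soft_threshold(mat_t_vec(M, u_new), lam_y))
        change = max(abs(a - b) for a, b in zip(u_new + v_new, u + v))
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v

# Toy two-block data (made up): 4 samples, 3 X variables, 2 Y variables.
X = [[1.0, 1.0, 0.1], [-1.0, 1.0, -0.1], [1.0, -1.0, -0.1], [-1.0, -1.0, 0.1]]
Y = [[2.0, 0.1], [-2.0, 0.1], [2.0, -0.1], [-2.0, -0.1]]
u, v = sparse_pls_component(X, Y, lam_x=1.0, lam_y=0.5)
```

In this toy example the first X variable drives the first Y variable, and with the penalties shown only those two variables keep non-zero loadings. Larger penalties select fewer variables; how the penalties are tuned in practice is not covered by this sketch.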

Suggested Citation

  • Lê Cao Kim-Anh & Rossouw Debra & Robert-Granié Christèle & Besse Philippe, 2008. "A Sparse PLS for Variable Selection when Integrating Omics Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-32, November.
  • Handle: RePEc:bpj:sagmbi:v:7:y:2008:i:1:n:35
    DOI: 10.2202/1544-6115.1390

    Download full text from publisher

    File URL: https://doi.org/10.2202/1544-6115.1390
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.


    References listed on IDEAS

    1. Shen, Haipeng & Huang, Jianhua Z., 2008. "Sparse principal component analysis via regularized low rank matrix approximation," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1015-1034, July.
    2. Waaijenborg Sandra & Verselewel de Witt Hamer Philip C. & Zwinderman Aeilko H, 2008. "Quantifying the Association between Gene Expressions and DNA-Markers by Penalized Canonical Correlation Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-29, January.
    3. Mevik, Björn-Helge & Wehrens, Ron, 2007. "The pls Package: Principal Component and Partial Least Squares Regression in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 18(i02).
    4. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    5. Lê Cao Kim-Anh & Gonçalves Olivier & Besse Philippe & Gadat Sébastien, 2007. "Selection of Biologically Relevant Genes with a Wrapper Stochastic Algorithm," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 6(1), pages 1-23, November.
    6. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    7. Boulesteix Anne-Laure, 2004. "PLS Dimension Reduction for Classification with Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 3(1), pages 1-32, November.
    8. Trendafilov, Nickolay T. & Jolliffe, Ian T., 2006. "Projected gradient approach to the numerical solution of the SCoTLASS," Computational Statistics & Data Analysis, Elsevier, vol. 50(1), pages 242-253, January.

    Citations



    Cited by:

    1. Paul Bergeron & Morgane Dos Santos & Lisa Sitterle & Georges Tarlet & Jeremy Lavigne & Winchygn Liu & Marine Gerbé de Thoré & Céline Clémenson & Lydia Meziani & Cathyanne Schott & Giulia Mazzaschi & K, 2024. "Non-homogenous intratumor ionizing radiation doses synergize with PD1 and CXCR2 blockade," Nature Communications, Nature, vol. 15(1), pages 1-19, December.
    2. Michael Gutkin & Ron Shamir & Gideon Dror, 2009. "SlimPLS: A Method for Feature Selection in Gene Expression-Based Disease Classification," PLOS ONE, Public Library of Science, vol. 4(7), pages 1-12, July.
    3. Cemal Erdem & Sean M. Gross & Laura M. Heiser & Marc R. Birtwistle, 2023. "MOBILE pipeline enables identification of context-specific networks and regulatory mechanisms," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    4. Perrin, Augustine & Cristobal, Magali San & Milestad, Rebecka & Martin, Guillaume, 2020. "Identification of resilience factors of organic dairy cattle farms," Agricultural Systems, Elsevier, vol. 183(C).
    5. Feuerriegel, Stefan & Gordon, Julius, 2019. "News-based forecasts of macroeconomic indicators: A semantic path model for interpretable predictions," European Journal of Operational Research, Elsevier, vol. 272(1), pages 162-175.
    6. Minji Lee & Zhihua Su, 2020. "A Review of Envelope Models," International Statistical Review, International Statistical Institute, vol. 88(3), pages 658-676, December.
    7. Hernandez Roig, Harold Antonio & Aguilera Morillo, María del Carmen & Aguilera, Ana M. & Preda, Cristian, 2023. "Penalized function-on-function partial least-squares regression," DES - Working Papers. Statistics and Econometrics. WS 37758, Universidad Carlos III de Madrid. Departamento de Estadística.
    8. Jain Yashita & Ding Shanshan & Qiu Jing, 2019. "Sliced inverse regression for integrative multi-omics data analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(1), pages 1-13, February.
    9. Xavier Bry & Ndèye Niang & Thomas Verron & Stéphanie Bougeard, 2023. "Clusterwise elastic-net regression based on a combined information criterion," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 75-107, March.
    10. Daniele, Bertolozzi-Caredio & Barbara, Soriano & Isabel, Bardaji & Alberto, Garrido, 2022. "Analysis of perceived robustness, adaptability and transformability of Spanish extensive livestock farms under alternative challenging scenarios," Agricultural Systems, Elsevier, vol. 202(C).
    11. Zhang Fan & Miecznikowski Jeffrey C. & Tritchler David L., 2020. "Identification of supervised and sparse functional genomic pathways," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 19(1), pages 1-27, February.
    12. Marc Schoeler & Sandrine Ellero-Simatos & Till Birkner & Jordi Mayneris-Perxachs & Lisa Olsson & Harald Brolin & Ulrike Loeber & Jamie D. Kraft & Arnaud Polizzi & Marian Martí-Navas & Josep Puig & Ant, 2023. "The interplay between dietary fatty acids and gut microbiota influences host metabolism and hepatic steatosis," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    13. Marttinen Pekka & Gillberg Jussi & Havulinna Aki & Corander Jukka & Kaski Samuel, 2013. "Genome-wide association studies with high-dimensional phenotypes," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 12(4), pages 413-431, August.
    14. Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
    15. Chung Dongjun & Keles Sunduz, 2010. "Sparse Partial Least Squares Classification for High Dimensional Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-32, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lykou, Anastasia & Whittaker, Joe, 2010. "Sparse CCA using a Lasso with positivity constraints," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3144-3157, December.
    2. Thomas Despois & Catherine Doz, 2022. "Identifying and interpreting the factors in factor models via sparsity : Different approaches," Working Papers halshs-03626503, HAL.
    3. Thomas Despois & Catherine Doz, 2023. "Identifying and interpreting the factors in factor models via sparsity: Different approaches," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(4), pages 533-555, June.
    4. Bennett, Donyetta & Mekelburg, Erik & Strauss, Jack & Williams, T.H., 2024. "Unlocking the black box of sentiment and cryptocurrency: What, which, why, when and how?," Global Finance Journal, Elsevier, vol. 60(C).
    5. Mihee Lee & Haipeng Shen & Jianhua Z. Huang & J. S. Marron, 2010. "Biclustering via Sparse Singular Value Decomposition," Biometrics, The International Biometric Society, vol. 66(4), pages 1087-1095, December.
    6. Tomasz Rymarczyk & Krzysztof Król & Edward Kozłowski & Tomasz Wołowiec & Marta Cholewa-Wiktor & Piotr Bednarczuk, 2021. "Application of Electrical Tomography Imaging Using Machine Learning Methods for the Monitoring of Flood Embankments Leaks," Energies, MDPI, vol. 14(23), pages 1-35, December.
    7. Hyun Hak Kim, 2013. "Forecasting Macroeconomic Variables Using Data Dimension Reduction Methods: The Case of Korea," Working Papers 2013-26, Economic Research Institute, Bank of Korea.
    8. Amir Beck & Yakov Vaisbourd, 2016. "The Sparse Principal Component Analysis Problem: Optimality Conditions and Algorithms," Journal of Optimization Theory and Applications, Springer, vol. 170(1), pages 119-143, July.
    9. Nerea González-García & Ana Belén Nieto-Librero & Purificación Galindo-Villardón, 2023. "CenetBiplot: a new proposal of sparse and orthogonal biplots methods by means of elastic net CSVD," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 5-19, March.
    10. Tanin Sirimongkolkasem & Reza Drikvandi, 2019. "On Regularisation Methods for Analysis of High Dimensional Data," Annals of Data Science, Springer, vol. 6(4), pages 737-763, December.
    11. Chalise, Prabhakar & Fridley, Brooke L., 2012. "Comparison of penalty functions for sparse canonical correlation analysis," Computational Statistics & Data Analysis, Elsevier, vol. 56(2), pages 245-254.
    12. Fitzpatrick, Trevor & Mues, Christophe, 2021. "How can lenders prosper? Comparing machine learning approaches to identify profitable peer-to-peer loan investments," European Journal of Operational Research, Elsevier, vol. 294(2), pages 711-722.
    13. Shuichi Kawano, 2021. "Sparse principal component regression via singular value decomposition approach," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(3), pages 795-823, September.
    14. Davood Hajinezhad & Qingjiang Shi, 2018. "Alternating direction method of multipliers for a class of nonconvex bilinear optimization: convergence analysis and applications," Journal of Global Optimization, Springer, vol. 70(1), pages 261-288, January.
    15. Juan C. Laria & M. Carmen Aguilera-Morillo & Rosa E. Lillo, 2023. "Group linear algorithm with sparse principal decomposition: a variable selection and clustering method for generalized linear models," Statistical Papers, Springer, vol. 64(1), pages 227-253, February.
    16. Rosember Guerra-Urzola & Katrijn Van Deun & Juan C. Vera & Klaas Sijtsma, 2021. "A Guide for Sparse PCA: Model Comparison and Applications," Psychometrika, Springer;The Psychometric Society, vol. 86(4), pages 893-919, December.
    17. Thomas Despois & Catherine Doz, 2022. "Identifying and interpreting the factors in factor models via sparsity : Different approaches," PSE Working Papers halshs-03626503, HAL.
    18. Lee, Seokho & Huang, Jianhua Z., 2013. "A coordinate descent MM algorithm for fast computation of sparse logistic PCA," Computational Statistics & Data Analysis, Elsevier, vol. 62(C), pages 26-38.
    19. Michael Greenacre & Patrick J. F Groenen & Trevor Hastie & Alfonso Iodice d’Enza & Angelos Markos & Elena Tuzhilina, 2023. "Principal component analysis," Economics Working Papers 1856, Department of Economics and Business, Universitat Pompeu Fabra.
    20. Rosember Guerra-Urzola & Niek C. Schipper & Anya Tonne & Klaas Sijtsma & Juan C. Vera & Katrijn Deun, 2023. "Sparsifying the least-squares approach to PCA: comparison of lasso and cardinality constraint," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 269-286, March.
