
Exploratory Analysis of Multiple Omics Datasets Using the Adjusted RV Coefficient

Authors

  • Mayer Claus-Dieter
  • Lorent Julie
  • Horgan Graham W

Abstract

The integration of multiple high-dimensional data sets (omics data) has been a very active but challenging area of bioinformatics research in recent years. Various adaptations of non-standard multivariate statistical tools have been suggested that allow such data sets to be analyzed and visualized simultaneously. However, these methods can typically deal with only two data sets, whereas systems biology experiments often generate larger numbers of high-dimensional data sets. For this reason, we suggest an exploratory analysis of similarity between data sets as an initial analysis step. This analysis is based on the RV coefficient, a matrix correlation that can be interpreted as a generalization of the squared correlation from two single variables to two sets of variables. It has been shown before, however, that the high dimensionality of the data introduces substantial bias into the RV. We therefore introduce an alternative version, the adjusted RV, which is unbiased in the case of independent data sets. We can also show that in many situations, particularly for very high-dimensional data sets, the adjusted RV is a better estimator than previous RV versions in terms of the mean squared error and the power of the independence test based on it. We demonstrate the usefulness of the adjusted RV by applying it to a collection of 19 different multivariate data sets from a systems biology experiment. The pairwise RV values between the data sets define a similarity matrix that we can use as input to hierarchical clustering or multidimensional scaling. We show that this reveals biologically meaningful subgroups of data sets in our study.
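
The ideas in the abstract can be illustrated with a minimal sketch, assuming NumPy and simulated data. The function `rv` computes the classical RV coefficient, tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2)), on column-centered data. The function `adjusted_rv` is an assumption rather than something stated in the abstract: it replaces each squared pairwise correlation r^2 by the adjusted value 1 - (n-1)/(n-2) (1 - r^2), which has expectation zero for independent variables, and should be checked against the full paper before use. The names `rv`, `adjusted_rv`, and `pairwise_similarity` are hypothetical.

```python
import numpy as np


def rv(X, Y):
    """Classical RV coefficient of two data sets sharing the same n rows (samples)."""
    Xc = X - X.mean(axis=0)              # column-center every variable
    Yc = Y - Y.mean(axis=0)
    Sx = Xc @ Xc.T                       # n x n matrices, cheap when p >> n
    Sy = Yc @ Yc.T
    # For symmetric matrices, trace(A @ B) equals the sum of elementwise products.
    return np.sum(Sx * Sy) / np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy))


def adjusted_rv(X, Y):
    """ASSUMED adjustment: sums of adjusted squared correlations of standardized columns."""
    n, p = X.shape
    q = Y.shape[1]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize columns
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    Sx, Sy = Xs @ Xs.T, Ys @ Ys.T

    def adjusted_sum(S1, S2, n_pairs):
        # Sum of squared pairwise correlations over all n_pairs variable pairs.
        sum_r2 = np.sum(S1 * S2) / (n - 1) ** 2
        # Sum of 1 - (n-1)/(n-2) * (1 - r^2) over the same pairs.
        return n_pairs - (n - 1) / (n - 2) * (n_pairs - sum_r2)

    num = adjusted_sum(Sx, Sy, p * q)
    den = np.sqrt(adjusted_sum(Sx, Sx, p * p) * adjusted_sum(Sy, Sy, q * q))
    return num / den


def pairwise_similarity(datasets, coef=rv):
    """Symmetric matrix of coefficients between all pairs of data sets."""
    k = len(datasets)
    S = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            S[i, j] = S[j, i] = coef(datasets[i], datasets[j])
    return S


# Simulated example: few samples, many variables (not data from the paper).
rng = np.random.default_rng(0)
n = 20
X = rng.standard_normal((n, 500))
Y = rng.standard_normal((n, 500))                       # independent of X
Z = X[:, :250] + 0.5 * rng.standard_normal((n, 250))    # noisy copy of part of X

print("classical RV, independent sets:", rv(X, Y))
print("adjusted RV, independent sets: ", adjusted_rv(X, Y))

S = pairwise_similarity([X, Y, Z], coef=adjusted_rv)
D = 1.0 - S    # one possible dissimilarity for clustering or scaling
print(np.round(D, 2))
```

With far more variables than samples, the classical coefficient for the two independent sets comes out close to 1, while the adjusted version stays near 0, which is the bias the abstract describes. The matrix `D` is one way to turn the similarities into dissimilarities that a hierarchical clustering or multidimensional scaling routine could take as input.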

Suggested Citation

  • Mayer Claus-Dieter & Lorent Julie & Horgan Graham W, 2011. "Exploratory Analysis of Multiple Omics Datasets Using the Adjusted RV Coefficient," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-27, March.
  • Handle: RePEc:bpj:sagmbi:v:10:y:2011:i:1:n:14
    DOI: 10.2202/1544-6115.1540

    Citations

    Cited by:

    1. Kevin W. Zhu & Shawn D. Burton & Maira H. Nagai & Justin D. Silverman & Claire A. March & Matt Wachowiak & Hiroaki Matsunami, 2022. "Decoding the olfactory map through targeted transcriptomics links murine olfactory receptors to glomeruli," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    2. Ilaria Lucrezia Amerise & Agostino Tarsitano, 2012. "Weighting Distance Matrices Using Rank Correlations," Working Papers 201209, Università della Calabria, Dipartimento di Economia, Statistica e Finanza "Giovanni Anania" - DESF.
    3. Papageorgiou, Ioulia & Moustaki, Irini, 2019. "Sampling of pairs in pairwise likelihood estimation for latent variable models with categorical observed variables," LSE Research Online Documents on Economics 87592, London School of Economics and Political Science, LSE Library.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rauf Ahmad, M., 2019. "A significance test of the RV coefficient in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 116-130.
    2. Josse, J. & Pagès, J. & Husson, F., 2008. "Testing the significance of the RV coefficient," Computational Statistics & Data Analysis, Elsevier, vol. 53(1), pages 82-91, September.
    3. Bavaud, François, 2023. "Exact first moments of the RV coefficient by invariant orthogonal integration," Journal of Multivariate Analysis, Elsevier, vol. 198(C).
    4. Xiang Zhan & Anna Plantinga & Ni Zhao & Michael C. Wu, 2017. "A fast small‐sample kernel independence test for microbiome community‐level association analysis," Biometrics, The International Biometric Society, vol. 73(4), pages 1453-1463, December.
    5. Antonio Lucadamo & Pietro Amenta, 2015. "A proposal for handling ordinal categorical variables in co-inertia analysis," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(12), pages 2631-2638, December.
    6. Kazi-Aoual, Frederique & Hitier, Simon & Sabatier, Robert & Lebreton, Jean-Dominique, 1995. "Refined approximations to permutation tests for multivariate inference," Computational Statistics & Data Analysis, Elsevier, vol. 20(6), pages 643-656, December.
    7. Figueiredo, Adelaide & Figueiredo, Fernanda & Monteiro, Natália P. & Straume, Odd Rune, 2012. "Restructuring in privatised firms: A Statis approach," Structural Change and Economic Dynamics, Elsevier, vol. 23(1), pages 108-116.
    8. Hyodo, Masashi & Nishiyama, Takahiro & Pavlenko, Tatjana, 2020. "Testing for independence of high-dimensional variables: ρV-coefficient based approach," Journal of Multivariate Analysis, Elsevier, vol. 178(C).
    9. Tenenhaus, Michel & Vinzi, Vincenzo Esposito & Chatelin, Yves-Marie & Lauro, Carlo, 2005. "PLS path modeling," Computational Statistics & Data Analysis, Elsevier, vol. 48(1), pages 159-205, January.
    10. Wenbin Ruan & Zhenzhou Lu & Pengfei Wei, 2013. "Estimation of conditional moment by moving least squares and its application for importance analysis," Journal of Risk and Reliability, vol. 227(6), pages 641-650, December.
    11. Modroño Herrán, Juan Ignacio & Fernández Aguirre, María Carmen & Landaluce Calvo, M. Isabel, 2003. "Una propuesta para el análisis de tablas múltiples," BILTOKI 1134-8984, Universidad del País Vasco - Departamento de Economía Aplicada III (Econometría y Estadística).
    12. Andrii Babii & Eric Ghysels & Junsu Pan, 2022. "Tensor Principal Component Analysis," Papers 2212.12981, arXiv.org, revised Aug 2023.
    13. Tenenhaus, Arthur & Philippe, Cathy & Frouin, Vincent, 2015. "Kernel Generalized Canonical Correlation Analysis," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 114-131.
    14. Haruhiko Ogasawara, 2009. "Asymptotic expansions in the singular value decomposition for cross covariance and correlation under nonnormality," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 61(4), pages 995-1017, December.
    15. Michel Tenenhaus & Arthur Tenenhaus & Patrick J. F. Groenen, 2017. "Regularized Generalized Canonical Correlation Analysis: A Framework for Sequential Multiblock Component Methods," Psychometrika, Springer;The Psychometric Society, vol. 82(3), pages 737-777, September.
    16. Perignon, Christophe & Smith, Daniel R. & Villa, Christophe, 2007. "Why common factors in international bond returns are not so common," Journal of International Money and Finance, Elsevier, vol. 26(2), pages 284-304, March.
    17. Babii, Andrii & Chen, Xi & Ghysels, Eric, 2019. "Commercial and Residential Mortgage Defaults: Spatial Dependence with Frailty," Journal of Econometrics, Elsevier, vol. 212(1), pages 47-77.
    18. Arthur Tenenhaus & Michel Tenenhaus, 2011. "Regularized Generalized Canonical Correlation Analysis," Psychometrika, Springer;The Psychometric Society, vol. 76(2), pages 257-284, April.
    19. Ogasawara, Haruhiko, 2007. "Asymptotic expansions of the distributions of estimators in canonical correlation analysis under nonnormality," Journal of Multivariate Analysis, Elsevier, vol. 98(9), pages 1726-1750, October.
    20. Tenenhaus, Arthur & Giron, Alain & Viennet, Emmanuel & Bera, Michel & Saporta, Gilbert & Fertil, Bernard, 2007. "Kernel logistic PLS: A tool for supervised nonlinear dimensionality reduction and binary classification," Computational Statistics & Data Analysis, Elsevier, vol. 51(9), pages 4083-4100, May.
