
Sparse Canonical Covariance Analysis for High-throughput Data

Author

Listed:
  • Lee Woojoo
  • Lee Donghwan
  • Lee Youngjo
  • Pawitan Yudi

Abstract

Canonical covariance analysis (CCA) has gained popularity as a method for analyzing two sets of high-dimensional genomic data. However, the results are often difficult to interpret because the canonical vectors are linear combinations of all variables and their coefficients are typically nonzero. Several sparse CCA methods have recently been proposed to reduce the number of nonzero coefficients, but these methods are not satisfactory because they still yield too many nonzero coefficients. In this paper, we propose a new random-effect model approach for sparse CCA; the proposed algorithm can accommodate arbitrary penalty functions in CCA without much additional computational demand. Through simulation studies, we compare various penalty functions in terms of how well they identify the correct model. We also develop an extension of sparse CCA that handles more than two sets of variables measured on the same set of observations. We illustrate the method with an analysis of the NCI cancer dataset.
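
The abstract does not spell out the algorithm, so the following is only a generic sketch of the sparse canonical covariance idea: it maximizes the sample covariance between two linear combinations while shrinking small coefficients to exactly zero. It uses a plain lasso (L1) soft-thresholding step inside an alternating power iteration, not the random-effect approach proposed in the paper. The function names, toy data, and penalty values are all illustrative assumptions.

```python
import math

def soft_threshold(z, lam):
    """Elementwise lasso (L1) soft-thresholding: shrink toward zero by lam."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in z]

def normalize(z):
    """Scale z to unit Euclidean norm; an all-zero vector is returned as-is."""
    norm = math.sqrt(sum(x * x for x in z))
    return [x / norm for x in z] if norm > 0 else z

def cross_covariance(X, Y):
    """Sample cross-covariance matrix (p x q) of X (n x p) and Y (n x q)."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = [sum(row[k] for row in Y) / n for k in range(q)]
    return [[sum((X[i][j] - mx[j]) * (Y[i][k] - my[k]) for i in range(n)) / (n - 1)
             for k in range(q)] for j in range(p)]

def sparse_canonical_covariance(X, Y, lam_u, lam_v, n_iter=100):
    """Alternating soft-thresholded power iteration on the cross-covariance.

    Assumption: an L1 penalty stands in for the penalty functions compared
    in the paper. With lam_u = lam_v = 0 this is ordinary power iteration
    for the leading singular vectors of the cross-covariance matrix.
    """
    C = cross_covariance(X, Y)
    p, q = len(C), len(C[0])
    v = normalize([1.0] * q)  # simple deterministic starting vector
    u = [0.0] * p
    for _ in range(n_iter):
        Cv = [sum(C[j][k] * v[k] for k in range(q)) for j in range(p)]
        u = normalize(soft_threshold(Cv, lam_u))
        Ctu = [sum(C[j][k] * u[j] for j in range(p)) for k in range(q)]
        v = normalize(soft_threshold(Ctu, lam_v))
    return u, v

# Hypothetical toy data: 4 observations, 3 X-variables, 2 Y-variables.
# The first column of X and the first column of Y share a strong common signal.
X = [[-3.0, 1.0, -1.0], [-1.0, -1.0, 3.0], [1.0, -1.0, -3.0], [3.0, 1.0, 1.0]]
Y = [[-5.7, -0.75], [-2.3, 0.75], [1.7, -0.75], [6.3, 0.75]]

u_dense, v_dense = sparse_canonical_covariance(X, Y, lam_u=0.0, lam_v=0.0)
u_sparse, v_sparse = sparse_canonical_covariance(X, Y, lam_u=2.0, lam_v=2.0)
print("no penalty:  ", u_dense, v_dense)
print("with penalty:", u_sparse, v_sparse)
```

On this toy data the unpenalized vectors load on every variable, while the penalized vectors keep only the first variable in each set. The thresholds are on the scale of the cross-covariance entries, so in practice they would have to be tuned to the data.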

Suggested Citation

  • Lee Woojoo & Lee Donghwan & Lee Youngjo & Pawitan Yudi, 2011. "Sparse Canonical Covariance Analysis for High-throughput Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-24, July.
  • Handle: RePEc:bpj:sagmbi:v:10:y:2011:i:1:n:30
    DOI: 10.2202/1544-6115.1638


    References listed on IDEAS

    1. Johnstone, Iain M. & Lu, Arthur Yu, 2009. "On Consistency and Sparsity for Principal Components Analysis in High Dimensions," Journal of the American Statistical Association, American Statistical Association, vol. 104(486), pages 682-693.
    2. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    3. Park, Trevor & Casella, George, 2008. "The Bayesian Lasso," Journal of the American Statistical Association, American Statistical Association, vol. 103, pages 681-686, June.
    4. A. Salim & Y. Pawitan & K. Bond, 2005. "Modelling association between two irregularly observed spatiotemporal processes by using maximum covariance analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 54(3), pages 555-573, June.
    5. Hyonho Chun & Sündüz Keleş, 2010. "Sparse partial least squares regression for simultaneous dimension reduction and variable selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(1), pages 3-25, January.
    6. Witten Daniela M & Tibshirani Robert J., 2009. "Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-29, June.
    7. Rothman, Adam J. & Levina, Elizaveta & Zhu, Ji, 2009. "Generalized Thresholding of Large Covariance Matrices," Journal of the American Statistical Association, American Statistical Association, vol. 104(485), pages 177-186.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Wang, Wenjia & Zhou, Yi-Hui, 2021. "Eigenvector-based sparse canonical correlation analysis: Fast computation for estimation of multiple canonical vectors," Journal of Multivariate Analysis, Elsevier, vol. 185(C).
    2. Kwon, Sunghoon & Oh, Seungyoung & Lee, Youngjo, 2016. "The use of random-effect models for high-dimensional variable selection problems," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 401-412.
    3. Lee, Youngjo & Oh, Hee-Seok, 2014. "A new sparse variable selection via random-effect model," Journal of Multivariate Analysis, Elsevier, vol. 125(C), pages 89-99.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fan, Jianqing & Jiang, Bai & Sun, Qiang, 2022. "Bayesian factor-adjusted sparse regression," Journal of Econometrics, Elsevier, vol. 230(1), pages 3-19.
    2. Qiang Sun & Hongtu Zhu & Yufeng Liu & Joseph G. Ibrahim, 2015. "SPReM: Sparse Projection Regression Model For High-Dimensional Linear Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 289-302, March.
    3. Xi Luo, 2011. "Recovering Model Structures from Large Low Rank and Sparse Covariance Matrix Estimation," Papers 1111.1133, arXiv.org, revised Mar 2013.
    4. Lam, Clifford, 2020. "High-dimensional covariance matrix estimation," LSE Research Online Documents on Economics 101667, London School of Economics and Political Science, LSE Library.
    5. Luo, Ruiyan & Qi, Xin, 2015. "Sparse wavelet regression with multiple predictive curves," Journal of Multivariate Analysis, Elsevier, vol. 134(C), pages 33-49.
    6. Bai, Jushan & Liao, Yuan, 2012. "Efficient Estimation of Approximate Factor Models," MPRA Paper 41558, University Library of Munich, Germany.
    7. Goh, Gyuhyeong & Dey, Dipak K. & Chen, Kun, 2017. "Bayesian sparse reduced rank multivariate regression," Journal of Multivariate Analysis, Elsevier, vol. 157(C), pages 14-28.
    8. Bai, Jushan & Ando, Tomohiro, 2013. "Multifactor asset pricing with a large number of observable risk factors and unobservable common and group-specific factors," MPRA Paper 52785, University Library of Munich, Germany, revised Dec 2013.
    9. Oguzhan Cepni & I. Ethem Guney & Norman R. Swanson, 2020. "Forecasting and nowcasting emerging market GDP growth rates: The role of latent global economic policy uncertainty and macroeconomic data surprise factors," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(1), pages 18-36, January.
    10. Benjamin Poignard & Manabu Asai, 2023. "Estimation of high-dimensional vector autoregression via sparse precision matrix," The Econometrics Journal, Royal Economic Society, vol. 26(2), pages 307-326.
    11. Peng, Liuhua & Chen, Song Xi & Zhou, Wen, 2016. "More powerful tests for sparse high-dimensional covariances matrices," Journal of Multivariate Analysis, Elsevier, vol. 149(C), pages 124-143.
    12. Jin-Chuan Duan & Weimin Miao, 2016. "Default Correlations and Large-Portfolio Credit Analysis," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 536-546, October.
    13. Bernardi, Mauro & Costola, Michele, 2019. "High-dimensional sparse financial networks through a regularised regression model," SAFE Working Paper Series 244, Leibniz Institute for Financial Research SAFE.
    14. Debamita Kundu & Riten Mitra & Jeremy T. Gaskins, 2021. "Bayesian variable selection for multioutcome models through shared shrinkage," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(1), pages 295-320, March.
    15. Jean-Pierre Dubé & Sanjog Misra, 2017. "Personalized Pricing and Consumer Welfare," NBER Working Papers 23775, National Bureau of Economic Research, Inc.
    16. Lee Anthony & Caron Francois & Doucet Arnaud & Holmes Chris, 2012. "Bayesian Sparsity-Path-Analysis of Genetic Association Signal using Generalized t Priors," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(2), pages 1-31, January.
    17. Shutes, Karl & Adcock, Chris, 2013. "Regularized Extended Skew-Normal Regression," MPRA Paper 58445, University Library of Munich, Germany, revised 09 Sep 2014.
    18. Li, Degui, 2024. "Estimation of Large Dynamic Covariance Matrices: A Selective Review," Econometrics and Statistics, Elsevier, vol. 29(C), pages 16-30.
    19. De Luca, Giuseppe & Magnus, Jan R. & Peracchi, Franco, 2018. "Weighted-average least squares estimation of generalized linear models," Journal of Econometrics, Elsevier, vol. 204(1), pages 1-17.
    20. Yu-Zhu Tian & Man-Lai Tang & Wai-Sum Chan & Mao-Zai Tian, 2021. "Bayesian bridge-randomized penalized quantile regression for ordinal longitudinal data, with application to firm’s bond ratings," Computational Statistics, Springer, vol. 36(2), pages 1289-1319, June.
