Printed from https://ideas.repec.org/p/aiz/louvad/2021040.html

VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering

Authors
  • Marion, Rebecca

    (Université catholique de Louvain, LIDAM/ISBA, Belgium)

  • Lederer, Johannes
  • Govaerts, Bernadette

    (Université catholique de Louvain, LIDAM/ISBA, Belgium)

  • von Sachs, Rainer

    (Université catholique de Louvain, LIDAM/ISBA, Belgium)

Abstract

Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g., when there are highly correlated groups of variables). To improve prediction accuracy, various methods have been proposed to identify variable clusters from the data and integrate cluster information into a sparse modeling process. However, none of these methods achieves satisfactory performance for prediction, variable selection and variable clustering simultaneously. This paper presents Variable Cluster Principal Component Regression (VC-PCR), a prediction method that supervises variable selection and variable clustering in order to solve this problem. Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present.
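To make the setting described in the abstract concrete, the sketch below simulates predictors with cluster structure. Each group of variables is a noisy copy of one shared latent variable, so variables are highly correlated within a group and roughly uncorrelated across groups. It also illustrates why a cluster-level summary can help: the within-cluster average tracks the latent variable more closely than any single member does. This is not VC-PCR. Cluster memberships are assumed known here, the plain average is only a crude stand-in for a per-cluster component, and all function names and parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_clustered_predictors(n, n_clusters, vars_per_cluster, noise_sd, seed=0):
    """Hypothetical generator: every variable in cluster k equals latent k plus
    independent Gaussian noise. Returns (X, latents) as lists of rows."""
    rng = random.Random(seed)
    latents = [[rng.gauss(0.0, 1.0) for _ in range(n_clusters)] for _ in range(n)]
    X = [
        [z[k] + rng.gauss(0.0, noise_sd)
         for k in range(n_clusters)
         for _ in range(vars_per_cluster)]
        for z in latents
    ]
    return X, latents


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def column(rows, j):
    """Extract column j from a list of rows."""
    return [row[j] for row in rows]


def cluster_averages(X, n_clusters, vars_per_cluster):
    """Average the variables of each (assumed known) cluster: one summary per cluster."""
    return [
        [statistics.fmean(row[k * vars_per_cluster:(k + 1) * vars_per_cluster])
         for k in range(n_clusters)]
        for row in X
    ]


if __name__ == "__main__":
    n, K, p_k, sd = 2000, 3, 5, 0.5
    X, Z = simulate_clustered_predictors(n, K, p_k, sd)
    A = cluster_averages(X, K, p_k)

    # Variables 0 and 1 share cluster 0; variable p_k is the first member of cluster 1.
    within = pearson(column(X, 0), column(X, 1))
    between = pearson(column(X, 0), column(X, p_k))
    single = pearson(column(X, 0), column(Z, 0))
    averaged = pearson(column(A, 0), column(Z, 0))
    print(f"within-cluster corr   {within:.2f}")    # population value: 1 / (1 + sd**2) = 0.80
    print(f"between-cluster corr  {between:.2f}")   # population value: 0
    print(f"single var vs latent  {single:.2f}")    # population value: 1 / sqrt(1 + sd**2), about 0.89
    print(f"cluster avg vs latent {averaged:.2f}")  # population value: 1 / sqrt(1 + sd**2 / p_k), about 0.98
```

With these settings the sample correlations should land close to the population values noted in the comments. In practice the cluster memberships are unknown and must be estimated from the data, as the methods discussed in the abstract do.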

Suggested Citation

  • Marion, Rebecca & Lederer, Johannes & Govaerts, Bernadette & von Sachs, Rainer, 2021. "VC-PCR: A Prediction Method based on Supervised Variable Selection and Clustering," LIDAM Discussion Papers ISBA 2021040, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
  • Handle: RePEc:aiz:louvad:2021040

    Download full text from publisher

    File URL: https://dial.uclouvain.be/pr/boreal/en/object/boreal%3A254939/datastream/PDF_01/view
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jeon, Jong-June & Kwon, Sunghoon & Choi, Hosik, 2017. "Homogeneity detection for the high-dimensional generalized linear model," Computational Statistics & Data Analysis, Elsevier, vol. 114(C), pages 61-74.
    2. Marra, Giampiero & Wood, Simon N., 2011. "Practical variable selection for generalized additive models," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2372-2387, July.
    3. Minami, Kentaro, 2020. "Degrees of freedom in submodular regularization: A computational perspective of Stein’s unbiased risk estimate," Journal of Multivariate Analysis, Elsevier, vol. 175(C).
    4. Liu, Jianyu & Yu, Guan & Liu, Yufeng, 2019. "Graph-based sparse linear discriminant analysis for high-dimensional classification," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 250-269.
    5. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    6. Justin B. Post & Howard D. Bondell, 2013. "Factor Selection and Structural Identification in the Interaction ANOVA Model," Biometrics, The International Biometric Society, vol. 69(1), pages 70-79, March.
    7. Peter Radchenko & Gourab Mukherjee, 2017. "Convex clustering via ℓ1 fusion penalization," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(5), pages 1527-1546, November.
    8. Jiang, Liewen & Bondell, Howard D. & Wang, Huixia Judy, 2014. "Interquantile shrinkage and variable selection in quantile regression," Computational Statistics & Data Analysis, Elsevier, vol. 69(C), pages 208-219.
    9. Shanshan Qin & Hao Ding & Yuehua Wu & Feng Liu, 2021. "High-dimensional sign-constrained feature selection and grouping," Annals of the Institute of Statistical Mathematics, Springer; The Institute of Statistical Mathematics, vol. 73(4), pages 787-819, August.
    10. Philip Kostov & Thankom Arun & Samuel Annim, 2014. "Financial Services to the Unbanked: the case of the Mzansi intervention in South Africa," Contemporary Economics, University of Economics and Human Sciences in Warsaw, vol. 8(2), June.
    11. Wei Pan & Benhuai Xie & Xiaotong Shen, 2010. "Incorporating Predictor Network in Penalized Regression with Application to Microarray Data," Biometrics, The International Biometric Society, vol. 66(2), pages 474-484, June.
    12. Yang, Yuehan & Xia, Siwei & Yang, Hu, 2023. "Multivariate sparse Laplacian shrinkage for joint estimation of two graphical structures," Computational Statistics & Data Analysis, Elsevier, vol. 178(C).
    13. Siwei Xia & Yuehan Yang & Hu Yang, 2022. "Sparse Laplacian Shrinkage with the Graphical Lasso Estimator for Regression Problems," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer; Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 255-277, March.
    14. Chakraborty, Sounak & Lozano, Aurelie C., 2019. "A graph Laplacian prior for Bayesian variable selection and grouping," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 72-91.
    15. Yue, Lili & Li, Gaorong & Lian, Heng & Wan, Xiang, 2019. "Regression adjustment for treatment effect with multicollinearity in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 134(C), pages 17-35.
    16. Cui, Qiurong & Xu, Yuqing & Zhang, Zhengjun & Chan, Vincent, 2021. "Max-linear regression models with regularization," Journal of Econometrics, Elsevier, vol. 222(1), pages 579-600.
    17. Banerjee, Trambak & Mukherjee, Gourab & Radchenko, Peter, 2017. "Feature screening in large scale cluster analysis," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 191-212.
    18. McKay Curtis, S. & Banerjee, Sayantan & Ghosal, Subhashis, 2014. "Fast Bayesian model assessment for nonparametric additive regression," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 347-358.
    19. Sunkyung Kim & Wei Pan & Xiaotong Shen, 2013. "Network-Based Penalized Regression With Application to Genomic Data," Biometrics, The International Biometric Society, vol. 69(3), pages 582-593, September.
    20. Howard D. Bondell & Brian J. Reich, 2009. "Simultaneous Factor Selection and Collapsing Levels in ANOVA," Biometrics, The International Biometric Society, vol. 65(1), pages 169-177, March.

    More about this item

    Keywords

    Variable clustering; dimensionality reduction; nonnegative matrix factorization; latent variables; sparsity; prediction

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.