Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i13p2863-d1179731.html

Variable Selection for Meaningful Clustering of Multitopic Territorial Data

Author

Listed:
  • Xavier Angerri

    (Intelligent Data Science and Artificial Intelligence Research Center and Institut de Ciència i Tecnologia de la Sostenibilitat, Universitat Politècnica de Catalunya-BarcelonaTech, 08034 Barcelona, Spain)

  • Karina Gibert

    (Intelligent Data Science and Artificial Intelligence Research Center and Institut de Ciència i Tecnologia de la Sostenibilitat, Universitat Politècnica de Catalunya-BarcelonaTech, 08034 Barcelona, Spain)

Abstract

This paper proposes a new methodology to improve territorial cohesion in clustering processes where many variables from different topics are considered. Clustering techniques add value by identifying typologies, but challenges remain unsolved when the data contain an unbalanced number of variables across topics. The territorial feature selection method (TFSM) selects a representative variable for each topic so that the interpretability of the resulting clusters is preserved and their geographical cohesion improves with respect to classical approaches. This paper also introduces the thermometer, a new knowledge acquisition tool that allows experts to transfer semantics to the data mining process. TFSM proposes the index of potential explainability ($E_k$) as the criterion for selecting the most promising variables for clustering. $E_k$ combines inferential testing with metrics such as support. The proposal is applied to the INSESS-COVID19 database, where territorial groups of vulnerable populations were found. A set of 195 variables in 21 unbalanced thematic blocks is used to compare the results with those of a traditional multiview clustering analysis. The results are promising from both the geographical and the thematic points of view, and in their capacity to support further decision making.

Suggested Citation

  • Xavier Angerri & Karina Gibert, 2023. "Variable Selection for Meaningful Clustering of Multitopic Territorial Data," Mathematics, MDPI, vol. 11(13), pages 1-33, June.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:13:p:2863-:d:1179731

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/13/2863/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/13/2863/
    Download Restriction: no

    References listed on IDEAS

    1. Ming Yuan & Yi Lin, 2006. "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(1), pages 49-67, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    2. Jun Yan & Jian Huang, 2012. "Model Selection for Cox Models with Time-Varying Coefficients," Biometrics, The International Biometric Society, vol. 68(2), pages 419-428, June.
    3. Ye, Ya-Fen & Shao, Yuan-Hai & Deng, Nai-Yang & Li, Chun-Na & Hua, Xiang-Yu, 2017. "Robust Lp-norm least squares support vector regression with feature selection," Applied Mathematics and Computation, Elsevier, vol. 305(C), pages 32-52.
    4. Guillaume Sagnol & Edouard Pauwels, 2019. "An unexpected connection between Bayes A-optimal designs and the group lasso," Statistical Papers, Springer, vol. 60(2), pages 565-584, April.
    5. Bakalli, Gaetan & Guerrier, Stéphane & Scaillet, Olivier, 2023. "A penalized two-pass regression to predict stock returns with time-varying risk premia," Journal of Econometrics, Elsevier, vol. 237(2).
    6. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    7. Peng, Heng & Lu, Ying, 2012. "Model selection in linear mixed effect models," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 109-129.
    8. Yize Zhao & Matthias Chung & Brent A. Johnson & Carlos S. Moreno & Qi Long, 2016. "Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1427-1439, October.
    9. G. Aneiros & P. Vieu, 2016. "Sparse nonparametric model for regression with functional covariate," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 28(4), pages 839-859, October.
    10. Wongsa-art, Pipat & Kim, Namhyun & Xia, Yingcun & Moscone, Francesco, 2024. "Varying coefficient panel data models and methods under correlated error components: Application to disparities in mental health services in England," Regional Science and Urban Economics, Elsevier, vol. 106(C).
    11. Dong, C. & Li, S., 2021. "Specification Lasso and an Application in Financial Markets," Cambridge Working Papers in Economics 2139, Faculty of Economics, University of Cambridge.
    12. Lam, Clifford, 2008. "Estimation of large precision matrices through block penalization," LSE Research Online Documents on Economics 31543, London School of Economics and Political Science, LSE Library.
    13. Gregory Vaughan & Robert Aseltine & Kun Chen & Jun Yan, 2017. "Stagewise generalized estimating equations with grouped variables," Biometrics, The International Biometric Society, vol. 73(4), pages 1332-1342, December.
    14. Pradeep Ravikumar & John Lafferty & Han Liu & Larry Wasserman, 2009. "Sparse additive models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(5), pages 1009-1030, November.
    15. Yucheng Yang & Zhong Zheng & Weinan E, 2020. "Interpretable Neural Networks for Panel Data Analysis in Economics," Papers 2010.05311, arXiv.org, revised Nov 2020.
    16. Devijver, Emilie, 2017. "Joint rank and variable selection for parsimonious estimation in a high-dimensional finite mixture regression model," Journal of Multivariate Analysis, Elsevier, vol. 157(C), pages 1-13.
    17. Zhang, Tao & Zhang, Qingzhao & Wang, Qihua, 2014. "Model detection for functional polynomial regression," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 183-197.
    18. Madeleine Cule & Richard Samworth & Michael Stewart, 2010. "Maximum likelihood estimation of a multi‐dimensional log‐concave density," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(5), pages 545-607, November.
    19. Caner, Mehmet, 2023. "Generalized linear models with structured sparsity estimators," Journal of Econometrics, Elsevier, vol. 236(2).
    20. Toshio Honda, 2021. "The de-biased group Lasso estimation for varying coefficient models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(1), pages 3-29, February.
