Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i19p4154-d1252858.html

PFA-Nipals: An Unsupervised Principal Feature Selection Based on Nonlinear Estimation by Iterative Partial Least Squares

Author

Listed:
  • Emilio Castillo-Ibarra

    (Engineering Systems Doctoral Program, Faculty of Engineering, Universidad de Talca, Campus Curicó, Curicó 3340000, Chile)

  • Marco A. Alsina

    (Faculty of Engineering, Architecture and Design, Universidad San Sebastian, Bellavista 7, Santiago 8420524, Chile)

  • Cesar A. Astudillo

    (Department of Computer Science, Faculty of Engineering, University of Talca, Campus Curicó, Curicó 3340000, Chile)

  • Ignacio Fuenzalida-Henríquez

    (Building Management and Engineering Department, Faculty of Engineering, University of Talca, Campus Curicó, Curicó 3340000, Chile)

Abstract

Unsupervised feature selection (UFS) has attracted considerable interest in research areas that require dimensionality reduction, including machine learning, data mining, and statistical analysis. However, UFS algorithms are known to perform poorly on datasets with missing data, exhibiting a significant computational load and learning bias. In this work, we propose a novel and robust UFS method, designated PFA-Nipals, that works with missing data without the need for deletion or imputation. This is achieved by considering an iterative nonlinear estimation of principal components by partial least squares, while the relevant features are selected through minibatch K-means clustering. The proposed method is successfully applied to select the relevant features of a robust health dataset with missing data, outperforming other UFS methods in terms of computational load and learning bias. Furthermore, the proposed method is capable of finding a consistent set of relevant features without biasing the explained variability, even as the proportion of missing data increases. Finally, the proposed method is expected to be useful in areas such as machine learning and big data, with applications in the medical and engineering sciences.
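The missing-data step described in the abstract can be illustrated with a minimal sketch of NIPALS principal component estimation that skips unobserved entries rather than deleting or imputing them. This is not the authors' implementation: the function name `nipals_pca`, its parameters, the starting vector, and the convergence rule are assumptions made for illustration, and the subsequent feature selection by minibatch K-means is not shown.

```python
import numpy as np

def nipals_pca(X, n_components=2, max_iter=500, tol=1e-8):
    """Hypothetical helper: NIPALS principal components with NaN entries skipped."""
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                      # True where a value is observed
    col_means = np.nanmean(X, axis=0)        # column means over observed values only
    R = np.where(mask, X - col_means, 0.0)   # centered residuals; missing cells count as zero
    n, p = R.shape
    scores = np.zeros((n, n_components))
    loadings = np.zeros((p, n_components))
    for k in range(n_components):
        # Assumed starting vector: the column with the largest observed sum of squares.
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Loadings: least-squares slope of each column on t over its observed rows.
            denom_p = mask.T @ (t ** 2)
            p_k = (R.T @ t) / np.where(denom_p > 0, denom_p, 1.0)
            p_k /= np.linalg.norm(p_k)
            # Scores: least-squares slope of each row on p_k over its observed columns.
            denom_t = mask @ (p_k ** 2)
            t_new = (R @ p_k) / np.where(denom_t > 0, denom_t, 1.0)
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        scores[:, k] = t
        loadings[:, k] = p_k
        # Deflate on observed cells only, so missing cells stay at zero.
        R = np.where(mask, R - np.outer(t, p_k), 0.0)
    return scores, loadings
```

The rows of `loadings` give one vector per feature. In a principal-feature-analysis setting, those rows would then be clustered and one representative feature kept per cluster, which is the role the abstract assigns to minibatch K-means.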

Suggested Citation

  • Emilio Castillo-Ibarra & Marco A. Alsina & Cesar A. Astudillo & Ignacio Fuenzalida-Henríquez, 2023. "PFA-Nipals: An Unsupervised Principal Feature Selection Based on Nonlinear Estimation by Iterative Partial Least Squares," Mathematics, MDPI, vol. 11(19), pages 1-25, October.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:19:p:4154-:d:1252858

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/19/4154/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/19/4154/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hertrich Markus, 2019. "A Novel Housing Price Misalignment Indicator for Germany," German Economic Review, De Gruyter, vol. 20(4), pages 759-794, December.
    2. Colosimo Bianca Maria & Moya Ester Gutierrez & Moroni Giovanni & Petrò Stefano, 2008. "Statistical Sampling Strategies for Geometric Tolerance Inspection by CMM," Stochastics and Quality Control, De Gruyter, vol. 23(1), pages 109-121, January.
    3. Sürücü, Lütfi & YIKILMAZ, İbrahim & MASLAKÇI, Ahmet, 2022. "Exploratory Factor Analysis (EFA) in Quantitative Researches and Practical Considerations," OSF Preprints fgd4e, Center for Open Science.
    4. Hatem Jemmali & Mohamed Salah Matoussi, 2012. "A Multidimensional Analysis of Water Poverty at A Local Scale- Application of Improved Water Poverty Index for Tunisia," Working Papers 730, Economic Research Forum, revised 2012.
    5. Pattravadee Ploykitikoon & Charles M. Weber, 2019. "Knowledge Pathways and Performance: An Empirical Study of the National Laboratories in a Technology Latecomer Country," International Journal of Innovation and Technology Management (IJITM), World Scientific Publishing Co. Pte. Ltd., vol. 16(03), pages 1-37, May.
    6. Gweneth Leigh & Milica Muminovic & Rachel Davey, 2023. "Enjoyed by Jack but Endured by Jill: An Exploratory Case Study Examining Differences in Adolescent Design Preferences and Perceived Impacts of a Secondary Schoolyard," IJERPH, MDPI, vol. 20(5), pages 1-14, February.
    7. Pacheco, Joaquín & Casado, Silvia & Porras, Santiago, 2013. "Exact methods for variable selection in principal component analysis: Guide functions and pre-selection," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 95-111.
    8. Jérome SARACCO & Marie CHAVENT & Vanessa KUENTZ, 2010. "Clustering of categorical variables around latent variables," Cahiers du GREThA (2007-2019) 2010-02, Groupe de Recherche en Economie Théorique et Appliquée (GREThA).
    9. Martínez-Ventura, Constanza & Mariño-Martínez, Ricardo & Miguélez-Márquez, Javier, 2023. "Redundancy of Centrality Measures in Financial Market Infrastructures," Latin American Journal of Central Banking (previously Monetaria), Elsevier, vol. 4(4).
    10. Véronique Cariou & Stéphane Verdun & Emmanuelle Diaz & El Qannari & Evelyne Vigneau, 2009. "Comparison of three hypothesis testing approaches for the selection of the appropriate number of clusters of variables," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 3(3), pages 227-241, December.
    11. Huseyin Aytug & Siong Hook Law & Nirvikar Singh, 2018. "What Can We Learn from Global and Regional Rankings of Countries?," Millennial Asia, vol. 9(2), pages 119-139, August.
    12. Psaradakis, Zacharias & Vávra, Marián, 2014. "On testing for nonlinearity in multivariate time series," Economics Letters, Elsevier, vol. 125(1), pages 1-4.
    13. Bauer, Jan O. & Drabant, Bernhard, 2021. "Principal loading analysis," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    14. Serena Ng, 2017. "Opportunities and Challenges: Lessons from Analyzing Terabytes of Scanner Data," NBER Working Papers 23673, National Bureau of Economic Research, Inc.
    15. Oliveira, Paulo Ricardo Silva & Silveira, José Maria Ferreira Jardim da & Magalhães, Marcelo Marques de & Souza, Roney Fraga, 2020. "International trade in GMOs: have markets paid premiums on Brazilian soybeans?," Revista de Economia e Sociologia Rural (RESR), Sociedade Brasileira de Economia e Sociologia Rural, vol. 58(1), January.
    16. Cumming, J.A. & Wooff, D.A., 2007. "Dimension reduction via principal variables," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 550-565, September.
    17. Zeynep Reva & Oğuz Polat, 2023. "Road Rage as a Type of Violation of Well-Being in Traffic: The Case of Turkey," Sustainability, MDPI, vol. 15(6), pages 1-19, March.
    18. Kupabado, Moses Mananyi & Mensah-Bonsu, Akwasi, 2024. "Mapping of community perspectives on land acquisition for biofuel investment in northern Ghana," Land Use Policy, Elsevier, vol. 141(C).
    19. Diego Bernardo Avanzini, 2009. "Designing Composite Entrepreneurship Indicators: An Application Using Consensus PCA," WIDER Working Paper Series RP2009-41, World Institute for Development Economic Research (UNU-WIDER).
    20. Gary T. Henry & James H. McMillan, 1993. "Performance Data," Evaluation Review, vol. 17(6), pages 643-652, December.
