Determining the number of components in PLS regression on incomplete data set

Author

Listed:
  • Nengsih Titin Agustin

    (IRMA, CNRS UMR 7501, Université de Strasbourg, 67084 Strasbourg, Cedex, France)

  • Bertrand Frédéric

    (IRMA, CNRS UMR 7501, Université de Strasbourg, 67084 Strasbourg, Cedex, France)

  • Maumy-Bertrand Myriam

    (IRMA, CNRS UMR 7501, Université de Strasbourg, 67084 Strasbourg, Cedex, France)

  • Meyer Nicolas

    (iCUBE, CNRS UMR 7357, Université de Strasbourg, 67400 Strasbourg, France)

Abstract

Partial least squares regression (PLS regression) is a multivariate method in which the model parameters are estimated using either the SIMPLS or the NIPALS algorithm. PLS regression has been extensively used in applied research because of its effectiveness in analyzing relationships between an outcome and one or several components. Note that the NIPALS algorithm can provide parameter estimates on incomplete data. The selection of the number of components used to build a representative model in PLS regression is a central issue. However, how to deal with missing data when using PLS regression remains a matter of debate. Several approaches have been proposed in the literature, including the Q2 criterion and the AIC and BIC criteria. Here we study the behavior of the NIPALS algorithm when used to fit a PLS regression for various proportions of missing data and different types of missingness. We compare criteria for selecting the number of components of a PLS regression on incomplete data sets and on data sets imputed with three methods: multiple imputation by chained equations, k-nearest neighbour imputation, and singular value decomposition imputation. We tested the criteria with proportions of missing data ranging from 5% to 50% under different missingness assumptions. Q2-leave-one-out component selection methods gave more reliable results than AIC- and BIC-based ones.
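
The abstract does not spell out how NIPALS copes with missing entries or how a leave-one-out Q2 is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a single, fully observed response (PLS1), the textbook NIPALS rule of computing each inner product over the available entries only, and the common definition Q2_h = 1 - PRESS_h / RSS_(h-1); the paper's exact variant may differ. The function names fit_pls1_nipals, predict_pls1 and q2_loo are hypothetical.

```python
import numpy as np

def fit_pls1_nipals(X, y, n_components):
    """PLS1 by NIPALS; NaN cells of X are skipped in every inner product.
    Assumes y is complete and no row or column of X is entirely missing."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(X)                       # True where x_ij is observed
    M = obs.astype(float)                    # same mask as 0/1 weights
    x_mean, y_mean = np.nanmean(X, axis=0), y.mean()
    E = np.where(obs, X - x_mean, 0.0)       # centered X, zeros at missing cells
    f = y - y_mean                           # centered response
    W, P, T, c = [], [], [], []
    for _ in range(n_components):
        w = (E.T @ f) / (M.T @ (f ** 2))     # slope of each column on f, available rows
        w = w / np.linalg.norm(w)
        t = (E @ w) / (M @ (w ** 2))         # slope of each row on w, available columns
        p = (E.T @ t) / (M.T @ (t ** 2))     # loadings, available rows
        c_h = (f @ t) / (t @ t)              # regression of the response on the score
        E = np.where(obs, E - np.outer(t, p), 0.0)   # deflate X, keep missing cells at 0
        f = f - c_h * t                              # deflate y
        W.append(w); P.append(p); T.append(t); c.append(c_h)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": np.array(W),
            "P": np.array(P), "T": np.column_stack(T), "c": np.array(c)}

def predict_pls1(model, X_new, n_components):
    """Predict from the first n_components; rows of X_new may contain NaN."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    obs = ~np.isnan(X_new)
    M = obs.astype(float)
    E = np.where(obs, X_new - model["x_mean"], 0.0)
    y_hat = np.full(X_new.shape[0], model["y_mean"])
    for h in range(n_components):
        w, p = model["W"][h], model["P"][h]
        t = (E @ w) / (M @ (w ** 2))         # score from the available columns only
        y_hat = y_hat + model["c"][h] * t
        E = np.where(obs, E - np.outer(t, p), 0.0)
    return y_hat

def q2_loo(X, y, max_components):
    """Leave-one-out Q2_h = 1 - PRESS_h / RSS_(h-1), for h = 1..max_components."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    press = np.zeros(max_components)
    for i in range(n):
        keep = np.arange(n) != i             # training rows for this fold
        model = fit_pls1_nipals(X[keep], y[keep], max_components)
        for h in range(1, max_components + 1):
            press[h - 1] += (y[i] - predict_pls1(model, X[i], h)[0]) ** 2
    full = fit_pls1_nipals(X, y, max_components)
    rss = [np.sum((y - y.mean()) ** 2)]      # RSS_0 is the total sum of squares
    for h in range(1, max_components):
        rss.append(np.sum((y - predict_pls1(full, X, h)) ** 2))
    return 1.0 - press / np.array(rss)
```

Calling q2_loo(X, y, H) on a matrix X with NaN entries returns one Q2 value per component. A commonly quoted rule of thumb keeps component h while Q2_h is at least 0.0975; whether the paper applies exactly this threshold is not stated in the abstract.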

Suggested Citation

  • Nengsih Titin Agustin & Bertrand Frédéric & Maumy-Bertrand Myriam & Meyer Nicolas, 2019. "Determining the number of components in PLS regression on incomplete data set," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(6), pages 1-28, December.
  • Handle: RePEc:bpj:sagmbi:v:18:y:2019:i:6:p:28:n:2
    DOI: 10.1515/sagmb-2018-0059

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2018-0059
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.



