IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v10y2022i8p1283-d792189.html

Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review

Author

Listed:
  • Jireh Yi-Le Chan

    (Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia
    These authors contributed equally to this work.)

  • Steven Mun Hong Leow

    (Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia
    These authors contributed equally to this work.)

  • Khean Thye Bea

    (Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia)

  • Wai Khuen Cheng

    (Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia)

  • Seuk Wai Phoong

    (Department of Management, Faculty of Business and Economics, Universiti Malaya, Kuala Lumpur 50603, Malaysia)

  • Zeng-Wei Hong

    (Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407102, Taiwan)

  • Yen-Lin Chen

    (Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 106344, Taiwan)

Abstract

Technological advances have driven big data collection across many fields, such as genomics and business intelligence, significantly increasing the number of variables and data points (observations) collected and stored. Although this presents opportunities to better model the relationship between predictors and response variables, it also causes serious problems during data analysis, one of which is multicollinearity. The two main approaches to mitigating multicollinearity are variable selection methods and modified estimator methods. However, variable selection methods may negate efforts to collect more data, because the new data may eventually be dropped from modeling, while recent studies suggest that optimization approaches via machine learning handle data with multicollinearity better than statistical estimators. Therefore, this study reviews the chronological development of methods for mitigating the effects of multicollinearity and offers up-to-date recommendations for doing so.
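
As a rough illustration of the problem the abstract describes, the sketch below builds a small synthetic data set in which two predictors are nearly copies of each other, measures the collinearity with the variance inflation factor (VIF), and compares ordinary least squares with ridge regression. Ridge is used here only as one familiar example of a modified estimator; the abstract does not name it. The data, the function names `vif` and `ridge`, and the penalty value `lam = 10.0` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on an intercept plus the remaining predictors.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def ridge(X, y, lam):
    """Ridge estimate on centered data: (X'X + lam*I)^{-1} X'y.
    With lam = 0 this reduces to ordinary least squares."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

# Synthetic data (an assumption for illustration): x2 is almost a copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)  # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

vifs = vif(X)
beta_ols = ridge(X, y, 0.0)     # no penalty: ordinary least squares
beta_ridge = ridge(X, y, 10.0)  # arbitrary penalty chosen for the example

print("VIF:  ", np.round(vifs, 1))
print("OLS:  ", np.round(beta_ols, 2))
print("Ridge:", np.round(beta_ridge, 2))
```

The first two VIF values come out far above 10, a commonly cited rule-of-thumb threshold, while the third stays close to 1. The ridge penalty shrinks the coefficient vector, which trades some bias for lower variance in the estimates of the two collinear predictors; in practice the penalty would be tuned, for example by cross-validation.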

Suggested Citation

  • Jireh Yi-Le Chan & Steven Mun Hong Leow & Khean Thye Bea & Wai Khuen Cheng & Seuk Wai Phoong & Zeng-Wei Hong & Yen-Lin Chen, 2022. "Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review," Mathematics, MDPI, vol. 10(8), pages 1-17, April.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:8:p:1283-:d:792189

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/8/1283/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/8/1283/
    Download Restriction: no

    References listed on IDEAS

    1. Ryuta Tamura & Ken Kobayashi & Yuichi Takano & Ryuhei Miyashiro & Kazuhide Nakata & Tomomi Matsui, 2019. "Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor," Journal of Global Optimization, Springer, vol. 73(2), pages 431-446, February.
    2. H. C. Hamaker, 1962. "On multiple regression analysis," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 16(1), pages 31-56, March.
    3. Taewook Kim & Ha Young Kim, 2019. "Forecasting stock prices with a feature fusion LSTM-CNN model using different representations of the same data," PLOS ONE, Public Library of Science, vol. 14(2), pages 1-23, February.
    4. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    5. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    6. Raehyun Kim & Chan Ho So & Minbyul Jeong & Sanghoon Lee & Jinkyu Kim & Jaewoo Kang, 2019. "HATS: A Hierarchical Graph Attention Network for Stock Movement Prediction," Papers 1908.07999, arXiv.org, revised Nov 2019.
    7. C.K. Chandrasekhar & H. Bagyalakshmi & M.R. Srinivasan & M. Gallo, 2016. "Partial ridge regression under multicollinearity," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(13), pages 2462-2473, October.
    8. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    9. Van Cuong Nguyen & Chi Tim Ng, 2020. "Variable selection under multicollinearity using modified log penalty," Journal of Applied Statistics, Taylor & Francis Journals, vol. 47(2), pages 201-230, January.

    Citations



    Cited by:

    1. Zhang, Jianhong & van Witteloostuijn, Arjen & Zhou, Chaohong & Zhou, Shengyang, 2024. "Cross-border acquisition completion by emerging market MNEs revisited: Inductive evidence from a machine learning analysis," Journal of World Business, Elsevier, vol. 59(2).
    2. Mònica González-Carrasco & Silvana Aciar & Ferran Casas & Xavier Oriol & Ramon Fabregat & Sara Malo, 2024. "A Machine Learning Approach to Well-Being in Late Childhood and Early Adolescence: The Children’s Worlds Data Case," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 175(1), pages 25-47, October.
    3. Hoxha, Julian & Çodur, Muhammed Yasin & Mustafaraj, Enea & Kanj, Hassan & El Masri, Ali, 2023. "Prediction of transportation energy demand in Türkiye using stacking ensemble models: Methodology and comparative analysis," Applied Energy, Elsevier, vol. 350(C).
    4. You, Geonhwa, 2024. "A comprehensive approach for calibrating anthropogenic effects on atmosphere degradation," Renewable and Sustainable Energy Reviews, Elsevier, vol. 191(C).
    5. Tran Ngoc Mai, 2023. "Renewable Energy, GDP (Gross Domestic Product), FDI (Foreign Direct Investment) and CO2 Emissions in Southeast Asia Countries," International Journal of Energy Economics and Policy, Econjournals, vol. 13(2), pages 284-289, March.
    6. Liu, Yang & Min, Shisheng & Shi, Zhuangbin & He, Mingwei, 2024. "Exploring students' choice of active travel to school in different spatial environments: A case study in a mountain city," Journal of Transport Geography, Elsevier, vol. 115(C).
    7. Nagwan Abdel Samee & Ghada Atteia & Souham Meshoul & Mugahed A. Al-antari & Yasser M. Kadah, 2022. "Deep Learning Cascaded Feature Selection Framework for Breast Cancer Classification: Hybrid CNN with Univariate-Based Approach," Mathematics, MDPI, vol. 10(19), pages 1-27, October.
    8. Cheng, Louis T.W. & Cheong, Tsun Se & Wojewodzki, Michal & Chui, David, 2025. "The effect of ESG divergence on the financial performance of Hong Kong-listed firms: An artificial neural network approach," Research in International Business and Finance, Elsevier, vol. 73(PA).
    9. Wai Khuen Cheng & Khean Thye Bea & Steven Mun Hong Leow & Jireh Yi-Le Chan & Zeng-Wei Hong & Yen-Lin Chen, 2022. "A Review of Sentiment, Semantic and Event-Extraction-Based Approaches in Stock Forecasting," Mathematics, MDPI, vol. 10(14), pages 1-20, July.
    10. de Bruin, Sophie & Hoch, Jannis & de Bruijn, Jens & Hermans, Kathleen & Maharjan, Amina & Kummu, Matti & van Vliet, Jasper, 2024. "Scenario projections of South Asian migration patterns amidst environmental and socioeconomic change," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 88, pages 1-12.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    2. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    3. Caroline Jardet & Baptiste Meunier, 2022. "Nowcasting world GDP growth with high‐frequency data," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1181-1200, September.
    4. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    5. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    6. Lee, Ji Hyung & Shi, Zhentao & Gao, Zhan, 2022. "On LASSO for predictive regression," Journal of Econometrics, Elsevier, vol. 229(2), pages 322-349.
    7. Ian W. McKeague & Min Qian, 2015. "An Adaptive Resampling Test for Detecting the Presence of Significant Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1422-1433, December.
    8. Victor Chernozhukov & Christian Hansen & Yuan Liao, 2015. "A lava attack on the recovery of sums of dense and sparse signals," CeMMAP working papers CWP56/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    9. Chen, Shi & Härdle, Wolfgang Karl & López Cabrera, Brenda, 2018. "Regularization Approach for Network Modeling of German Energy Market," IRTG 1792 Discussion Papers 2018-017, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    10. Zakariya Yahya Algamal & Muhammad Hisyam Lee, 2019. "A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 753-771, September.
    11. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    12. Minerva Mukhopadhyay & David B. Dunson, 2020. "Targeted Random Projection for Prediction From High-Dimensional Features," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(532), pages 1998-2010, December.
    13. Clément Cariou & Amélie Charles & Olivier Darné, 2024. "Are national or regional surveys useful for nowcasting regional jobseekers? The case of the French region of Pays‐de‐la‐Loire," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(6), pages 2341-2357, September.
    14. Dai, Linlin & Chen, Kani & Sun, Zhihua & Liu, Zhenqiu & Li, Gang, 2018. "Broken adaptive ridge regression and its asymptotic properties," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 334-351.
    15. Ruggieri, Eric & Lawrence, Charles E., 2012. "On efficient calculations for Bayesian variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1319-1332.
    16. Chenguang Zhang & Masayuki Nigo & Shivani Patel & Duo Yu & Edward Septimus & Hulin Wu, 2024. "Use of Real-World EMR Data to Rapidly Evaluate Treatment Effects of Existing Drugs for Emerging Infectious Diseases: Remdesivir for COVID-19 Treatment as an Example," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 16(3), pages 604-633, December.
    17. Liming Wang & Xingxiang Li & Xiaoqing Wang & Peng Lai, 2022. "Unified mean-variance feature screening for ultrahigh-dimensional regression," Computational Statistics, Springer, vol. 37(4), pages 1887-1918, September.
    18. Qinqin Hu & Lu Lin, 2022. "Feature Screening in High Dimensional Regression with Endogenous Covariates," Computational Economics, Springer;Society for Computational Economics, vol. 60(3), pages 949-969, October.
    19. Paweł Teisseyre & Robert A. Kłopotek & Jan Mielniczuk, 2016. "Random Subspace Method for high-dimensional regression with the R package regRSM," Computational Statistics, Springer, vol. 31(3), pages 943-972, September.
    20. Haixiang Zhang & Jun Chen & Zhigang Li & Lei Liu, 2021. "Testing for Mediation Effect with Application to Human Microbiome Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(2), pages 313-328, July.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.