Printed from https://ideas.repec.org/a/eee/chsofr/v151y2021ics0960077921005907.html

Machine learning-based imputation soft computing approach for large missing scale and non-reference data imputation

Authors listed:
  • Alamoodi, A.H.
  • Zaidan, B.B.
  • Zaidan, A.A.
  • Albahri, O.S.
  • Chen, Juliana
  • Chyad, M.A.
  • Garfan, Salem
  • Aleesa, A.M.

Abstract

Missing data is a common problem in real-world data sets and is among the most complex topics in computer science and many other research domains. The common ways to cope with missing values are elimination or imputation, depending on the volume of the missing data and the nature of its distribution. It is therefore imperative to develop new imputation approaches along with efficient algorithms. Although most existing imputation methods focus on moderate amounts of missing data, imputation at high missing rates (over 80%) remains important but challenging. Even the works that do address high missing volumes mostly rely on a reference dataset (a complete dataset used for evaluation): they create artificial missing values in it, impute them, and measure the accuracy of their proposed techniques against the known values. So far, the option of imputing high proportions of missing values with no reference comparison dataset (i.e., working directly on the original dataset with highly missing values) has often been ignored or overlooked. We therefore propose a missing data imputation approach for high volumes of missing values with no reference comparison dataset. The approach applies pre-processing measures, breaks the dataset into small continuous non-missing portions, and then uses Multi-Criteria Decision-Making (MCDM) analysis to select the portion of data that is most representative of all the broken-down portions. This portion is used to create reference comparisons and to expand the missing dataset through artificial missing-value generation at different percentages, followed by imputation using different machine learning techniques. This study conducted two experiments using BMI datasets with more than 80% missing values, obtained from the National Child Development Research Centre (NCDRC) at Sultan Idris Education University (UPSI), Malaysia. The results show that our approach is capable of reconstructing datasets with huge proportions of missing values.
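The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation: rows are lists of floats with `None` marking a missing cell; every run of at least `min_len` consecutive complete rows is a candidate portion; candidates are ranked with TOPSIS (one MCDM method, chosen here only because it appears in the reference list) on two hypothetical criteria, portion length (benefit) and drift of the portion's column means from the observed column means (cost); the chosen portion is then artificially masked at several rates and refilled. Column-mean imputation stands in for the machine learning imputers the paper evaluates, and the paper's pre-processing step is not modelled. All function names, criteria, and weights are illustrative.

```python
import math
import random


def complete_segments(rows, min_len=3):
    """Split rows into runs of consecutive rows with no missing cell (None)."""
    segments, current = [], []
    for row in rows:
        if None not in row:
            current.append(row)
            continue
        if len(current) >= min_len:  # keep only runs long enough to be useful
            segments.append(current)
        current = []
    if len(current) >= min_len:
        segments.append(current)
    return segments


def observed_means(rows):
    """Per-column mean over non-missing cells (0.0 if a column is entirely missing)."""
    means = []
    for j in range(len(rows[0])):
        vals = [r[j] for r in rows if r[j] is not None]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return means


def segment_criteria(segment, reference_means):
    """Hypothetical criteria: [length (benefit), mean drift from reference (cost)]."""
    seg_means = [sum(col) / len(col) for col in zip(*segment)]
    drift = sum(abs(a - b) for a, b in zip(seg_means, reference_means)) / len(seg_means)
    return [float(len(segment)), drift]


def topsis(matrix, weights, benefit):
    """TOPSIS closeness to the ideal solution for each row (alternative) of matrix."""
    n = len(weights)
    # Vector normalisation per criterion; guard against an all-zero column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    cols = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.0)
    return scores


def select_representative(rows, min_len=3, weights=(0.5, 0.5)):
    """Pick the complete segment that TOPSIS ranks highest (assumed weights)."""
    segments = complete_segments(rows, min_len)
    if not segments:
        raise ValueError("no run of complete rows reaches min_len")
    ref = observed_means(rows)
    matrix = [segment_criteria(s, ref) for s in segments]
    scores = topsis(matrix, list(weights), benefit=[True, False])
    return segments[max(range(len(segments)), key=scores.__getitem__)]


def mask_and_score(segment, rate, seed=0):
    """Hide a fraction of cells, refill with column means, return RMSE on hidden cells.

    Column-mean imputation is only a stand-in for the paper's ML imputers.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(segment)) for j in range(len(segment[0]))]
    hidden = rng.sample(cells, max(1, int(rate * len(cells))))
    masked = [list(row) for row in segment]
    for i, j in hidden:
        masked[i][j] = None
    fill = observed_means(masked)
    sq_err = [(segment[i][j] - fill[j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(sq_err) / len(sq_err))


# Toy data (invented for illustration only, not from the study).
rows = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [None, 13.0],
        [5.0, 14.0], [6.0, 15.0], [7.0, 16.0], [8.0, 17.0], [9.0, None]]
portion = select_representative(rows)
for rate in (0.2, 0.5, 0.8):
    print(rate, round(mask_and_score(portion, rate), 3))
```

Because the selected portion is fully observed, the hidden cells have known true values, which is what allows an error score to be computed without an external reference dataset; swapping in a stronger imputer only changes the refill step in `mask_and_score`.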

Suggested Citation

  • Alamoodi, A.H. & Zaidan, B.B. & Zaidan, A.A. & Albahri, O.S. & Chen, Juliana & Chyad, M.A. & Garfan, Salem & Aleesa, A.M., 2021. "Machine learning-based imputation soft computing approach for large missing scale and non-reference data imputation," Chaos, Solitons & Fractals, Elsevier, vol. 151(C).
  • Handle: RePEc:eee:chsofr:v:151:y:2021:i:c:s0960077921005907
    DOI: 10.1016/j.chaos.2021.111236

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0960077921005907
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.chaos.2021.111236?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Nagy, Krisztina, 2020. "Term structure estimation with missing data: Application for emerging markets," The Quarterly Review of Economics and Finance, Elsevier, vol. 75(C), pages 347-360.
    2. Lee, Shawna J. & Altschul, Inna & Gershoff, Elizabeth T., 2015. "Wait until your father gets home? Mother's and fathers’ spanking and development of child aggression," Children and Youth Services Review, Elsevier, vol. 52(C), pages 158-166.
    3. Lê, Félice & Diez Roux, Ana & Morgenstern, Hal, 2013. "Effects of child and adolescent health on educational progress," Social Science & Medicine, Elsevier, vol. 76(C), pages 57-66.
    4. Opricovic, Serafim & Tzeng, Gwo-Hshiung, 2004. "Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, Elsevier, vol. 156(2), pages 445-455, July.
    5. Yu Jiang & Sai Chen & Daniel McGuire & Fang Chen & Mengzhen Liu & William G Iacono & John K Hewitt & John E Hokanson & Kenneth Krauter & Markku Laakso & Kevin W Li & Sharon M Lutz & Matthew McGue & An, 2018. "Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes," PLOS Genetics, Public Library of Science, vol. 14(7), pages 1-19, July.
    6. Kremer, Kristen P. & Flower, Andrea & Huang, Jin & Vaughn, Michael G., 2016. "Behavior problems and children's academic achievement: A test of growth-curve models with gender and racial differences," Children and Youth Services Review, Elsevier, vol. 67(C), pages 95-104.
    7. Sunny, Bindu S. & Elze, Markus & Chihana, Menard & Gondwe, Levie & Crampin, Amelia C. & Munkhondya, Masoyaona & Kondowe, Scotch & Glynn, Judith R., 2017. "Failing to progress or progressing to fail? Age-for-grade heterogeneity and grade repetition in primary schools in Karonga district, northern Malawi," International Journal of Educational Development, Elsevier, vol. 52(C), pages 68-80.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Gürler, Hasan Emin & Özçalıcı, Mehmet & Pamucar, Dragan, 2024. "Determining criteria weights with genetic algorithms for multi-criteria decision making methods: The case of logistics performance index rankings of European Union countries," Socio-Economic Planning Sciences, Elsevier, vol. 91(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yongming Song & Jun Hu, 2017. "Vector similarity measures of hesitant fuzzy linguistic term sets and their applications," PLOS ONE, Public Library of Science, vol. 12(12), pages 1-13, December.
    2. Yi Peng, 2015. "Regional earthquake vulnerability assessment using a combination of MCDM methods," Annals of Operations Research, Springer, vol. 234(1), pages 95-110, November.
    3. Zheng, Guozhong & Wang, Xiao, 2020. "The comprehensive evaluation of renewable energy system schemes in tourist resorts based on VIKOR method," Energy, Elsevier, vol. 193(C).
    4. Lin, Sheng-Hau & Zhao, Xiaofeng & Wu, Jiuxing & Liang, Fachao & Li, Jia-Hsuan & Lai, Ren-Ji & Hsieh, Jing-Chzi & Tzeng, Gwo-Hshiung, 2021. "An evaluation framework for developing green infrastructure by using a new hybrid multiple attribute decision-making model for promoting environmental sustainability," Socio-Economic Planning Sciences, Elsevier, vol. 75(C).
    5. Milad Zamanifar & Seyed Mohammad Seyedhoseyni, 2017. "Recovery planning model for roadways network after natural hazards," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 87(2), pages 699-716, June.
    6. Pedro Ponce & Citlaly Pérez & Aminah Robinson Fayek & Arturo Molina, 2022. "Solar Energy Implementation in Manufacturing Industry Using Multi-Criteria Decision-Making Fuzzy TOPSIS and S4 Framework," Energies, MDPI, vol. 15(23), pages 1-19, November.
    7. Mohit Jain & Gunjan Soni & Deepak Verma & Rajendra Baraiya & Bharti Ramtiyal, 2023. "Selection of Technology Acceptance Model for Adoption of Industry 4.0 Technologies in Agri-Fresh Supply Chain," Sustainability, MDPI, vol. 15(6), pages 1-20, March.
    8. Chen, Lisa Y. & Wang, Tien-Chin, 2009. "Optimizing partners' choice in IS/IT outsourcing projects: The strategic decision of fuzzy VIKOR," International Journal of Production Economics, Elsevier, vol. 120(1), pages 233-242, July.
    9. Wenyao Niu & Yuan Rong & Liying Yu & Lu Huang, 2022. "A Novel Hybrid Group Decision Making Approach Based on EDAS and Regret Theory under a Fermatean Cubic Fuzzy Environment," Mathematics, MDPI, vol. 10(17), pages 1-30, August.
    10. Deb, Madhujit & Debbarma, Bishop & Majumder, Arindam & Banerjee, Rahul, 2016. "Performance–emission optimization of a diesel-hydrogen dual fuel operation: A NSGA II coupled TOPSIS MADM approach," Energy, Elsevier, vol. 117(P1), pages 281-290.
    11. Kuang-Hua Hu & Wei Jianguo & Gwo-Hshiung Tzeng, 2017. "Risk Factor Assessment Improvement for China’s Cloud Computing Auditing Using a New Hybrid MADM Model," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(03), pages 737-777, May.
    12. Fernando Rojas & Peter Wanke & Víctor Leiva & Mauricio Huerta & Carlos Martin-Barreiro, 2022. "Modeling Inventory Cost Savings and Supply Chain Success Factors: A Hybrid Robust Compromise Multi-Criteria Approach," Mathematics, MDPI, vol. 10(16), pages 1-18, August.
    13. Maghsoodi, Abtin Ijadi, 2023. "Cryptocurrency portfolio allocation using a novel hybrid and predictive big data decision support system," Omega, Elsevier, vol. 115(C).
    14. Hisham Alidrisi, 2021. "An Innovative Job Evaluation Approach Using the VIKOR Algorithm," JRFM, MDPI, vol. 14(6), pages 1-19, June.
    15. Büsing, Christina & Goetzmann, Kai-Simon & Matuschke, Jannik & Stiller, Sebastian, 2017. "Reference points and approximation algorithms in multicriteria discrete optimization," European Journal of Operational Research, Elsevier, vol. 260(3), pages 829-840.
    16. Abbas Keramati & Fatemeh Shapouri, 2016. "Multidimensional appraisal of customer relationship management: integrating balanced scorecard and multi criteria decision making approaches," Information Systems and e-Business Management, Springer, vol. 14(2), pages 217-251, May.
    17. Xiaodong Li & Haibo Kuang & Yan Hu, 2019. "Carbon Mitigation Strategies of Port Selection and Multimodal Transport Operations—A Case Study of Northeast China," Sustainability, MDPI, vol. 11(18), pages 1-17, September.
    18. Serafim Opricovic, 2009. "A Compromise Solution in Water Resources Planning," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 23(8), pages 1549-1561, June.
    19. Büyüközkan, Gülçin & Ruan, Da, 2008. "Evaluation of software development projects using a fuzzy multi-criteria decision approach," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 77(5), pages 464-475.
    20. María Pilar de la Cruz López & Juan José Cartelle Barros & Alfredo del Caño Gochi & Manuel Lara Coira, 2021. "New Approach for Managing Sustainability in Projects," Sustainability, MDPI, vol. 13(13), pages 1-27, June.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:chsofr:v:151:y:2021:i:c:s0960077921005907. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. Registering allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Thayer, Thomas R. General contact details of provider: https://www.journals.elsevier.com/chaos-solitons-and-fractals .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.