Printed from https://ideas.repec.org/a/eee/chsofr/v151y2021ics0960077921005907.html

Machine learning-based imputation soft computing approach for large missing scale and non-reference data imputation

Authors
  • Alamoodi, A.H.
  • Zaidan, B.B.
  • Zaidan, A.A.
  • Albahri, O.S.
  • Chen, Juliana
  • Chyad, M.A.
  • Garfan, Salem
  • Aleesa, A.M.

Abstract

Missing data is a common problem in real-world datasets and is among the most complex topics in computer science and many other research domains. The common ways to cope with missing values are elimination or imputation, depending on the volume of the missing data and the nature of its distribution. It is imperative to develop new imputation approaches along with efficient algorithms. Most existing imputation methods focus on moderate amounts of missing data, whereas imputation at missing rates above 80% remains important but challenging. Although some works address high missing volumes, they mostly rely on a reference dataset (a complete dataset used for evaluation): they create artificial missing values in it and impute them to measure the accuracy of their proposed techniques. So far, imputing high proportions of missing values with no reference comparison dataset (i.e., the original dataset with highly missing values) has often been ignored or overlooked. Therefore, we propose a missing data imputation approach for high volumes of missing values with no reference comparison dataset. The approach applies pre-processing, breaks the dataset into small contiguous non-missing portions, and then uses multi-criteria decision-making (MCDM) analysis to select a portion of the data that is representative of all the broken-up portions. This portion provides reference comparisons and is used to expand the missing dataset through artificial missingness procedures at different percentages and imputation with different machine learning techniques. This study conducted two experiments using BMI datasets with more than 80% missing values, obtained from the National Child Development Research Centre (NCDRC) at Sultan Idris Education University (UPSI), Malaysia. The results demonstrate the capability of the approach to reconstruct datasets with very large proportions of missing values.
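The pipeline summarized in the abstract (split into contiguous complete portions, pick a representative portion with MCDM, then mask it artificially and score imputers) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats the data as a single numeric series, uses TOPSIS (one of the MCDM methods compared by Opricovic and Tzeng, 2004) with three hypothetical criteria and equal weights, and substitutes two simple imputers (mean and linear interpolation) for the machine learning techniques evaluated in the paper. All function names are hypothetical.

```python
import numpy as np


def complete_runs(x, min_len=5):
    """Return (start, stop) pairs for contiguous non-missing runs of at least min_len values."""
    runs, start = [], None
    for i, v in enumerate(np.append(x, np.nan)):  # trailing NaN closes the final run
        if not np.isnan(v):
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


def topsis(matrix, weights, benefit):
    """TOPSIS closeness coefficient per alternative (row); higher is better."""
    m = np.asarray(matrix, dtype=float)
    norms = np.sqrt((m ** 2).sum(axis=0))
    norms[norms == 0] = 1.0  # guard against all-zero criterion columns
    v = m / norms * np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti_ideal = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_best = np.linalg.norm(v - ideal, axis=1)
    d_worst = np.linalg.norm(v - anti_ideal, axis=1)
    return d_worst / (d_best + d_worst + 1e-12)


def select_representative(x, runs):
    """Pick the run that TOPSIS ranks highest on three hypothetical criteria."""
    obs = x[~np.isnan(x)]
    criteria = [[stop - start,                              # length: benefit
                 abs(x[start:stop].mean() - obs.mean()),    # mean gap: cost
                 abs(x[start:stop].std() - obs.std())]      # spread gap: cost
                for start, stop in runs]
    scores = topsis(criteria, weights=[1 / 3] * 3, benefit=[True, False, False])
    return runs[int(np.argmax(scores))]


def rmse_after_masking(segment, rate, rng):
    """Hide a fraction `rate` (0 < rate <= 1) of interior values, impute two ways, return RMSE per imputer."""
    seg = np.asarray(segment, dtype=float)
    interior = np.arange(1, len(seg) - 1)  # endpoints stay observed so interpolation is defined
    n_hidden = max(1, int(rate * len(interior)))
    hidden = rng.choice(interior, size=n_hidden, replace=False)
    masked = seg.copy()
    masked[hidden] = np.nan
    known = ~np.isnan(masked)
    t = np.arange(len(seg))
    guesses = {
        "mean": np.full(n_hidden, masked[known].mean()),
        "linear": np.interp(hidden, t[known], masked[known]),
    }
    return {name: float(np.sqrt(np.mean((g - seg[hidden]) ** 2))) for name, g in guesses.items()}


# Synthetic series (NOT the paper's BMI data): 98 of 120 values missing (~82%).
x = np.full(120, np.nan)
x[10:20] = 16 + 0.3 * np.arange(10)
x[40:46] = 15.0
x[70:76] = 25.0

runs = complete_runs(x)  # [(10, 20), (40, 46), (70, 76)]
start, stop = select_representative(x, runs)
rng = np.random.default_rng(0)
for rate in (0.2, 0.5, 0.8):
    print(rate, rmse_after_masking(x[start:stop], rate, rng))
```

Treating the data as one series and masking only interior points are simplifications for illustration; the abstract does not specify the paper's actual pre-processing steps, MCDM criteria, weights, or imputation models.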

Suggested Citation

  • Alamoodi, A.H. & Zaidan, B.B. & Zaidan, A.A. & Albahri, O.S. & Chen, Juliana & Chyad, M.A. & Garfan, Salem & Aleesa, A.M., 2021. "Machine learning-based imputation soft computing approach for large missing scale and non-reference data imputation," Chaos, Solitons & Fractals, Elsevier, vol. 151(C).
  • Handle: RePEc:eee:chsofr:v:151:y:2021:i:c:s0960077921005907
    DOI: 10.1016/j.chaos.2021.111236

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0960077921005907
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.chaos.2021.111236?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Because access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Sunny, Bindu S. & Elze, Markus & Chihana, Menard & Gondwe, Levie & Crampin, Amelia C. & Munkhondya, Masoyaona & Kondowe, Scotch & Glynn, Judith R., 2017. "Failing to progress or progressing to fail? Age-for-grade heterogeneity and grade repetition in primary schools in Karonga district, northern Malawi," International Journal of Educational Development, Elsevier, vol. 52(C), pages 68-80.
    2. Nagy, Krisztina, 2020. "Term structure estimation with missing data: Application for emerging markets," The Quarterly Review of Economics and Finance, Elsevier, vol. 75(C), pages 347-360.
    3. Lee, Shawna J. & Altschul, Inna & Gershoff, Elizabeth T., 2015. "Wait until your father gets home? Mother's and fathers’ spanking and development of child aggression," Children and Youth Services Review, Elsevier, vol. 52(C), pages 158-166.
    4. Lê, Félice & Diez Roux, Ana & Morgenstern, Hal, 2013. "Effects of child and adolescent health on educational progress," Social Science & Medicine, Elsevier, vol. 76(C), pages 57-66.
    5. Opricovic, Serafim & Tzeng, Gwo-Hshiung, 2004. "Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS," European Journal of Operational Research, Elsevier, vol. 156(2), pages 445-455, July.
    6. Yu Jiang & Sai Chen & Daniel McGuire & Fang Chen & Mengzhen Liu & William G Iacono & John K Hewitt & John E Hokanson & Kenneth Krauter & Markku Laakso & Kevin W Li & Sharon M Lutz & Matthew McGue & An, 2018. "Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes," PLOS Genetics, Public Library of Science, vol. 14(7), pages 1-19, July.
    7. Kremer, Kristen P. & Flower, Andrea & Huang, Jin & Vaughn, Michael G., 2016. "Behavior problems and children's academic achievement: A test of growth-curve models with gender and racial differences," Children and Youth Services Review, Elsevier, vol. 67(C), pages 95-104.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Gürler, Hasan Emin & Özçalıcı, Mehmet & Pamucar, Dragan, 2024. "Determining criteria weights with genetic algorithms for multi-criteria decision making methods: The case of logistics performance index rankings of European Union countries," Socio-Economic Planning Sciences, Elsevier, vol. 91(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi Peng, 2015. "Regional earthquake vulnerability assessment using a combination of MCDM methods," Annals of Operations Research, Springer, vol. 234(1), pages 95-110, November.
    2. Zheng, Guozhong & Wang, Xiao, 2020. "The comprehensive evaluation of renewable energy system schemes in tourist resorts based on VIKOR method," Energy, Elsevier, vol. 193(C).
    3. Milad Zamanifar & Seyed Mohammad Seyedhoseyni, 2017. "Recovery planning model for roadways network after natural hazards," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer; International Society for the Prevention and Mitigation of Natural Hazards, vol. 87(2), pages 699-716, June.
    4. Pedro Ponce & Citlaly Pérez & Aminah Robinson Fayek & Arturo Molina, 2022. "Solar Energy Implementation in Manufacturing Industry Using Multi-Criteria Decision-Making Fuzzy TOPSIS and S4 Framework," Energies, MDPI, vol. 15(23), pages 1-19, November.
    5. Wenyao Niu & Yuan Rong & Liying Yu & Lu Huang, 2022. "A Novel Hybrid Group Decision Making Approach Based on EDAS and Regret Theory under a Fermatean Cubic Fuzzy Environment," Mathematics, MDPI, vol. 10(17), pages 1-30, August.
    6. Hisham Alidrisi, 2021. "An Innovative Job Evaluation Approach Using the VIKOR Algorithm," JRFM, MDPI, vol. 14(6), pages 1-19, June.
    7. Abbas Keramati & Fatemeh Shapouri, 2016. "Multidimensional appraisal of customer relationship management: integrating balanced scorecard and multi criteria decision making approaches," Information Systems and e-Business Management, Springer, vol. 14(2), pages 217-251, May.
    8. Serafim Opricovic, 2009. "A Compromise Solution in Water Resources Planning," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer; European Water Resources Association (EWRA), vol. 23(8), pages 1549-1561, June.
    9. María Pilar de la Cruz López & Juan José Cartelle Barros & Alfredo del Caño Gochi & Manuel Lara Coira, 2021. "New Approach for Managing Sustainability in Projects," Sustainability, MDPI, vol. 13(13), pages 1-27, June.
    10. Zheng Yuan & Baohua Wen & Cheng He & Jin Zhou & Zhonghua Zhou & Feng Xu, 2022. "Application of Multi-Criteria Decision-Making Analysis to Rural Spatial Sustainability Evaluation: A Systematic Review," IJERPH, MDPI, vol. 19(11), pages 1-31, May.
    11. Lupo, Toni, 2015. "Fuzzy ServPerf model combined with ELECTRE III to comparatively evaluate service quality of international airports in Sicily," Journal of Air Transport Management, Elsevier, vol. 42(C), pages 249-259.
    12. Villacreses, Geovanna & Gaona, Gabriel & Martínez-Gómez, Javier & Jijón, Diego Juan, 2017. "Wind farms suitability location using geographical information system (GIS), based on multi-criteria decision making (MCDM) methods: The case of continental Ecuador," Renewable Energy, Elsevier, vol. 109(C), pages 275-286.
    13. Sirirat Sae Lim & Hong Ngoc Nguyen & Chia-Li Lin, 2022. "Exploring the Development Strategies of Science Parks Using the Hybrid MCDM Approach," Sustainability, MDPI, vol. 14(7), pages 1-29, April.
    14. Manuel Casal-Guisande & Alberto Comesaña-Campos & Alejandro Pereira & José-Benito Bouza-Rodríguez & Jorge Cerqueiro-Pequeño, 2022. "A Decision-Making Methodology Based on Expert Systems Applied to Machining Tools Condition Monitoring," Mathematics, MDPI, vol. 10(3), pages 1-30, February.
    15. Zeynep Gamze Mert & Gülşen Akman, 2011. "The Profile of the Organized Industrial Zones in Kocaeli/TURKEY," ERSA conference papers ersa11p1137, European Regional Science Association.
    16. Olga A. Shvetsova & Elena A. Rodionova & Michael Z. Epstein, 2018. "Evaluation of investment projects under uncertainty: multi-criteria approach using interval data," Post-Print hal-01858557, HAL.
    17. Kuang-Hua Hu & Fu-Hsiang Chen & Gwo-Hshiung Tzeng, 2016. "Evaluating the Improvement of Sustainability of Sports Industry Policy Based on MADM," Sustainability, MDPI, vol. 8(7), pages 1-21, June.
    18. Haji Vahabzadeh, Ali & Asiaei, Arash & Zailani, Suhaiza, 2015. "Reprint of “Green decision-making model in reverse logistics using FUZZY-VIKOR method”," Resources, Conservation & Recycling, Elsevier, vol. 104(PB), pages 334-347.
    19. Chunguang Bai & Joseph Sarkis, 2013. "Green information technology strategic justification and evaluation," Information Systems Frontiers, Springer, vol. 15(5), pages 831-847, November.
    20. Jing Wang & Jian-Qiang Wang & Hong-Yu Zhang & Xiao-Hong Chen, 2017. "Distance-Based Multi-Criteria Group Decision-Making Approaches with Multi-Hesitant Fuzzy Linguistic Information," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(04), pages 1069-1099, July.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:chsofr:v:151:y:2021:i:c:s0960077921005907. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Thayer, Thomas R. General contact details of the provider: https://www.journals.elsevier.com/chaos-solitons-and-fractals

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.