
Missing Data Imputation in the Internet of Things Sensor Networks

Author

Listed:
  • Benjamin Agbo

    (Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK)

  • Hussain Al-Aqrabi

    (Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK)

  • Richard Hill

    (Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK)

  • Tariq Alsboui

    (Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK)

Abstract

The Internet of Things (IoT) has had a tremendous impact on the evolution and adoption of information and communication technology. In the modern world, data are generated by individuals and collected automatically by physical objects that are fitted with electronics, sensors, and network connectivity. IoT sensor networks have become integral aspects of environmental monitoring systems. However, data collected from IoT sensor devices are usually incomplete due to various reasons such as sensor failures, drifts, network faults and various other operational issues. The presence of incomplete or missing values can substantially affect the calibration of on-field environmental sensors. The aim of this study is to identify efficient missing data imputation techniques that will ensure accurate calibration of sensors. To achieve this, we propose an efficient and robust imputation technique based on k-means clustering that is capable of selecting the best imputation technique for missing data imputation. We then evaluate the accuracy of our proposed technique against other techniques and test their effect on various calibration processes for data collected from on-field low-cost environmental sensors in urban air pollution monitoring stations. To test the efficiency of the imputation techniques, we simulated missing data rates at 10–40% and also considered missing values occurring over consecutive periods of time (1 day, 1 week and 1 month). Overall, our proposed BFMVI model recorded the best imputation accuracy (0.011758 RMSE for 10% missing data and 0.169418 RMSE at 40% missing data) compared to the other techniques (k Nearest-Neighbour (kNN), Regression Imputation (RI), Expectation Maximization (EM) and MissForest techniques) when evaluated using different performance indicators. Moreover, the results show a trade-off between imputation accuracy and computational complexity with benchmark techniques showing a low computational complexity at the expense of accuracy when compared with our proposed technique.
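
The evaluation protocol described in the abstract (mask known values, impute them, score the result with RMSE) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the synthetic hourly signal, and the mean-fill baseline are hypothetical, and the baseline is neither the BFMVI model nor any of the benchmark techniques named above.

```python
import math
import random

def mask_at_random(values, rate, rng):
    """Return a copy of `values` with about `rate` of the entries set to None."""
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in missing_idx else v for i, v in enumerate(values)]
    return masked, sorted(missing_idx)

def mask_consecutive(values, start, length):
    """Blank out one contiguous block, mimicking a sensor outage."""
    end = min(start + length, len(values))
    masked = [None if start <= i < end else v for i, v in enumerate(values)]
    return masked, list(range(start, end))

def mean_impute(masked):
    """Placeholder baseline (an assumption): fill gaps with the observed mean."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]

def rmse_on_missing(truth, imputed, missing_idx):
    """RMSE computed only over the positions that were masked."""
    squared = [(truth[i] - imputed[i]) ** 2 for i in missing_idx]
    return math.sqrt(sum(squared) / len(squared))

rng = random.Random(0)
# Synthetic hourly signal standing in for 30 days of sensor readings.
truth = [20 + 5 * math.sin(2 * math.pi * t / 24) for t in range(24 * 30)]

# Missing rates of 10-40%, as in the abstract.
for rate in (0.10, 0.20, 0.30, 0.40):
    masked, idx = mask_at_random(truth, rate, rng)
    score = rmse_on_missing(truth, mean_impute(masked), idx)
    print(f"rate={rate:.0%} rmse={score:.4f}")

# A one-day consecutive gap (24 hourly readings).
masked, idx = mask_consecutive(truth, start=48, length=24)
print(f"1-day gap rmse={rmse_on_missing(truth, mean_impute(masked), idx):.4f}")
```

Scoring only the masked positions keeps the error from being diluted by values that were never missing. Swapping `mean_impute` for another imputer leaves the rest of the harness unchanged.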

Suggested Citation

  • Benjamin Agbo & Hussain Al-Aqrabi & Richard Hill & Tariq Alsboui, 2022. "Missing Data Imputation in the Internet of Things Sensor Networks," Future Internet, MDPI, vol. 14(5), pages 1-16, May.
  • Handle: RePEc:gam:jftint:v:14:y:2022:i:5:p:143-:d:809518

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/14/5/143/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/14/5/143/
    Download Restriction: no

    References listed on IDEAS

    1. Bengt Muthén & Kerby Shedden, 1999. "Finite Mixture Modeling with Mixture Outcomes Using the EM Algorithm," Biometrics, The International Biometric Society, vol. 55(2), pages 463-469, June.
    2. Jiajuan Liang & Peter Bentler, 2004. "An EM algorithm for fitting two-level structural equation models," Psychometrika, Springer;The Psychometric Society, vol. 69(1), pages 101-122, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Prendergast & David Huang & Yih-Ing Hser, 2008. "Patterns of Crime and Drug Use Trajectories in Relation to Treatment Initiation and 5-Year Outcomes," Evaluation Review, vol. 32(1), pages 59-82, February.
    2. Getachew A. Dagne, 2016. "A growth mixture Tobit model: application to AIDS studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(7), pages 1174-1185, July.
    3. Bacci, Silvia & Bartolucci, Francesco & Pigini, Claudia & Signorelli, Marcello, 2014. "A finite mixture latent trajectory model for hirings and separations in the labor market," MPRA Paper 59730, University Library of Munich, Germany.
    4. Francesco Bartolucci & Ivonne Solis-Trapala, 2010. "Multidimensional Latent Markov Models in a Developmental Study of Inhibitory Control and Attentional Flexibility in Early Childhood," Psychometrika, Springer;The Psychometric Society, vol. 75(4), pages 725-743, December.
    5. Yuan Liu & Hongyun Liu, 2019. "Effects of Distance and Shape on the Estimation of the Piecewise Growth Mixture Model," Journal of Classification, Springer;The Classification Society, vol. 36(3), pages 659-677, October.
    6. Pietro Lovaglio & Mario Mezzanzanica, 2013. "Classification of longitudinal career paths," Quality & Quantity: International Journal of Methodology, Springer, vol. 47(2), pages 989-1008, February.
    7. Leila Amiri & Mojtaba Khazaei & Mojtaba Ganjali, 2018. "A mixture latent variable model for modeling mixed data in heterogeneous populations and its applications," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 102(1), pages 95-115, January.
    8. Joanna F. Dipnall & Belinda J. Gabbe & Warwick J. Teague & Ben Beck, 2020. "Identifying Homogeneous Patterns of Injury in Paediatric Trauma Patients to Improve Risk-Adjusted Models of Mortality and Functional Outcomes," IJERPH, MDPI, vol. 17(3), pages 1-20, January.
    9. Lu, Xiaosun & Huang, Yangxin & Zhu, Yiliang, 2016. "Finite mixture of nonlinear mixed-effects joint models in the presence of missing and mismeasured covariate, with application to AIDS studies," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 119-130.
    10. Yangxin Huang & Xiaosun Lu & Jiaqing Chen & Juan Liang & Miriam Zangmeister, 2018. "Joint model-based clustering of nonlinear longitudinal trajectories and associated time-to-event data analysis, linked by latent class membership: with application to AIDS clinical studies," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 24(4), pages 699-718, October.
    11. Jonathan Schweig, 2014. "Multilevel Factor Analysis by Model Segregation," Journal of Educational and Behavioral Statistics, vol. 39(5), pages 394-422, October.
    12. Karb, Rebecca A. & Elliott, Michael R. & Dowd, Jennifer B. & Morenoff, Jeffrey D., 2012. "Neighborhood-level stressors, social support, and diurnal patterns of cortisol: The Chicago Community Adult Health Study," Social Science & Medicine, Elsevier, vol. 75(6), pages 1038-1047.
    13. Erin S. Rogers & Elizabeth Vargas & Christina N. Wysota & Scott E. Sherman, 2022. "Latent Heterogeneity in the Impact of Financial Coaching on Delay Discounting among Low-Income Smokers: A Secondary Analysis of a Randomized Controlled Trial," IJERPH, MDPI, vol. 19(5), pages 1-11, February.
    14. Juan Shen & Xuming He, 2015. "Inference for Subgroup Analysis With a Structured Logistic-Normal Mixture Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 303-312, March.
    15. Silvia Cagnone & Cinzia Viroli, 2014. "A factor mixture model for analyzing heterogeneity and cognitive structure of dementia," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 98(1), pages 1-20, January.
    16. Nicholas J. Rockwood, 2020. "Maximum Likelihood Estimation of Multilevel Structural Equation Models with Random Slopes for Latent Covariates," Psychometrika, Springer;The Psychometric Society, vol. 85(2), pages 275-300, June.
    17. Chung, Hwan & Chang, Hsiu-Ching, 2012. "Bayesian approaches to the model selection problem in the analysis of latent stage-sequential process," Computational Statistics & Data Analysis, Elsevier, vol. 56(12), pages 4097-4110.
    18. Ke-Hai Yuan & Kentaro Hayashi, 2005. "On Muthén’s maximum likelihood for two-level covariance structure models," Psychometrika, Springer;The Psychometric Society, vol. 70(1), pages 147-167, March.
    19. Shen-Ming Lee & Phuoc-Loc Tran & Truong-Nhat Le & Chin-Shang Li, 2023. "Prediction of a Sensitive Feature under Indirect Questioning via Warner’s Randomized Response Technique and Latent Class Model," Mathematics, MDPI, vol. 11(2), pages 1-21, January.
    20. Reising, Kim & Ttofi, Maria M. & Farrington, David P. & Piquero, Alex R., 2019. "Depression and anxiety outcomes of offending trajectories: A systematic review of prospective longitudinal studies," Journal of Criminal Justice, Elsevier, vol. 62(C), pages 3-15.
