Printed from https://ideas.repec.org/a/gam/jftint/v14y2022i5p143-d809518.html

Missing Data Imputation in the Internet of Things Sensor Networks

Authors
  • Benjamin Agbo

    (Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK)

  • Hussain Al-Aqrabi

    (Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK)

  • Richard Hill

    (Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK)

  • Tariq Alsboui

    (Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK)

Abstract

The Internet of Things (IoT) has had a tremendous impact on the evolution and adoption of information and communication technology. In the modern world, data are generated by individuals and collected automatically by physical objects fitted with electronics, sensors, and network connectivity. IoT sensor networks have become an integral part of environmental monitoring systems. However, data collected from IoT sensor devices are usually incomplete for reasons such as sensor failures, drift, network faults and other operational issues. Incomplete or missing values can substantially affect the calibration of on-field environmental sensors. The aim of this study is to identify efficient missing data imputation techniques that ensure accurate sensor calibration. To this end, we propose an efficient and robust imputation technique based on k-means clustering that is capable of selecting the best imputation technique for the missing data. We then evaluate the accuracy of our proposed technique against other techniques and test their effect on various calibration processes for data collected from low-cost on-field environmental sensors at urban air pollution monitoring stations. To test the imputation techniques, we simulated missing data rates of 10–40% and also considered missing values occurring over consecutive periods of time (1 day, 1 week and 1 month). Overall, our proposed BFMVI model achieved the best imputation accuracy (an RMSE of 0.011758 at 10% missing data and 0.169418 at 40% missing data) compared with the other techniques (k-Nearest Neighbour (kNN), Regression Imputation (RI), Expectation Maximization (EM) and MissForest) when evaluated using different performance indicators.
Moreover, the results show a trade-off between imputation accuracy and computational complexity: the benchmark techniques have lower computational complexity than our proposed technique, but at the expense of accuracy.
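The evaluation protocol described in the abstract (randomly masking 10–40% of readings or a consecutive block of time, then scoring the imputed values with RMSE), together with the general idea of automatically picking the best-performing imputer, can be illustrated with a minimal NumPy sketch. It is not the authors' BFMVI method: the k-means clustering step is omitted, the two candidate imputers (mean and linear interpolation) are simple stand-ins for kNN, RI, EM and MissForest, the hourly sine-wave signal is synthetic, and every function name (`mask_random`, `mask_block`, `select_best`, `evaluate`) is hypothetical.

```python
import numpy as np

# Hypothetical sketch: NOT the BFMVI algorithm. The k-means step is omitted and
# the candidate imputers are simple stand-ins for kNN, RI, EM and MissForest.

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mask_random(x, rate, rng):
    """Copy of x with round(rate * len(x)) randomly chosen samples set to NaN."""
    out = np.asarray(x, dtype=float).copy()
    n_missing = int(round(rate * out.size))
    idx = rng.choice(out.size, size=n_missing, replace=False)
    out[idx] = np.nan
    return out

def mask_block(x, start, length):
    """Copy of x with `length` consecutive samples from `start` set to NaN,
    mimicking an outage (e.g. 1 day of hourly readings = 24 samples)."""
    out = np.asarray(x, dtype=float).copy()
    out[start:start + length] = np.nan
    return out

def impute_mean(x):
    """Replace every NaN with the mean of the observed samples."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(out)
    return out

def impute_linear(x):
    """Fill NaNs by linear interpolation over the time index
    (np.interp holds the edge values constant outside the observed range)."""
    out = x.copy()
    t = np.arange(out.size)
    miss = np.isnan(out)
    out[miss] = np.interp(t[miss], t[~miss], out[~miss])
    return out

CANDIDATES = {"mean": impute_mean, "linear": impute_linear}

def select_best(x, rng, holdout_rate=0.1):
    """Hide an extra fraction of the *observed* samples, impute with each
    candidate, and return the name of the one with the lowest holdout RMSE."""
    observed = np.flatnonzero(~np.isnan(x))
    n_hold = max(1, int(round(holdout_rate * observed.size)))
    hold = rng.choice(observed, size=n_hold, replace=False)
    probe = x.copy()
    probe[hold] = np.nan
    scores = {name: rmse(x[hold], f(probe)[hold]) for name, f in CANDIDATES.items()}
    return min(scores, key=scores.get)

def evaluate(truth, damaged, rng):
    """Impute `damaged` with the selected candidate and score it only on the
    positions that are missing in `damaged` but known in `truth`."""
    best = select_best(damaged, rng)
    filled = CANDIDATES[best](damaged)
    miss = np.isnan(damaged)
    return best, rmse(truth[miss], filled[miss])

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    hours = np.arange(24 * 60)                       # 60 days of hourly samples
    truth = 20 + 5 * np.sin(2 * np.pi * hours / 24)  # synthetic daily cycle
    for rate in (0.1, 0.2, 0.3, 0.4):
        name, err = evaluate(truth, mask_random(truth, rate, rng), rng)
        print(f"{rate:.0%} random missing -> {name}, RMSE {err:.4f}")
    for label, length in (("1 day", 24), ("1 week", 24 * 7)):
        name, err = evaluate(truth, mask_block(truth, 500, length), rng)
        print(f"{label} gap -> {name}, RMSE {err:.4f}")
```

Choosing an imputer by its RMSE on deliberately hidden observed values is one common model-selection strategy; a clustering-based variant could repeat the same selection within each cluster, though this sketch does not claim that is how BFMVI works.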

Suggested Citation

  • Benjamin Agbo & Hussain Al-Aqrabi & Richard Hill & Tariq Alsboui, 2022. "Missing Data Imputation in the Internet of Things Sensor Networks," Future Internet, MDPI, vol. 14(5), pages 1-16, May.
  • Handle: RePEc:gam:jftint:v:14:y:2022:i:5:p:143-:d:809518

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/14/5/143/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/14/5/143/
    Download Restriction: no

    References listed on IDEAS

    1. Bengt Muthén & Kerby Shedden, 1999. "Finite Mixture Modeling with Mixture Outcomes Using the EM Algorithm," Biometrics, The International Biometric Society, vol. 55(2), pages 463-469, June.
    2. Jiajuan Liang & Peter Bentler, 2004. "An EM algorithm for fitting two-level structural equation models," Psychometrika, Springer;The Psychometric Society, vol. 69(1), pages 101-122, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marco Guerra & Francesca Bassi & José G. Dias, 2020. "A Multiple-Indicator Latent Growth Mixture Model to Track Courses with Low-Quality Teaching," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 147(2), pages 361-381, January.
    2. Michael Prendergast & David Huang & Yih-Ing Hser, 2008. "Patterns of Crime and Drug Use Trajectories in Relation to Treatment Initiation and 5-Year Outcomes," Evaluation Review, , vol. 32(1), pages 59-82, February.
    3. Silvia Bacci & Francesco Bartolucci & Giulia Bettin & Claudia Pigini, 2017. "A mixture growth model for migrants' remittances: An application to the German Socio-Economic Panel," Mo.Fi.R. Working Papers 145, Money and Finance Research group (Mo.Fi.R.) - Univ. Politecnica Marche - Dept. Economic and Social Sciences.
    4. Patrick Sturgis & Louise Sullivan, 2008. "Exploring social mobility with latent trajectory groups," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 171(1), pages 65-88, January.
    5. Getachew A. Dagne, 2016. "A growth mixture Tobit model: application to AIDS studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(7), pages 1174-1185, July.
    6. Bacci, Silvia & Bartolucci, Francesco & Pigini, Claudia & Signorelli, Marcello, 2014. "A finite mixture latent trajectory model for hirings and separations in the labor market," MPRA Paper 59730, University Library of Munich, Germany.
    7. Proust-Lima, Cécile & Joly, Pierre & Dartigues, Jean-François & Jacqmin-Gadda, Hélène, 2009. "Joint modelling of multivariate longitudinal outcomes and a time-to-event: A nonlinear latent class approach," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 1142-1154, February.
    8. Francesco Bartolucci & Ivonne Solis-Trapala, 2010. "Multidimensional Latent Markov Models in a Developmental Study of Inhibitory Control and Attentional Flexibility in Early Childhood," Psychometrika, Springer;The Psychometric Society, vol. 75(4), pages 725-743, December.
    9. Majid Ghasemy & Isabel Maria Rosa-Díaz & James Eric Gaskin, 2021. "The Roles of Supervisory Support and Involvement in Influencing Scientists’ Job Satisfaction to Ensure the Achievement of SDGs in Academic Organizations," SAGE Open, , vol. 11(3), pages 21582440211, July.
    10. Yuan Liu & Hongyun Liu, 2019. "Effects of Distance and Shape on the Estimation of the Piecewise Growth Mixture Model," Journal of Classification, Springer;The Classification Society, vol. 36(3), pages 659-677, October.
    11. Isabelle Archambault & Véronique Dupéré, 2017. "Joint trajectories of behavioral, affective, and cognitive engagement in elementary school," The Journal of Educational Research, Taylor & Francis Journals, vol. 110(2), pages 188-198, March.
    12. Pietro Lovaglio & Mario Mezzanzanica, 2013. "Classification of longitudinal career paths," Quality & Quantity: International Journal of Methodology, Springer, vol. 47(2), pages 989-1008, February.
    13. Zhou, Xingcai & Liu, Xinsheng, 2008. "The EM algorithm for the extended finite mixture of the factor analyzers model," Computational Statistics & Data Analysis, Elsevier, vol. 52(8), pages 3939-3953, April.
    14. Leila Amiri & Mojtaba Khazaei & Mojtaba Ganjali, 2018. "A mixture latent variable model for modeling mixed data in heterogeneous populations and its applications," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 102(1), pages 95-115, January.
    15. Joanna F. Dipnall & Belinda J. Gabbe & Warwick J. Teague & Ben Beck, 2020. "Identifying Homogeneous Patterns of Injury in Paediatric Trauma Patients to Improve Risk-Adjusted Models of Mortality and Functional Outcomes," IJERPH, MDPI, vol. 17(3), pages 1-20, January.
    16. Lu, Xiaosun & Huang, Yangxin & Zhu, Yiliang, 2016. "Finite mixture of nonlinear mixed-effects joint models in the presence of missing and mismeasured covariate, with application to AIDS studies," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 119-130.
    17. Jost Reinecke & Daniel Seddig, 2011. "Growth mixture models in longitudinal research," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 95(4), pages 415-434, December.
    18. Yangxin Huang & Xiaosun Lu & Jiaqing Chen & Juan Liang & Miriam Zangmeister, 2018. "Joint model-based clustering of nonlinear longitudinal trajectories and associated time-to-event data analysis, linked by latent class membership: with application to AIDS clinical studies," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 24(4), pages 699-718, October.
    19. David Aristei & Silvia Bacci & Francesco Bartolucci & Silvia Pandolfi, 2021. "A bivariate finite mixture growth model with selection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(3), pages 759-793, September.
    20. Jonathan Schweig, 2014. "Multilevel Factor Analysis by Model Segregation," Journal of Educational and Behavioral Statistics, , vol. 39(5), pages 394-422, October.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jftint:v:14:y:2022:i:5:p:143-:d:809518. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by reporting the missing link through the form provided on IDEAS.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the MDPI Indexing Manager. General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.