Printed from https://ideas.repec.org/a/spr/stmapp/v28y2019i2d10.1007_s10260-018-00435-9.html

Reconstructing missing data sequences in multivariate time series: an application to environmental data

Author

Listed:
  • Maria Lucia Parrella

    (University of Salerno)

  • Giuseppina Albano

    (University of Salerno)

  • Michele La Rocca

    (University of Salerno)

  • Cira Perna

    (University of Salerno)

Abstract

Missing data arise in many statistical analyses because of faults in data acquisition, and they can significantly affect the conclusions drawn from the data. In environmental data, for example, the standard approach usually adopted by Environmental Protection Agencies to handle missing values is to delete observations with incomplete information from the study, which leads to a severe underestimation of many indexes commonly used to evaluate air quality. In multivariate time series, moreover, not only isolated values but also long sequences of some of the series' components may be missing. In such cases, it is nearly impossible to reconstruct the missing sequences from the serial dependence structure alone. In this work, we propose a new procedure that reconstructs the missing sequences by exploiting the spatial correlation and the serial correlation of the multivariate time series simultaneously. The procedure is based on a spatial-dynamic model and imputes the missing values as a linear combination of the contemporaneous observations of neighboring series and their lagged values. It is specifically oriented to spatio-temporal data, although it is general enough to be applied to generic stationary multivariate time series. In this paper, the procedure is applied to pollution data, where missing sequences are a serious concern, and it performs remarkably well.
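The Python sketch below illustrates the general idea described in the abstract: a long missing sequence at one site is filled with a linear combination of the neighbors' contemporaneous values and their lagged values. It is not the authors' procedure. The simulated model (a spatio-temporal autoregression y_t = λ0 W y_t + λ1 y_{t-1} + λ2 W y_{t-1} + e_t with scalar coefficients), the ring-shaped weight matrix, and the helper names `ring_weights`, `simulate` and `impute_site` are all hypothetical, and plain least squares stands in for whatever estimation method the paper actually uses.

```python
import numpy as np


def ring_weights(n):
    """Row-normalised adjacency of a ring of n sites (hypothetical spatial layout)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W


def simulate(W, lam0, lam1, lam2, T, seed=0):
    """Simulate y_t = lam0*W y_t + lam1*y_{t-1} + lam2*W y_{t-1} + e_t (assumed model)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    A = np.linalg.inv(np.eye(n) - lam0 * W)  # resolves the contemporaneous spatial feedback
    B = lam1 * np.eye(n) + lam2 * W          # serial (lagged) dependence
    Y = np.zeros((T, n))
    for t in range(1, T):
        Y[t] = A @ (B @ Y[t - 1] + rng.standard_normal(n))
    return Y


def impute_site(Y, W, j):
    """Fill NaNs in column j from the neighbours' current and lagged values.

    Neighbours are the nonzero entries of row j of W and are assumed fully
    observed. Coefficients come from least squares on the times where site j
    is observed.
    """
    nbrs = np.flatnonzero(W[j])
    # One row per t = 1..T-1: [y_{k,t} for k in nbrs, y_{k,t-1} for k in nbrs]
    X = np.hstack([Y[1:, nbrs], Y[:-1, nbrs]])
    y = Y[1:, j]
    obs = ~np.isnan(y)
    coef, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    filled = Y.copy()
    miss = np.flatnonzero(~obs) + 1  # map back to row indices of Y
    filled[miss, j] = X[~obs] @ coef
    return filled


# Usage: remove a block of 200 consecutive values at site 0, then reconstruct it.
W = ring_weights(10)
Y = simulate(W, lam0=0.5, lam1=0.2, lam2=0.1, T=2000)
truth = Y[800:1000, 0].copy()
Y_miss = Y.copy()
Y_miss[800:1000, 0] = np.nan
Y_hat = impute_site(Y_miss, W, 0)
rmse_model = np.sqrt(np.mean((Y_hat[800:1000, 0] - truth) ** 2))
rmse_mean = np.sqrt(np.mean((np.nanmean(Y_miss[:, 0]) - truth) ** 2))
print(f"neighbour-based RMSE: {rmse_model:.3f}, mean-imputation RMSE: {rmse_mean:.3f}")
```

Each neighbor gets its own weight here rather than entering through a single weighted average; with many neighbors, a more parsimonious form such as the scalar-coefficient spatial lag used in the simulation may be preferable. Least squares in this setting targets the best linear predictor of site j from the chosen regressors, which is what imputation needs, rather than the structural coefficients λ0, λ1, λ2.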

Suggested Citation

  • Maria Lucia Parrella & Giuseppina Albano & Michele La Rocca & Cira Perna, 2019. "Reconstructing missing data sequences in multivariate time series: an application to environmental data," Statistical Methods & Applications, Springer; Società Italiana di Statistica, vol. 28(2), pages 359-383, June.
  • Handle: RePEc:spr:stmapp:v:28:y:2019:i:2:d:10.1007_s10260-018-00435-9
    DOI: 10.1007/s10260-018-00435-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10260-018-00435-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10260-018-00435-9?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    2. Honaker, James & King, Gary & Blackwell, Matthew, 2011. "Amelia II: A Program for Missing Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i07).
    3. Crescenza Calculli & Alessandro Fassò & Francesco Finazzi & Alessio Pollice & Annarita Turnone, 2015. "Maximum likelihood estimation of the multivariate hidden dynamic geostatistical model with application to air quality in Apulia, Italy," Environmetrics, John Wiley & Sons, Ltd., vol. 26(6), pages 406-417, September.
    4. Josse, Julie & Husson, François, 2016. "missMDA: A Package for Handling Missing Values in Multivariate Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i01).
    5. Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
    6. Dou, Baojun & Parrella, Maria Lucia & Yao, Qiwei, 2016. "Generalized Yule–Walker estimation for spatio-temporal models with unknown diagonal coefficients," Journal of Econometrics, Elsevier, vol. 194(2), pages 369-382.
    7. Dou, Baojun & Parrella, Maria Lucia & Yao, Qiwei, 2016. "Generalized Yule–Walker estimation for spatio-temporal models with unknown diagonal coefficients," LSE Research Online Documents on Economics 67151, London School of Economics and Political Science, LSE Library.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Maria Lucia Parrella & Giuseppina Albano & Cira Perna & Michele La Rocca, 2021. "Bootstrap joint prediction regions for sequences of missing values in spatio-temporal datasets," Computational Statistics, Springer, vol. 36(4), pages 2917-2938, December.
    2. Yohan Kim & Scott Kelly & Deepu Krishnan & Jay Falletta & Kerryn Wilmot, 2022. "Strategies for Imputation of High-Resolution Environmental Data in Clinical Randomized Controlled Trials," IJERPH, MDPI, vol. 19(3), pages 1-17, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nicholas Tierney & Dianne Cook, 2018. "Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations," Monash Econometrics and Business Statistics Working Papers 14/18, Monash University, Department of Econometrics and Business Statistics.
    2. Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
    3. Maria Lucia Parrella & Giuseppina Albano & Cira Perna & Michele La Rocca, 2021. "Bootstrap joint prediction regions for sequences of missing values in spatio-temporal datasets," Computational Statistics, Springer, vol. 36(4), pages 2917-2938, December.
    4. Lara Lopez & Fernando L. Vázquez & Ángela J. Torres & Patricia Otero & Vanessa Blanco & Olga Díaz & Mario Páramo, 2020. "Long-Term Effects of a Cognitive Behavioral Conference Call Intervention on Depression in Non-Professional Caregivers," IJERPH, MDPI, vol. 17(22), pages 1-24, November.
    5. Nicklas Pettersson, 2013. "Bias reduction of finite population imputation by kernel methods," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 14(1), pages 139-160, March.
    6. Jiang, Wei & Josse, Julie & Lavielle, Marc, 2020. "Logistic regression with missing covariates—Parameter estimation, model selection and prediction within a joint-modeling framework," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
    7. Michele Aquaro & Natalia Bailey & M. Hashem Pesaran, 2021. "Estimation and inference for spatial models with heterogeneous coefficients: An application to US house prices," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(1), pages 18-44, January.
    8. Cheng, Xiaoyue & Cook, Dianne & Hofmann, Heike, 2015. "Visually Exploring Missing Values in Multivariable Data Using a Graphical User Interface," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i06).
    9. Xuan Liang & Jiti Gao & Xiaodong Gong, 2022. "Semiparametric Spatial Autoregressive Panel Data Model with Fixed Effects and Time-Varying Coefficients," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(4), pages 1784-1802, October.
    10. Ahmad R. Alsaber & Jiazhu Pan & Adeeba Al-Hurban, 2021. "Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)," IJERPH, MDPI, vol. 18(3), pages 1-25, February.
    11. Nengsih Titin Agustin & Bertrand Frédéric & Maumy-Bertrand Myriam & Meyer Nicolas, 2019. "Determining the number of components in PLS regression on incomplete data set," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(6), pages 1-28, December.
    12. Josse, Julie & Husson, François, 2016. "missMDA: A Package for Handling Missing Values in Multivariate Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i01).
    13. Hanno Reuvers & Etienne Wijler, 2021. "Sparse Generalized Yule-Walker Estimation for Large Spatio-temporal Autoregressions with an Application to NO2 Satellite Data," Papers 2108.02864, arXiv.org, revised Dec 2021.
    14. Schalk Burger & Searle Silverman & Gary van Vuuren, 2018. "Deriving Correlation Matrices for Missing Financial Time-Series Data," International Journal of Economics and Finance, Canadian Center of Science and Education, vol. 10(10), pages 105-105, October.
    15. World Bank & Organisation for Economic Co-operation and Development, 2017. "A Step Ahead," World Bank Publications - Books, The World Bank Group, number 27527.
    16. Adel Bosch & Steven F. Koch, 2021. "Individual and Household Debt: Does Imputation Choice Matter?," Working Papers 202141, University of Pretoria, Department of Economics.
    17. Junyung Ji & Jiwoo Kim & Younghoon Kim, 2024. "Predicting Missing Values in Survey Data Using Prompt Engineering for Addressing Item Non-Response," Future Internet, MDPI, vol. 16(10), pages 1-19, September.
    18. Gao, Zhaoxing & Ma, Yingying & Wang, Hansheng & Yao, Qiwei, 2019. "Banded spatio-temporal autoregressions," Journal of Econometrics, Elsevier, vol. 208(1), pages 211-230.
    19. Parashmoni Borah & Suhasini Hazarika & Amit Prakash, 2022. "Assessing the state of homogeneity, variability and trends in the rainfall time series from 1969 to 2017 and its significance for groundwater in north-east India," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer; International Society for the Prevention and Mitigation of Natural Hazards, vol. 111(1), pages 585-617, March.
    20. Henry Webel & Lili Niu & Annelaura Bach Nielsen & Marie Locard-Paulet & Matthias Mann & Lars Juhl Jensen & Simon Rasmussen, 2024. "Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning," Nature Communications, Nature, vol. 15(1), pages 1-15, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:stmapp:v:28:y:2019:i:2:d:10.1007_s10260-018-00435-9. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.