
Estimating Multilevel Models on Data Streams

Author

Listed:
  • L. Ippel

    (Maastricht University)

  • M. C. Kaptein

    (Tilburg University)

  • J. K. Vermunt

    (Tilburg University)

Abstract

Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and therefore consume considerable time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or “row-by-row”). In an extensive simulation study, we demonstrate the performance of SEMA compared to traditional methods of fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive with current state-of-the-art methods while being orders of magnitude faster.
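
To make the idea of "row-by-row" estimation concrete, the following minimal Python sketch shows how summary statistics for nested data can be updated one observation at a time without revisiting earlier rows. It is not the SEMA algorithm, which this listing does not specify; it only illustrates the general pattern of online updating with partial pooling. The class name `StreamingGroupMeans`, the parameter `prior_weight` (a fixed pseudo-count standing in for the variance components that a real multilevel model would estimate), and the example data are all assumptions made for illustration.

```python
from collections import defaultdict


class StreamingGroupMeans:
    """Row-by-row summary statistics for nested data (illustration only)."""

    def __init__(self, prior_weight=5.0):
        # prior_weight is a hypothetical fixed pseudo-count controlling shrinkage;
        # a real multilevel model would estimate it from variance components.
        self.prior_weight = prior_weight
        self.n_total = 0
        self.grand_mean = 0.0
        self.n_group = defaultdict(int)
        self.group_mean = defaultdict(float)

    def update(self, group, y):
        """Absorb one observation in O(1) time; the raw row is not stored."""
        self.n_total += 1
        self.grand_mean += (y - self.grand_mean) / self.n_total
        self.n_group[group] += 1
        self.group_mean[group] += (y - self.group_mean[group]) / self.n_group[group]

    def predict(self, group):
        """Shrink the group's running mean toward the grand mean."""
        n = self.n_group.get(group, 0)
        if n == 0:
            # Unseen group: fall back to the grand mean.
            return self.grand_mean
        w = n / (n + self.prior_weight)
        return w * self.group_mean[group] + (1 - w) * self.grand_mean


# Made-up stream of (group, outcome) pairs, processed one row at a time.
stream = [("school_a", 6.0), ("school_b", 8.0), ("school_a", 7.0), ("school_b", 9.0)]
model = StreamingGroupMeans(prior_weight=2.0)
for group, y in stream:
    model.update(group, y)
```

After the loop, `model.predict("school_a")` returns a value between that group's running mean and the grand mean, and a prediction is available at any point in the stream. Memory use grows with the number of groups rather than the number of observations, which is the property that makes this style of updating attractive for data streams.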

Suggested Citation

  • L. Ippel & M. C. Kaptein & J. K. Vermunt, 2019. "Estimating Multilevel Models on Data Streams," Psychometrika, Springer;The Psychometric Society, vol. 84(1), pages 41-64, March.
  • Handle: RePEc:spr:psycho:v:84:y:2019:i:1:d:10.1007_s11336-018-09656-z
    DOI: 10.1007/s11336-018-09656-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11336-018-09656-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Harvey Goldstein & Roderick McDonald, 1988. "A general model for the analysis of multilevel data," Psychometrika, Springer;The Psychometric Society, vol. 53(4), pages 455-467, December.
    2. Liu, Z. & Almhana, J. & Choulakian, V. & McGorman, R., 2006. "Online EM algorithm for mixture with application to internet traffic modeling," Computational Statistics & Data Analysis, Elsevier, vol. 50(4), pages 1052-1071, February.
    3. Kooreman, Peter & Scherpenzeel, Annette, 2014. "High frequency body mass measurement, feedback, and health behaviors," Economics & Human Biology, Elsevier, vol. 14(C), pages 141-153.
    4. Ippel, L. & Kaptein, M.C. & Vermunt, J.K., 2016. "Estimating random-intercept models on data streams," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 169-182.
    5. Olivier Cappé & Eric Moulines, 2009. "On‐line expectation–maximization algorithm for latent data models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 593-613, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maire, Florian & Moulines, Eric & Lefebvre, Sidonie, 2017. "Online EM for functional data," Computational Statistics & Data Analysis, Elsevier, vol. 111(C), pages 27-47.
    2. Ippel, L. & Kaptein, M.C. & Vermunt, J.K., 2016. "Estimating random-intercept models on data streams," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 169-182.
    3. N. Longford & B. Muthén, 1992. "Factor analysis for clustered observations," Psychometrika, Springer;The Psychometric Society, vol. 57(4), pages 581-597, December.
    4. Sato, Aki-Hiro, 2012. "Patterns of regional travel behavior: An analysis of Japanese hotel reservation data," International Review of Financial Analysis, Elsevier, vol. 23(C), pages 55-65.
    5. Shuji Shinohara & Nobuhito Manome & Kouta Suzuki & Ung-il Chung & Tatsuji Takahashi & Hiroshi Okamoto & Yukio Pegio Gunji & Yoshihiro Nakajima & Shunji Mitsuyoshi, 2020. "A new method of Bayesian causal inference in non-stationary environments," PLOS ONE, Public Library of Science, vol. 15(5), pages 1-22, May.
    6. Wai-Yin Poon & Hai-Bin Wang, 2010. "Analysis of a Two-Level Structural Equation Model With Missing Data," Sociological Methods & Research, vol. 39(1), pages 25-55, August.
    7. Asim Ansari & Kamel Jedidi & Sharan Jagpal, 2000. "A Hierarchical Bayesian Methodology for Treating Heterogeneity in Structural Equation Models," Marketing Science, INFORMS, vol. 19(4), pages 328-347, August.
    8. Roderick McDonald, 1993. "A general model for two-level data with responses missing at random," Psychometrika, Springer;The Psychometric Society, vol. 58(4), pages 575-585, December.
    9. Nicholas J. Rockwood, 2020. "Maximum Likelihood Estimation of Multilevel Structural Equation Models with Random Slopes for Latent Covariates," Psychometrika, Springer;The Psychometric Society, vol. 85(2), pages 275-300, June.
    10. Ke-Hai Yuan & Kentaro Hayashi, 2005. "On Muthén’s maximum likelihood for two-level covariance structure models," Psychometrika, Springer;The Psychometric Society, vol. 70(1), pages 147-167, March.
    11. Sik-Yum, Lee & Wai-Yin, Poon, 1995. "Estimation of factor scores in a two-level confirmatory factor analysis model," Computational Statistics & Data Analysis, Elsevier, vol. 20(3), pages 275-284, September.
    12. Robert Mislevy, 1991. "Randomization-based inference about latent variables from complex samples," Psychometrika, Springer;The Psychometric Society, vol. 56(2), pages 177-196, June.
    13. Pietro Lovaglio & Giorgio Vittadini, 2013. "Multilevel dimensionality-reduction methods," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 22(2), pages 183-207, June.
    14. Anna Ruelens & Bart Meuleman & Ides Nicaise, 2018. "Examining Measurement Isomorphism of Multilevel Constructs: The Case of Political Trust," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 140(3), pages 907-927, December.
    15. Ippel, L. & Kaptein, M.C. & Vermunt, J.K., 2019. "Online estimation of individual-level effects using streaming shrinkage factors," Computational Statistics & Data Analysis, Elsevier, vol. 137(C), pages 16-32.
    16. Jun Sun, 2020. "Ubiquitous Computing Capabilities and User-System Interaction Readiness: An Activity Perspective," Information Systems Frontiers, Springer, vol. 22(1), pages 259-271, February.
    17. Yahia S El-Horbaty & Eman M Hanafy, 2018. "Some Estimation Methods and Their Assessment in Multilevel Models: A Review," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 5(3), pages 69-76, February.
    18. Johannes Bill & Samuel J. Gershman & Jan Drugowitsch, 2022. "Visual motion perception as online hierarchical inference," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    19. Ojeda, M. M. & Juarez-Cerrillo, S. F., 1996. "Biplot display for diagnostic in a two-level regression model for growth curve analysis," Computational Statistics & Data Analysis, Elsevier, vol. 22(6), pages 583-597, October.
    20. Donna Henderson & Gerton Lunter, 2020. "Efficient inference in state-space models through adaptive learning in online Monte Carlo expectation maximization," Computational Statistics, Springer, vol. 35(3), pages 1319-1344, September.
