Estimating random-intercept models on data streams

Author

Ippel, L.
Kaptein, M.C.
Vermunt, J.K.

Abstract

Multilevel models are often used for the analysis of grouped data. Grouped data occur, for instance, when estimating the performance of pupils nested within schools or analyzing multiple observations nested within individuals. Currently, multilevel models are mostly fit to static datasets. However, recent technological advances in the measurement of social phenomena have led to data arriving in a continuous fashion (i.e., data streams). In these situations the data collection is never “finished”. Traditional methods of fitting multilevel models are ill-suited for the analysis of data streams because of their computational complexity. A novel algorithm for estimating random-intercept models is introduced. The Streaming EM Approximation (SEMA) algorithm is a fully online (row-by-row) method enabling computationally efficient estimation of random-intercept models. SEMA is tested in two simulation studies and applied to longitudinal data regarding individuals’ happiness collected continuously using smartphones. SEMA's statistical performance is competitive with existing static approaches, with large computational benefits. The introduction of this method allows researchers to broaden the scope of their research by using data streams.
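To make the row-by-row idea concrete, here is a minimal sketch of an online EM-style update for a random-intercept model y_ij = mu + b_j + e_ij, with b_j ~ N(0, tau2) and e_ij ~ N(0, sigma2). It is an illustration under stated assumptions, not the published SEMA algorithm: the class name `StreamingRandomIntercept`, the helper `GroupState`, the starting values, and the toy data are all hypothetical. For each incoming row, the sketch refreshes only that row's group in the E-step, swaps the group's old contribution to the running totals for the new one, and then recomputes the parameters in closed form.

```python
from dataclasses import dataclass


@dataclass
class GroupState:
    """Per-group summaries plus the group's current share of the global totals."""
    n: int = 0
    sum_y: float = 0.0
    sum_y2: float = 0.0
    c_mu: float = 0.0      # contribution to the total used for mu
    c_tau: float = 0.0     # contribution to the total used for tau2
    c_sigma: float = 0.0   # contribution to the total used for sigma2


class StreamingRandomIntercept:
    """Hypothetical online EM-style estimator; not the authors' implementation."""

    def __init__(self, mu=0.0, tau2=1.0, sigma2=1.0):
        # Starting values are arbitrary assumptions.
        self.mu, self.tau2, self.sigma2 = mu, tau2, sigma2
        self.groups = {}
        self.n_total = 0
        self.t_mu = self.t_tau = self.t_sigma = 0.0

    def _shrinkage(self, n):
        # Weight given to the group's own mean; always strictly between 0 and 1.
        return n * self.tau2 / (n * self.tau2 + self.sigma2)

    def update(self, group_id, y):
        g = self.groups.setdefault(group_id, GroupState())
        # Remove this group's stale contribution from the running totals.
        self.t_mu -= g.c_mu
        self.t_tau -= g.c_tau
        self.t_sigma -= g.c_sigma
        # Fold the new row into the group's summaries.
        g.n += 1
        g.sum_y += y
        g.sum_y2 += y * y
        self.n_total += 1
        # E-step for this group only, using the current parameter values.
        b_hat = self._shrinkage(g.n) * (g.sum_y / g.n - self.mu)
        v = self.tau2 * self.sigma2 / (g.n * self.tau2 + self.sigma2)
        m = self.mu + b_hat
        g.c_mu = g.sum_y - g.n * b_hat
        g.c_tau = b_hat ** 2 + v
        # Sum over the group's rows of (y - m)^2, plus n times the posterior variance.
        g.c_sigma = g.sum_y2 - 2.0 * m * g.sum_y + g.n * m * m + g.n * v
        self.t_mu += g.c_mu
        self.t_tau += g.c_tau
        self.t_sigma += g.c_sigma
        # M-step: closed-form updates from the running totals.
        self.mu = self.t_mu / self.n_total
        self.tau2 = self.t_tau / len(self.groups)
        self.sigma2 = self.t_sigma / self.n_total

    def predict(self, group_id):
        """Shrunken group-level prediction; unseen groups get the grand mean."""
        g = self.groups.get(group_id)
        if g is None:
            return self.mu
        return self.mu + self._shrinkage(g.n) * (g.sum_y / g.n - self.mu)


# Toy usage with made-up (person, happiness score) rows.
model = StreamingRandomIntercept()
for person, score in [("a", 4.0), ("b", 7.0), ("a", 5.0), ("b", 6.5)]:
    model.update(person, score)
print(model.mu, model.tau2, model.sigma2, model.predict("a"))
```

In this sketch each update costs a constant amount of work and memory grows with the number of groups rather than the number of rows. Contributions from groups that have not been seen recently were computed under older parameter values, so the estimates are an approximation to a full EM pass over all data.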

Suggested Citation

Ippel, L. & Kaptein, M.C. & Vermunt, J.K., 2016. "Estimating random-intercept models on data streams," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 169-182.

Handle: RePEc:eee:csdana:v:104:y:2016:i:c:p:169-182
DOI: 10.1016/j.csda.2016.06.008

References listed on IDEAS

Steiner, P.M. & Hudec, M., 2007. "Classification of large data sets with mixture models via sufficient EM," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5416-5428, July.
Liu, Z. & Almhana, J. & Choulakian, V. & McGorman, R., 2006. "Online EM algorithm for mixture with application to internet traffic modeling," Computational Statistics & Data Analysis, Elsevier, vol. 50(4), pages 1052-1071, February.
William Browne & Harvey Goldstein, 2010. "MCMC Sampling for a Multilevel Model With Nonindependent Residuals Within and Between Cluster Units," Journal of Educational and Behavioral Statistics, vol. 35(4), pages 453-473, August.
Donald Rubin & Dorothy Thayer, 1982. "EM algorithms for ML factor analysis," Psychometrika, Springer;The Psychometric Society, vol. 47(1), pages 69-76, March.
Berlinet, A.F. & Roland, Ch., 2012. "Acceleration of the EM algorithm: P-EM versus epsilon algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 56(12), pages 4122-4137.
Olivier Cappé & Eric Moulines, 2009. "On‐line expectation–maximization algorithm for latent data models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 593-613, June.

Citations

Cited by:

L. Ippel & M. C. Kaptein & J. K. Vermunt, 2019. "Estimating Multilevel Models on Data Streams," Psychometrika, Springer;The Psychometric Society, vol. 84(1), pages 41-64, March.
Ippel, L. & Kaptein, M.C. & Vermunt, J.K., 2019. "Online estimation of individual-level effects using streaming shrinkage factors," Computational Statistics & Data Analysis, Elsevier, vol. 137(C), pages 16-32.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Maire, Florian & Moulines, Eric & Lefebvre, Sidonie, 2017. "Online EM for functional data," Computational Statistics & Data Analysis, Elsevier, vol. 111(C), pages 27-47.
L. Ippel & M. C. Kaptein & J. K. Vermunt, 2019. "Estimating Multilevel Models on Data Streams," Psychometrika, Springer;The Psychometric Society, vol. 84(1), pages 41-64, March.
Bouveyron, Charles & Brunet-Saumard, Camille, 2014. "Model-based clustering of high-dimensional data: A review," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 52-78.
Matteo Barigozzi & Matteo Luciani, 2019. "Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm," Papers 1910.03821, arXiv.org, revised Sep 2024.
- Matteo Barigozzi & Matteo Luciani, 2024. "Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm," Finance and Economics Discussion Series 2024-086, Board of Governors of the Federal Reserve System (U.S.).
Zirogiannis, Nikolaos & Tripodis, Yorghos, 2013. "A Generalized Dynamic Factor Model for Panel Data: Estimation with a Two-Cycle Conditional Expectation-Maximization Algorithm," Working Paper Series 142752, University of Massachusetts, Amherst, Department of Resource Economics.
Dorota Toczydlowska & Gareth W. Peters & Man Chung Fung & Pavel V. Shevchenko, 2017. "Stochastic Period and Cohort Effect State-Space Mortality Models Incorporating Demographic Factors via Probabilistic Robust Principal Components," Risks, MDPI, vol. 5(3), pages 1-77, July.
Jurgen A. Doornik, 2018. "Accelerated Estimation of Switching Algorithms: The Cointegrated VAR Model and Other Applications," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 45(2), pages 283-300, June.
- Jurgen A. Doornik, 2017. "Accelerated Estimation of Switching Algorithms: The Cointegrated VAR Model and Other Applications," Economics Papers 2017-W05, Economics Group, Nuffield College, University of Oxford.
Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2012. "The directional identification problem in Bayesian factor analysis: An ex-post approach," Kiel Working Papers 1799, Kiel Institute for the World Economy (IfW Kiel).
- Pape, Markus & Aßmann, Christian & Boysen-Hogrefe, Jens, 2013. "The Directional Identification Problem in Bayesian Factor Analysis: An Ex-Post Approach," VfS Annual Conference 2013 (Duesseldorf): Competition Policy and Regulation in a Global Economic Order 79990, Verein für Socialpolitik / German Economic Association.
- Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2012. "The directional identification problem in Bayesian factor analysis: An ex-post approach," Economics Working Papers 2012-11, Christian-Albrechts-University of Kiel, Department of Economics.
Chen, Derek H. C. & Gawande, Kishore, 2007. "Underlying dimensions of knowledge assessment : factor analysis of the knowledge assessment methodology data," Policy Research Working Paper Series 4216, The World Bank.
Zhou, Lin & Tang, Yayong, 2021. "Linearly preconditioned nonlinear conjugate gradient acceleration of the PX-EM algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 155(C).
Kim, Jiwhan & Nam, Changi & Ryu, Min Ho, 2020. "IPTV vs. emerging video services: Dilemma of telcos to upgrade the broadband," Telecommunications Policy, Elsevier, vol. 44(4).
Arno Onken & Jian K Liu & P P Chamanthi R Karunasekara & Ioannis Delis & Tim Gollisch & Stefano Panzeri, 2016. "Using Matrix and Tensor Factorizations for the Single-Trial Analysis of Population Spike Trains," PLOS Computational Biology, Public Library of Science, vol. 12(11), pages 1-46, November.
Jin, Shaobo & Moustaki, Irini & Yang-Wallentin, Fan, 2018. "Approximated penalized maximum likelihood for exploratory factor analysis: an orthogonal case," LSE Research Online Documents on Economics 88118, London School of Economics and Political Science, LSE Library.
Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2014. "Bayesian analysis of dynamic factor models: An ex-post approach towards the rotation problem," Kiel Working Papers 1902, Kiel Institute for the World Economy (IfW Kiel).
Matteo Barigozzi, 2023. "Asymptotic equivalence of Principal Components and Quasi Maximum Likelihood estimators in Large Approximate Factor Models," Papers 2307.09864, arXiv.org, revised Jun 2024.
John Tisak & William Meredith, 1989. "Exploratory longitudinal factor analysis in multiple populations," Psychometrika, Springer;The Psychometric Society, vol. 54(2), pages 261-281, June.
Gregory Camilli & Jean-Paul Fox, 2015. "An Aggregate IRT Procedure for Exploratory Factor Analysis," Journal of Educational and Behavioral Statistics, vol. 40(4), pages 377-401, August.
Sato, Aki-Hiro, 2012. "Patterns of regional travel behavior: An analysis of Japanese hotel reservation data," International Review of Financial Analysis, Elsevier, vol. 23(C), pages 55-65.
Kohei Adachi & Nickolay T. Trendafilov, 2018. "Some Mathematical Properties of the Matrix Decomposition Solution in Factor Analysis," Psychometrika, Springer;The Psychometric Society, vol. 83(2), pages 407-424, June.
Anne Boomsma, 1985. "Nonconvergence, improper solutions, and starting values in lisrel maximum likelihood estimation," Psychometrika, Springer;The Psychometric Society, vol. 50(2), pages 229-242, June.

More about this item

Keywords

Data streams; Expectation–Maximization algorithm; Multilevel models; Online learning; Random-intercept model
