
Online estimation of individual-level effects using streaming shrinkage factors

Author

Listed:
  • Ippel, L.
  • Kaptein, M.C.
  • Vermunt, J.K.

Abstract

It has become increasingly easy to collect data from individuals over long periods of time. Examples include smartphone applications that track movements with GPS, web-log data tracking individuals’ browsing behavior, and longitudinal (cohort) studies in which many individuals are monitored over an extended period. All these datasets cover a large number of individuals and contain repeated observations of the same individuals, which gives the data a nested structure. Moreover, the data collection is never ‘finished’, as new data keep streaming in. It is well known that predictions of an individual-level effect that combine the individual's own data with the data of all other individuals are better in terms of squared error than predictions that use only the individual mean. However, when data are both nested and streaming, and the outcome variable is binary, computing these individual-level predictions can be computationally challenging. Five computationally efficient estimation methods, which do not revisit “old” data but do account for the nested data structure, are developed and evaluated. The methods are based on existing shrinkage factors. A shrinkage factor is used to predict an individual-level effect (i.e., the probability of scoring a 1) by weighing the individual mean against the mean over all data points. The performance of the existing and newly developed shrinkage factors is compared in a simulation study. While the existing methods differ in their prediction accuracy, the differences in accuracy between the novel shrinkage factors and the existing methods are extremely small. The novel methods are, however, computationally much more appealing.
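
The idea of a streaming shrinkage prediction can be illustrated with a minimal Python sketch. It is not one of the five methods from the article: the class name `StreamingShrinkage`, the constant `k`, and the weight `n_j / (n_j + k)` are assumptions chosen only to show the general mechanism. Each data point updates a few running totals and is then discarded, and the prediction weighs the individual mean against the mean over all data points.

```python
from collections import defaultdict


class StreamingShrinkage:
    """Online shrinkage predictor for binary outcomes (illustrative sketch only)."""

    def __init__(self, k=5.0):
        # k is a hypothetical tuning constant, not a quantity from the article.
        self.k = k
        self.n_total = 0                 # observations seen so far
        self.sum_total = 0.0             # sum of all outcomes seen so far
        self.n = defaultdict(int)        # per-individual observation counts
        self.sums = defaultdict(float)   # per-individual outcome sums

    def update(self, individual, y):
        """Absorb one (individual, outcome) pair; the data point is not stored."""
        self.n_total += 1
        self.sum_total += y
        self.n[individual] += 1
        self.sums[individual] += y

    def predict(self, individual):
        """Weigh the individual mean against the mean over all data points."""
        if self.n_total == 0:
            raise ValueError("no data seen yet")
        grand_mean = self.sum_total / self.n_total
        n_j = self.n.get(individual, 0)
        if n_j == 0:
            # An unseen individual is predicted by the grand mean alone.
            return grand_mean
        individual_mean = self.sums[individual] / n_j
        # Assumed weight: more data on an individual means less shrinkage.
        weight = n_j / (n_j + self.k)
        return weight * individual_mean + (1.0 - weight) * grand_mean


model = StreamingShrinkage(k=2.0)
for person, y in [("a", 1), ("a", 1), ("b", 0), ("b", 0)]:
    model.update(person, y)
print(model.predict("a"), model.predict("b"), model.predict("c"))
```

In this toy stream, individual "a" has an individual mean of 1 and the grand mean is 0.5, so with two observations and `k=2.0` the prediction is pulled halfway toward the grand mean. The shrinkage factors studied in the article are estimated from the data rather than fixed by a constant like `k`.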

Suggested Citation

  • Ippel, L. & Kaptein, M.C. & Vermunt, J.K., 2019. "Online estimation of individual-level effects using streaming shrinkage factors," Computational Statistics & Data Analysis, Elsevier, vol. 137(C), pages 16-32.
  • Handle: RePEc:eee:csdana:v:137:y:2019:i:c:p:16-32
    DOI: 10.1016/j.csda.2019.01.010

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947319300246
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Ippel, L. & Kaptein, M.C. & Vermunt, J.K., 2016. "Estimating random-intercept models on data streams," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 169-182.
    2. Bates, Douglas & Mächler, Martin & Bolker, Ben & Walker, Steve, 2015. "Fitting Linear Mixed-Effects Models Using lme4," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 67(i01).
    3. Sophia Rabe-Hesketh & Anders Skrondal & Andrew Pickles, 2002. "Reliable estimation of generalized linear mixed models using adaptive quadrature," Stata Journal, StataCorp LP, vol. 2(1), pages 1-21, February.
    4. Philippe Pébay & Timothy B. Terriberry & Hemanth Kolla & Janine Bennett, 2016. "Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights," Computational Statistics, Springer, vol. 31(4), pages 1305-1325, December.
    5. Wang, Li-Yu & Park, Cheolwoo & Yeon, Kyupil & Choi, Hosik, 2017. "Tracking concept drift using a constrained penalized regression combiner," Computational Statistics & Data Analysis, Elsevier, vol. 108(C), pages 52-69.
    6. Mirjam Moerbeek & Gerard J. P. Breukelen & Martijn P. F. Berger, 2003. "A Comparison of Estimation Methods for Multilevel Logistic Models," Computational Statistics, Springer, vol. 18(1), pages 19-37, March.

    Citations

    Cited by:

    1. Fabio Vieira & Roger Leenders & Joris Mulder, 2024. "Fast meta-analytic approximations for relational event models: applications to data streams and multilevel data," Journal of Computational Social Science, Springer, vol. 7(2), pages 1823-1859, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Harold Doran, 2023. "A Collection of Numerical Recipes Useful for Building Scalable Psychometric Applications," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 37-69, February.
    2. Øystein Sørensen & Anders M. Fjell & Kristine B. Walhovd, 2023. "Longitudinal Modeling of Age-Dependent Latent Traits with Generalized Additive Latent and Mixed Models," Psychometrika, Springer;The Psychometric Society, vol. 88(2), pages 456-486, June.
    3. Steffen Nestler & Edgar Erdfelder, 2023. "Random Effects Multinomial Processing Tree Models: A Maximum Likelihood Approach," Psychometrika, Springer;The Psychometric Society, vol. 88(3), pages 809-829, September.
    4. JANSSENS, Jochen & DE CORTE, Annelies & SÖRENSEN, Kenneth, 2016. "Water distribution network design optimisation with respect to reliability," Working Papers 2016007, University of Antwerp, Faculty of Business and Economics.
    5. Raymond Hernandez & Elizabeth A. Pyatak & Cheryl L. P. Vigen & Haomiao Jin & Stefan Schneider & Donna Spruijt-Metz & Shawn C. Roll, 2021. "Understanding Worker Well-Being Relative to High-Workload and Recovery Activities across a Whole Day: Pilot Testing an Ecological Momentary Assessment Technique," IJERPH, MDPI, vol. 18(19), pages 1-17, October.
    6. Elisabeth Beckmann & Lukas Olbrich & Joseph Sakshaug, 2024. "Multivariate assessment of interviewer-related errors in a cross-national economic survey (Lukas Olbrich, Elisabeth Beckmann, Joseph W. Sakshaug)," Working Papers 253, Oesterreichische Nationalbank (Austrian Central Bank).
    7. Jan Brenner, 2007. "Parental Impact on Attitude Formation - A Siblings Study on Worries about Immigration," Ruhr Economic Papers 0022, Rheinisch-Westfälisches Institut für Wirtschaftsforschung, Ruhr-Universität Bochum, Universität Dortmund, Universität Duisburg-Essen.
    8. Valentina Krenz & Arjen Alink & Tobias Sommer & Benno Roozendaal & Lars Schwabe, 2023. "Time-dependent memory transformation in hippocampus and neocortex is semantic in nature," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    9. Morán-Ordóñez, Alejandra & Ameztegui, Aitor & De Cáceres, Miquel & de-Miguel, Sergio & Lefèvre, François & Brotons, Lluís & Coll, Lluís, 2020. "Future trade-offs and synergies among ecosystem services in Mediterranean forests under global change scenarios," Ecosystem Services, Elsevier, vol. 45(C).
    10. Damian M. Herz & Manuel Bange & Gabriel Gonzalez-Escamilla & Miriam Auer & Keyoumars Ashkan & Petra Fischer & Huiling Tan & Rafal Bogacz & Muthuraman Muthuraman & Sergiu Groppa & Peter Brown, 2022. "Dynamic control of decision and movement speed in the human basal ganglia," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    11. Dongyan Liu & Chongran Zhou & John K. Keesing & Oscar Serrano & Axel Werner & Yin Fang & Yingjun Chen & Pere Masque & Janine Kinloch & Aleksey Sadekov & Yan Du, 2022. "Wildfires enhance phytoplankton production in tropical oceans," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    12. Zhaogeng Yang & Yanhui Li & Peijin Hu & Jun Ma & Yi Song, 2020. "Prevalence of Anemia and its Associated Factors among Chinese 9-, 12-, and 14-Year-Old Children: Results from 2014 Chinese National Survey on Students Constitution and Health," IJERPH, MDPI, vol. 17(5), pages 1-10, February.
    13. Marco Lopez-Cruz & Fernando M. Aguate & Jacob D. Washburn & Natalia Leon & Shawn M. Kaeppler & Dayane Cristina Lima & Ruijuan Tan & Addie Thompson & Laurence Willard Bretonne & Gustavo los Campos, 2023. "Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    14. Baumann, Elias & Kern, Jana & Lessmann, Stefan, 2019. "Usage Continuance in Software-as-a-Service," IRTG 1792 Discussion Papers 2019-005, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    15. repec:cup:judgdm:v:16:y:2021:i:1:p:201-237 is not listed on IDEAS
    16. C. Gabriel Hidalgo Pizango & Eurídice N. Honorio Coronado & Jhon del Águila-Pasquel & Gerardo Flores Llampazo & Johan de Jong & César J. Córdova Oroche & José M. Reyna Huaymacari & Steve J. Carver & D, 2022. "Sustainable palm fruit harvesting as a pathway to conserve Amazon peatland forests," Nature Sustainability, Nature, vol. 5(6), pages 479-487, June.
    17. Lin-Lin Wang & Zachary Y. Huang & Wen-Fei Dai & Yong-Ping Yang & Yuan-Wen Duan, 2024. "Mixed effects of honey bees on pollination function in the Tibetan alpine grasslands," Nature Communications, Nature, vol. 15(1), pages 1-12, December.
    18. Szefer Elena & Lu Donghuan & Nathoo Farouk & Beg Mirza Faisal & Graham Jinko, 2017. "Multivariate association between single-nucleotide polymorphisms in Alzgene linkage regions and structural changes in the brain: discovery, refinement and validation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 367-386, December.
    19. Julien Collet & Samantha C Patrick & Henri Weimerskirch, 2017. "A comparative analysis of the behavioral response to fishing boats in two albatross species," Behavioral Ecology, International Society for Behavioral Ecology, vol. 28(5), pages 1337-1347.
    20. Sean Coogan & Zhixian Sui & David Raubenheimer, 2018. "Gluttony and guilt: monthly trends in internet search query data are comparable with national-level energy intake and dieting behavior," Palgrave Communications, Palgrave Macmillan, vol. 4(1), pages 1-9, December.
    21. Darcy Steeg Morris & Kimberly F. Sellers, 2022. "A Flexible Mixed Model for Clustered Count Data," Stats, MDPI, vol. 5(1), pages 1-18, January.
