Printed from https://ideas.repec.org/p/zbw/zewdip/11015.html

Clustering life trajectories: A new divisive hierarchical clustering algorithm for discrete-valued discrete time series

Author

Listed:
  • Dlugosz, Stephan

Abstract

A new algorithm for clustering life course trajectories is presented and tested on large-scale register data. Life courses are represented as sequences on a monthly timescale covering working life, from age 16 to 65. A meaningful clustering of this kind of data yields subgroups with similar life course trajectories. The high sampling rate allows precise discrimination between subgroups, but it also produces large amounts of highly correlated data for phases with low variability. The main challenge is to select the variables (points in time) that carry most of the relevant information. The new algorithm addresses this problem by simultaneously clustering and identifying critical junctures for each of the relevant subgroups. The divisive algorithm can handle large amounts of multidimensional data within reasonable time. This is demonstrated on data from the German Federal Pension Insurance.
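
The idea of splitting a cluster at the time point that carries the most information can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not give its splitting criterion, so the sketch assumes a simple entropy-based impurity, equal-length sequences with one character per month, and hypothetical state codes ("E" for employed, "U" for unemployed). All function names are made up for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(seqs, t):
    """Shannon entropy (bits) of the states observed at time index t."""
    counts = Counter(s[t] for s in seqs)
    n = len(seqs)
    return -sum(c / n * log2(c / n) for c in counts.values())


def impurity(seqs):
    """Cluster size times the sum of per-time-point entropies (assumed criterion)."""
    if not seqs:
        return 0.0
    return len(seqs) * sum(column_entropy(seqs, t) for t in range(len(seqs[0])))


def best_split(seqs):
    """Best binary split 'state at time t' vs. the rest, or None if no split exists."""
    base = impurity(seqs)
    best = None
    for t in range(len(seqs[0])):
        for state in sorted({s[t] for s in seqs}):
            left = [s for s in seqs if s[t] == state]
            right = [s for s in seqs if s[t] != state]
            if not right:  # every sequence shares this state at t: nothing to split
                continue
            gain = base - impurity(left) - impurity(right)
            if best is None or gain > best[0]:
                best = (gain, t, state, left, right)
    return best


def divisive_cluster(seqs, max_clusters):
    """Repeatedly split the cluster with the largest impurity reduction.

    Returns the clusters and the (time index, state) pairs used for the splits,
    which play the role of 'critical junctures' in this toy setting.
    """
    clusters = [list(seqs)]
    junctures = []
    while len(clusters) < max_clusters:
        candidates = [(best_split(c), i) for i, c in enumerate(clusters) if len(c) > 1]
        candidates = [(b, i) for b, i in candidates if b is not None and b[0] > 1e-12]
        if not candidates:
            break
        (gain, t, state, left, right), i = max(candidates, key=lambda x: x[0][0])
        clusters[i:i + 1] = [left, right]
        junctures.append((t, state))
    return clusters, junctures


# Toy data: six "months", two obvious groups that diverge at index 2.
toy = ["EEEEEE", "EEEEEE", "EEEEEE", "EEUUUU", "EEUUUU", "EEUUUU"]
clusters, junctures = divisive_cluster(toy, max_clusters=5)
```

On the toy data the procedure stops after one split, because both resulting clusters are homogeneous, and it records the first month at which the two groups differ. Real register data would need a scalable criterion and a stopping rule, which this sketch does not attempt.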

Suggested Citation

  • Dlugosz, Stephan, 2011. "Clustering life trajectories: A new divisive hierarchical clustering algorithm for discrete-valued discrete time series," ZEW Discussion Papers 11-015, ZEW - Leibniz Centre for European Economic Research.
  • Handle: RePEc:zbw:zewdip:11015

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/44458/1/654047626.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Katarina Košmelj & Vladimir Batagelj, 1990. "Cross-sectional approach for clustering time varying data," Journal of Classification, Springer;The Classification Society, vol. 7(1), pages 99-109, March.
    2. Raffaella Piccarreta & Francesco C. Billari, 2007. "Clustering work and family trajectories by using a divisive algorithm," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 170(4), pages 1061-1078, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Moehring, Katja & Weiland, Andreas & Reifenscheid, Maximiliane & Naumann, Elias & Wenz, Alexander & Rettig, Tobias & Krieger, Ulrich & Fikel, Marina & Cornesse, Carina & Blom, Annelies G., 2021. "Inequality in employment trajectories and their socio-economic consequences during the early phase of the COVID-19 pandemic in Germany," SocArXiv m95df, Center for Open Science.
    2. Beibei Zhang & Rong Chen, 2018. "Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic," Journal of Classification, Springer;The Classification Society, vol. 35(3), pages 394-421, October.
    3. Michael Anyadike-Danes & Duncan McVicar, 2010. "My Brilliant Career: Characterizing the Early Labor Market Trajectories of British Women From Generation X," Sociological Methods & Research, vol. 38(3), pages 482-512, February.
    4. Coppi, Renato & D'Urso, Pierpaolo, 2006. "Fuzzy unsupervised classification of multivariate time trajectories with the Shannon entropy regularization," Computational Statistics & Data Analysis, Elsevier, vol. 50(6), pages 1452-1477, March.
    5. Dandan Xu & Yang Bian & Jian Rong & Jiachuan Wang & Baocai Yin, 2019. "Study on Clustering of Free-Floating Bike-Sharing Parking Time Series in Beijing Subway Stations," Sustainability, MDPI, vol. 11(19), pages 1-20, September.
    6. Akira Yoshida & Yoshiharu Amano & Noboru Murata & Koichi Ito & Takumi Hasizume, 2013. "A Comparison of Optimal Operation of a Residential Fuel Cell Co-Generation System Using Clustered Demand Patterns Based on Kullback-Leibler Divergence," Energies, MDPI, vol. 6(1), pages 1-26, January.
    7. Serah Shin & Hyungsoo Kim, 2018. "Health Trajectories of Older Americans and Medical Expenses: Evidence from the Health and Retirement Study Data Over the 18 Year Period," Journal of Family and Economic Issues, Springer, vol. 39(1), pages 19-33, March.
    8. Raffaella Piccarreta, 2012. "Graphical and Smoothing Techniques for Sequence Analysis," Sociological Methods & Research, vol. 41(2), pages 362-380, May.
    9. N. Barban & X. de Luna & E. Lundholm & I. Svensson & F. C. Billari, 2020. "Causal Effects of the Timing of Life-course Events: Age at Retirement and Subsequent Health," Sociological Methods & Research, vol. 49(1), pages 216-249, February.
    10. Caiado, Jorge & Crato, Nuno & Pena, Daniel, 2006. "A periodogram-based metric for time series classification," Computational Statistics & Data Analysis, Elsevier, vol. 50(10), pages 2668-2684, June.
    11. Raffaella Piccarreta, 2017. "Joint Sequence Analysis," Sociological Methods & Research, vol. 46(2), pages 252-287, March.
    12. Liao, Tim F. & Bolano, Danilo & Brzinsky-Fay, Christian & Cornwell, Benjamin & Fasang, Anette Eva & Helske, Satu & Piccarreta, Raffaella & Raab, Marcel & Ritschard, Gilbert & Struffolino, Emanuela & S, 2022. "Sequence analysis: Its past, present, and future," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 107, pages 1-1.
    13. Piccarreta, Raffaella, 2010. "Binary trees for dissimilarity data," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1516-1524, June.
    14. Ignacio Benítez & José-Luis Díez, 2022. "Automated Detection of Electric Energy Consumption Load Profile Patterns," Energies, MDPI, vol. 15(6), pages 1-26, March.
    15. Piccarreta, Raffaella & Bonetti, Marco, 2019. "Assessing and comparing models for sequence data by microsimulation (with Supplementary Material)," SocArXiv 3mcfp, Center for Open Science.
    16. Coppi, Renato & D'Urso, Pierpaolo, 2003. "Three-way fuzzy clustering models for LR fuzzy time trajectories," Computational Statistics & Data Analysis, Elsevier, vol. 43(2), pages 149-177, June.
    17. Raffaella Piccarreta & Orna Lior, 2010. "Exploring sequences: a graphical tool based on multi‐dimensional scaling," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 173(1), pages 165-184, January.
    18. Marco Bonetti & Raffaella Piccarreta & Gaia Salford, 2013. "Parametric and Nonparametric Analysis of Life Courses: An Application to Family Formation Patterns," Demography, Springer;Population Association of America (PAA), vol. 50(3), pages 881-902, June.
    19. Andrade, Stefan B. & Fasang, Anette Eva & Helske, Satu & Karhula, Aleksi, 2023. "Typologies in Sequence Analysis: Practical Guidelines for Identifying Robust Cluster Solutions," SocArXiv kj8d5, Center for Open Science.
    20. Christophe Genolini & Bruno Falissard, 2010. "KmL: k-means for longitudinal data," Computational Statistics, Springer, vol. 25(2), pages 317-328, June.

    More about this item

    Keywords

    Clustering; measures of association; discrete data; time series;

    JEL classification:

    • C33 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Models with Panel Data; Spatio-temporal Models
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • J00 - Labor and Demographic Economics - - General - - - General

