
Clustering and automatic labelling within time series of categorical observations—with an application to marine log messages

Authors
  • Emanuele Gramuglia
  • Geir Storvik
  • Morten Stakkeland

Abstract

System logs or log files containing textual messages with associated time stamps are generated by many technologies and systems. The clustering technique proposed in this paper provides a tool to discover and identify patterns or macrolevel events in this data. The motivating application is logs generated by frequency converters in the propulsion system on a ship, while the general setting is fault identification and classification in complex industrial systems. The paper introduces an offline approach for dividing a time series of log messages into a series of discrete segments of random lengths. These segments are clustered into a limited set of states. A state is assumed to correspond to a specific operation or condition of the system, and can be a fault mode or a normal operation. Each of the states can be associated with a specific, limited set of messages, where messages appear in a random or semi‐structured order within the segments. These structures are in general not defined a priori. We propose a Bayesian hierarchical model where the states are characterised both by the temporal frequency and the type of messages within each segment. An algorithm for inference based on reversible jump MCMC is proposed. The performance of the method is assessed by both simulations and operational data.
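The data structure described in the abstract (segments of random length, each assigned to a state that governs both how often messages arrive and which messages appear) can be illustrated with a small forward simulation. This is a minimal sketch under stated assumptions, not the authors' model or inference algorithm: the state names, message names, rates, probabilities, the uniform segment lengths, and the Poisson arrival process are all hypothetical choices made for illustration.

```python
import random

# Hypothetical states. Each has a message rate (messages per time unit)
# and a categorical distribution over message types. None of these
# names or numbers come from the paper.
STATES = {
    "normal": {"rate": 0.2,
               "messages": {"heartbeat": 0.9, "speed_change": 0.1}},
    "fault": {"rate": 2.0,
              "messages": {"overcurrent": 0.6, "trip": 0.3, "heartbeat": 0.1}},
}


def simulate_log(n_segments, seed=0, min_len=5.0, max_len=20.0):
    """Simulate (timestamp, message, state) records segment by segment.

    Assumptions: segment lengths are uniform on [min_len, max_len],
    states are drawn independently and uniformly, and messages within
    a segment arrive as a homogeneous Poisson process.
    """
    rng = random.Random(seed)
    names = sorted(STATES)
    records, boundaries = [], []
    t_start = 0.0
    for _ in range(n_segments):
        state = rng.choice(names)
        t_end = t_start + rng.uniform(min_len, max_len)
        spec = STATES[state]
        types = list(spec["messages"])
        weights = list(spec["messages"].values())
        # Exponential gaps between messages give Poisson arrivals.
        t = t_start + rng.expovariate(spec["rate"])
        while t < t_end:
            msg = rng.choices(types, weights=weights, k=1)[0]
            records.append((t, msg, state))
            t += rng.expovariate(spec["rate"])
        boundaries.append((t_start, t_end, state))
        t_start = t_end
    return records, boundaries


records, boundaries = simulate_log(n_segments=6)
```

In the setting of the paper, only the timestamps and messages would be observed. The segment boundaries, the number of states, and the state labels are the unknowns that the proposed reversible jump MCMC algorithm targets, and this sketch does not attempt that step.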

Suggested Citation

  • Emanuele Gramuglia & Geir Storvik & Morten Stakkeland, 2021. "Clustering and automatic labelling within time series of categorical observations—with an application to marine log messages," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(3), pages 714-732, June.
  • Handle: RePEc:bla:jorssc:v:70:y:2021:i:3:p:714-732
    DOI: 10.1111/rssc.12483


