
Robust probabilistic PCA with missing data and contribution analysis for outlier detection

Author

Listed:
  • Chen, Tao
  • Martin, Elaine
  • Montague, Gary

Abstract

Principal component analysis (PCA) is a widely adopted multivariate data analysis technique that can be interpreted both as a classical linear projection and through a probability model, namely probabilistic PCA (PPCA). Robust PPCA models based on the multivariate t-distribution have recently been proposed to handle data sets that may contain outliers. This paper presents an overview of the robust PPCA technique and further discusses the issue of missing data. An expectation-maximization (EM) algorithm is presented for the maximum likelihood estimation of the model parameters in the presence of missing data. When robust PPCA is applied to outlier detection, a contribution analysis method is proposed to identify which variables contribute most to the occurrence of outliers, providing valuable information about the source of outlying data. The proposed technique is demonstrated on numerical examples and applied to outlier detection and diagnosis in an industrial fermentation process.
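
As a rough illustration of the idea behind the abstract, the Python sketch below fits a multivariate t-distribution by EM and then reads PPCA parameters off the robust scatter matrix. It is not the paper's algorithm. It assumes complete data (no missing values), keeps the degrees of freedom `nu` fixed, and uses the closed-form PPCA solution of Tipping & Bishop instead of a joint EM over the latent variables. The function names `t_weights`, `fit_t_scatter` and `ppca_from_scatter`, the value `nu=4.0`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def t_weights(X, mu, sigma, nu):
    """Expected weights under a multivariate t: small for outlying rows."""
    diff = X - mu
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)
    return (nu + X.shape[1]) / (nu + maha)

def fit_t_scatter(X, nu=4.0, n_iter=200, tol=1e-8):
    """EM for the mean and scatter of a multivariate t with fixed nu (assumed)."""
    n = X.shape[0]
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        u = t_weights(X, mu, sigma, nu)                  # E-step
        mu_new = (u[:, None] * X).sum(axis=0) / u.sum()  # M-step: weighted mean
        diff = X - mu_new
        sigma_new = (u[:, None] * diff).T @ diff / n     # M-step: weighted scatter
        shift = np.abs(mu_new - mu).max() + np.abs(sigma_new - sigma).max()
        mu, sigma = mu_new, sigma_new
        if shift < tol:
            break
    return mu, sigma, t_weights(X, mu, sigma, nu)

def ppca_from_scatter(sigma, q):
    """Closed-form PPCA loadings and noise variance from a scatter matrix."""
    vals, vecs = np.linalg.eigh(sigma)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    noise = vals[q:].mean()                              # mean discarded eigenvalue
    W = vecs[:, :q] * np.sqrt(np.maximum(vals[:q] - noise, 0.0))
    return W, noise

# Synthetic data: 2 latent factors in 5 variables, with 5 planted outliers.
rng = np.random.default_rng(0)
n, d, q = 200, 5, 2
A = rng.normal(size=(q, d))
X = rng.normal(size=(n, q)) @ A + 0.1 * rng.normal(size=(n, d))
X[:5, 3] += 10.0  # shift one variable in the first five rows

mu, sigma, u = fit_t_scatter(X)
W, noise = ppca_from_scatter(sigma, q)
print("weights of planted outliers:", np.round(u[:5], 3))
print("median weight of other rows:", round(float(np.median(u[5:])), 3))
```

Rows with a large Mahalanobis distance receive small weights, so they have little influence on the estimated mean and scatter; the same weights can serve as a simple outlier score. The variable-wise contribution analysis described in the abstract is not reproduced here.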

Suggested Citation

  • Chen, Tao & Martin, Elaine & Montague, Gary, 2009. "Robust probabilistic PCA with missing data and contribution analysis for outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3706-3716, August.
  • Handle: RePEc:eee:csdana:v:53:y:2009:i:10:p:3706-3716

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(09)00124-8
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Michael E. Tipping & Christopher M. Bishop, 1999. "Probabilistic Principal Component Analysis," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(3), pages 611-622.
    2. Tao Chen & Julian Morris & Elaine Martin, 2006. "Probability density estimation via an infinite Gaussian mixture model: application to statistical process monitoring," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 55(5), pages 699-715, November.
    3. Li, Baibing & Martin, Elaine B. & Morris, A. Julian, 2002. "On principal component analysis in L1," Computational Statistics & Data Analysis, Elsevier, vol. 40(3), pages 471-474, September.
    4. Kotz, Samuel & Nadarajah, Saralees, 2004. "Multivariate T-Distributions and Their Applications," Cambridge Books, Cambridge University Press, number 9780521826549, September.
    5. Liu, Chuanhai, 1997. "ML Estimation of the Multivariate t Distribution and the EM Algorithm," Journal of Multivariate Analysis, Elsevier, vol. 63(2), pages 296-312, November.

    Citations

    Cited by:

    1. Debruyne, Michiel & Hubert, Mia & Van Horebeek, Johan, 2010. "Detecting influential observations in Kernel PCA," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3007-3019, December.
    2. Sang-Mok Lee & So-Won Choi & Eul-Bum Lee, 2023. "Prediction Modeling of Flue Gas Control for Combustion Efficiency Optimization for Steel Mill Power Plant Boilers Based on Partial Least Squares Regression (PLSR)," Energies, MDPI, vol. 16(19), pages 1-33, September.
    3. Boente, Graciela & Pires, Ana M. & Rodrigues, Isabel M., 2010. "Detecting influential observations in principal components and common principal components," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 2967-2975, December.
    4. Efstathios Panayi & Gareth Peters & Ioannis Kosmidis, 2014. "Liquidity commonality does not imply liquidity resilience commonality: A functional characterisation for ultra-high frequency cross-sectional LOB data," Papers 1406.5486, arXiv.org.
    5. Dorota Toczydlowska & Gareth W. Peters, 2018. "Financial Big Data Solutions for State Space Panel Regression in Interest Rate Dynamics," Econometrics, MDPI, vol. 6(3), pages 1-45, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matteo Barigozzi, 2023. "Asymptotic equivalence of Principal Components and Quasi Maximum Likelihood estimators in Large Approximate Factor Models," Papers 2307.09864, arXiv.org, revised Jun 2024.
    2. Jung, WoongHee & Taflanidis, Alexandros A. & Kyprioti, Aikaterini P. & Zhang, Jize, 2024. "Adaptive multi-fidelity Monte Carlo for real-time probabilistic storm surge predictions," Reliability Engineering and System Safety, Elsevier, vol. 247(C).
    3. Francesco Curreri & Giacomo Fiumara & Maria Gabriella Xibilia, 2020. "Input Selection Methods for Soft Sensor Design: A Survey," Future Internet, MDPI, vol. 12(6), pages 1-24, June.
    4. Gaofeng Jia & Alexandros Taflanidis & Norberto Nadal-Caraballo & Jeffrey Melby & Andrew Kennedy & Jane Smith, 2016. "Surrogate modeling for peak or time-dependent storm surge prediction over an extended coastal region using an existing database of synthetic storms," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 81(2), pages 909-938, March.
    5. Dorota Toczydlowska & Gareth W. Peters & Man Chung Fung & Pavel V. Shevchenko, 2017. "Stochastic Period and Cohort Effect State-Space Mortality Models Incorporating Demographic Factors via Probabilistic Robust Principal Components," Risks, MDPI, vol. 5(3), pages 1-77, July.
    6. Marconi, Gabriele, 2014. "European higher education policies and the problem of estimating a complex model with a small cross-section," MPRA Paper 87600, University Library of Munich, Germany.
    7. Benaych-Georges, Florent & Nadakuditi, Raj Rao, 2012. "The singular values and vectors of low rank perturbations of large rectangular random matrices," Journal of Multivariate Analysis, Elsevier, vol. 111(C), pages 120-135.
    8. Gaofeng Jia & Alexandros A. Taflanidis & Norberto C. Nadal-Caraballo & Jeffrey A. Melby & Andrew B. Kennedy & Jane M. Smith, 2016. "Surrogate modeling for peak or time-dependent storm surge prediction over an extended coastal region using an existing database of synthetic storms," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 81(2), pages 909-938, March.
    9. Landgraf, Andrew J. & Lee, Yoonkyung, 2020. "Dimensionality reduction for binary data through the projection of natural parameters," Journal of Multivariate Analysis, Elsevier, vol. 180(C).
    10. Paola Zuccolotto, 2012. "Principal component analysis with interval imputed missing values," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 96(1), pages 1-23, January.
    11. Elizondo Rocío, 2013. "Forecasting the Term Structure of Interest Rates in Mexico Using an Affine Model," Working Papers 2013-03, Banco de México.
    12. Plat, Richard, 2009. "Stochastic portfolio specific mortality and the quantification of mortality basis risk," Insurance: Mathematics and Economics, Elsevier, vol. 45(1), pages 123-132, August.
    13. Kondylis, Athanassios & Whittaker, Joe, 2008. "Spectral preconditioning of Krylov spaces: Combining PLS and PC regression," Computational Statistics & Data Analysis, Elsevier, vol. 52(5), pages 2588-2603, January.
    14. Ouyang, Yaofu & Li, Peng, 2018. "On the nexus of financial development, economic growth, and energy consumption in China: New perspective from a GMM panel VAR approach," Energy Economics, Elsevier, vol. 71(C), pages 238-252.
    15. Paschalis Arvanitidis & Athina Economou & Christos Kollias, 2016. "Terrorism’s effects on social capital in European countries," Public Choice, Springer, vol. 169(3), pages 231-250, December.
    16. Rizvi, Syed Kumail Abbas & Rahat, Birjees & Naqvi, Bushra & Umar, Muhammad, 2024. "Revolutionizing finance: The synergy of fintech, digital adoption, and innovation," Technological Forecasting and Social Change, Elsevier, vol. 200(C).
    17. Teerachai Amnuaylojaroen & Pavinee Chanvichit, 2024. "Historical Analysis of the Effects of Drought on Rice and Maize Yields in Southeast Asia," Resources, MDPI, vol. 13(3), pages 1-18, March.
    18. Xin Xu & Yang Lu & Yupeng Zhou & Zhiguo Fu & Yanjie Fu & Minghao Yin, 2021. "An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks," Mathematics, MDPI, vol. 9(15), pages 1-14, July.
    19. Weili Duan & Bin He & Daniel Nover & Guishan Yang & Wen Chen & Huifang Meng & Shan Zou & Chuanming Liu, 2016. "Water Quality Assessment and Pollution Source Identification of the Eastern Poyang Lake Basin Using Multivariate Statistical Methods," Sustainability, MDPI, vol. 8(2), pages 1-15, January.
    20. Adele Ravagnani & Fabrizio Lillo & Paola Deriu & Piero Mazzarisi & Francesca Medda & Antonio Russo, 2024. "Dimensionality reduction techniques to support insider trading detection," Papers 2403.00707, arXiv.org, revised May 2024.
