IDEAS home Printed from https://ideas.repec.org/a/eee/jmvana/v120y2013icp173-184.html

Robust estimation of location and scatter by pruning the minimum spanning tree

Author

Listed:
  • Kirschstein, Thomas
  • Liebscher, Steffen
  • Becker, Claudia

Abstract

Robust estimation of location and covariance is one of the most essential topics in robust statistics. Many popular robust location and scatter estimators, such as Fast-MCD, MVE, and MZE, require at least a convex distribution of the underlying data. For non-convex data distributions these approaches may give suboptimal results, because they apply Mahalanobis distances with respect to the location and covariance of a suitably chosen subsample of the data, which implies a convex structure. The approach presented here avoids this drawback by using Euclidean distances. The data set is treated as a complete network, and its minimum spanning tree (MST) is calculated. Based on the MST, a subset of relevant points (thought of as an “outlier-free” subsample of minimum size) is determined, which can then be used for calculating data characteristics. It is shown that the approach has a maximum breakdown point. Additionally, a simulation study provides insight into the approach’s behaviour with respect to increasing dimension and size.
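
The pruning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the default subset size h = floor((n + d + 1) / 2), and the stopping rule (drop the longest remaining MST edge as long as the largest connected component still has at least h points) are all assumptions made for illustration only.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (length, i, j) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def largest_component(n, edges):
    """Sorted node indices of the largest connected component."""
    adj = {i: [] for i in range(n)}
    for _, i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, largest = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(comp) > len(largest):
            largest = comp
    return sorted(largest)


def pruned_subset(points, h=None):
    """Prune the longest MST edges while a component of size >= h survives.

    The default h and this stopping rule are assumptions of the sketch.
    """
    n, d = len(points), len(points[0])
    if h is None:
        h = (n + d + 1) // 2
    edges = sorted(mst_edges(points))  # ascending by edge length
    keep = largest_component(n, edges)
    while edges:
        trial = largest_component(n, edges[:-1])  # drop the longest edge
        if len(trial) < h:
            break
        edges, keep = edges[:-1], trial
    return keep


def location_scatter(points, idx):
    """Sample mean and covariance matrix of the selected points."""
    m, d = len(idx), len(points[0])
    mean = [sum(points[i][k] for i in idx) / m for k in range(d)]
    cov = [[sum((points[i][a] - mean[a]) * (points[i][b] - mean[b])
                for i in idx) / (m - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov
```

A call such as `location_scatter(points, pruned_subset(points))` then gives location and scatter estimates from the retained subsample. Because the subset is selected through Euclidean edge lengths only, no ellipsoidal shape is imposed on it, which is the property the abstract contrasts with Mahalanobis-based estimators.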

Suggested Citation

  • Kirschstein, Thomas & Liebscher, Steffen & Becker, Claudia, 2013. "Robust estimation of location and scatter by pruning the minimum spanning tree," Journal of Multivariate Analysis, Elsevier, vol. 120(C), pages 173-184.
  • Handle: RePEc:eee:jmvana:v:120:y:2013:i:c:p:173-184
    DOI: 10.1016/j.jmva.2013.05.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X13000900
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Joe, Harry, 2006. "Generating random correlation matrices based on partial correlations," Journal of Multivariate Analysis, Elsevier, vol. 97(10), pages 2177-2189, November.
    2. Marco Riani & Anthony C. Atkinson & Andrea Cerioli, 2009. "Finding an unknown number of multivariate outliers," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 447-466, April.
    3. Caroni, C. & Prescott, P., 1995. "On Rohlf's Method for the Detection of Outliers in Multivariate Data," Journal of Multivariate Analysis, Elsevier, vol. 52(2), pages 295-307, February.
    4. Becker, Claudia & Gather, Ursula, 2001. "The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier identification rules," Computational Statistics & Data Analysis, Elsevier, vol. 36(1), pages 119-127, March.
    5. M. J. R. Healy, 1968. "Multivariate Normal Plotting," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 17(2), pages 157-161, June.
    6. J. C. Gower & G. J. S. Ross, 1969. "Minimum Spanning Trees and Single Linkage Cluster Analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 18(1), pages 54-64, March.

    Citations



    Cited by:

    1. Kirschstein, Thomas & Liebscher, Steffen & Pandolfo, Giuseppe & Porzio, Giovanni C. & Ragozini, Giancarlo, 2019. "On finite-sample robustness of directional location estimators," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 53-75.
    2. Mathias Kloss & Thomas Kirschstein & Steffen Liebscher & Martin Petrick, 2019. "Robust Productivity Analysis: An application to German FADN data," Papers 1902.00678, arXiv.org, revised Feb 2019.
    3. Steffen Liebscher & Thomas Kirschstein, 2015. "Efficiency of the pMST and RDELA location and scatter estimators," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 99(1), pages 63-82, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Modarres, Reza, 2014. "On the interpoint distances of Bernoulli vectors," Statistics & Probability Letters, Elsevier, vol. 84(C), pages 215-222.
    2. Meltem Ekiz & O.Ufuk Ekiz, 2017. "Outlier detection with Mahalanobis square distance: incorporating small sample correction factor," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2444-2457, October.
    3. Brian Hartley, 2020. "Corridor stability of the Kaleckian growth model: a Markov-switching approach," Working Papers 2013, New School for Social Research, Department of Economics, revised Nov 2020.
    4. Jürgen Wellmann & Ursula Gather, 2003. "Identification of outliers in a one-way random effects model," Statistical Papers, Springer, vol. 44(3), pages 335-348, July.
    5. Steffen Liebscher & Thomas Kirschstein, 2015. "Efficiency of the pMST and RDELA location and scatter estimators," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 99(1), pages 63-82, January.
    6. Beibei Zhang & Rong Chen, 2018. "Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic," Journal of Classification, Springer;The Classification Society, vol. 35(3), pages 394-421, October.
    7. Flórez, Alvaro J. & Molenberghs, Geert & Van der Elst, Wim & Alonso Abad, Ariel, 2022. "An efficient algorithm to assess multivariate surrogate endpoints in a causal inference framework," Computational Statistics & Data Analysis, Elsevier, vol. 172(C).
    8. Azamir, Bouchaib & Bennis, Driss & Michel, Bertrand, 2022. "A simplified algorithm for identifying abnormal changes in dynamic networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 607(C).
    9. Sung-Soo Kim & W. Krzanowski, 2007. "Detecting multiple outliers in linear regression using a cluster method combined with graphical visualization," Computational Statistics, Springer, vol. 22(1), pages 109-119, April.
    10. Anthony C. Atkinson & Marco Riani & Aldo Corbellini, 2020. "The analysis of transformations for profit‐and‐loss data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(2), pages 251-275, April.
    11. Salvatore Ingrassia & Simona Minotti & Giorgio Vittadini, 2012. "Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions," Journal of Classification, Springer;The Classification Society, vol. 29(3), pages 363-401, October.
    12. Trucíos, Carlos & Hotta, Luiz K. & Valls Pereira, Pedro L., 2019. "On the robustness of the principal volatility components," Journal of Empirical Finance, Elsevier, vol. 52(C), pages 201-219.
    13. Ilya Archakov & Peter Reinhard Hansen & Yiyao Luo, 2024. "A new method for generating random correlation matrices," The Econometrics Journal, Royal Economic Society, vol. 27(2), pages 188-212.
    14. Domenico Perrotta & Marco Riani & Francesca Torti, 2009. "New robust dynamic plots for regression mixture detection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 3(3), pages 263-279, December.
    15. Menjoge, Rajiv S. & Welsch, Roy E., 2010. "A diagnostic method for simultaneous feature selection and outlier identification in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3181-3193, December.
    16. Hirofumi Michimae & Takeshi Emura, 2022. "Bayesian ridge estimators based on copula-based joint prior distributions for regression coefficients," Computational Statistics, Springer, vol. 37(5), pages 2741-2769, November.
    17. Madar, Vered, 2015. "Direct formulation to Cholesky decomposition of a general nonsingular correlation matrix," Statistics & Probability Letters, Elsevier, vol. 103(C), pages 142-147.
    18. Liang, Jia-Juan & Bentler, Peter M., 1999. "A t-distribution plot to detect non-multinormality," Computational Statistics & Data Analysis, Elsevier, vol. 30(1), pages 31-44, March.
    19. Luca Greco & Giovanni Saraceno & Claudio Agostinelli, 2021. "Robust Fitting of a Wrapped Normal Model to Multivariate Circular Data and Outlier Detection," Stats, MDPI, vol. 4(2), pages 1-18, June.
    20. Hui Yao & Sungduk Kim & Ming-Hui Chen & Joseph G. Ibrahim & Arvind K. Shah & Jianxin Lin, 2015. "Bayesian Inference for Multivariate Meta-Regression With a Partially Observed Within-Study Sample Covariance Matrix," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 528-544, June.
