IDEAS (RePEc): https://ideas.repec.org/a/eee/jmvana/v120y2013icp173-184.html

Robust estimation of location and scatter by pruning the minimum spanning tree

Authors

  • Kirschstein, Thomas
  • Liebscher, Steffen
  • Becker, Claudia

Abstract

One of the most essential topics in robust statistics is the robust estimation of location and covariance. Many popular robust (location and scatter) estimators, such as Fast-MCD, MVE, and MZE, require at least a convex distribution of the underlying data. For non-convex data distributions, these approaches may lead to suboptimal results because they apply Mahalanobis distances with respect to the location and covariance of a suitably chosen subsample of the data, which implies a convex structure. The approach presented here fixes this drawback by using Euclidean distances. The data set is treated as a complete network, and the minimum spanning tree (MST) of this network is calculated. Based on the MST, a subset of relevant points (thought of as an “outlier-free” subsample of minimum size) is determined, which can then be used for calculating data characteristics. It is shown that the approach has a maximum breakdown point. Additionally, a simulation study provides insights into the approach’s behaviour with respect to increasing dimension and sample size.
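The abstract describes the pipeline only at a high level: treat the data as a complete graph with Euclidean edge weights, compute its MST, shrink it to a subset, and compute location and scatter from that subset. The Python sketch here illustrates that pipeline under stated assumptions. The MST is built with Prim's algorithm. The pruning rule (repeatedly dropping the leaf with the longest MST edge) and the subset size h = ⌊(n + p + 1)/2⌋, borrowed from the usual MCD convention, are hypothetical stand-ins, since the abstract does not state the authors' actual pruning rule. The function names `mst_edges`, `prune_leaves`, and `location_scatter` are illustrative only.

```python
import math
from statistics import fmean


def mst_edges(points):
    """Prim's algorithm on the complete graph whose edge weights are
    Euclidean distances. Returns the n - 1 MST edges as (i, j, length)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge length linking v to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax the distances of the remaining vertices via u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def prune_leaves(n, edges, h):
    """HYPOTHETICAL pruning rule (not taken from the paper): repeatedly
    delete the leaf whose single MST edge is longest until h vertices remain."""
    adj = {i: {} for i in range(n)}
    for i, j, w in edges:
        adj[i][j] = w
        adj[j][i] = w
    alive = set(range(n))
    while len(alive) > h:
        leaves = [v for v in alive if len(adj[v]) == 1]
        v = max(leaves, key=lambda v: next(iter(adj[v].values())))
        (u,) = adj[v]  # the leaf's only neighbour
        del adj[u][v]
        del adj[v]
        alive.remove(v)
    return sorted(alive)


def location_scatter(points, idx):
    """Sample mean vector and covariance matrix (divisor m - 1) of the
    retained points."""
    sub = [points[i] for i in idx]
    m, p = len(sub), len(sub[0])
    mean = [fmean(x[k] for x in sub) for k in range(p)]
    cov = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in sub) / (m - 1)
            for b in range(p)] for a in range(p)]
    return mean, cov


# Toy data: a unit square plus its centre, and two far-away points.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10), (10, 11)]
h = (len(pts) + len(pts[0]) + 1) // 2  # assumed subset size, MCD-style
keep = prune_leaves(len(pts), mst_edges(pts), h)
mean, cov = location_scatter(pts, keep)
print(keep)  # [0, 1, 2, 3, 4]
print(mean)  # [0.5, 0.5]
print(cov)   # [[0.25, 0.0], [0.0, 0.25]]
```

Because only Euclidean distances enter the MST, the retained subset is not forced into a convex shape. The maximum-breakdown result stated in the abstract refers to the authors' procedure; this greedy leaf-pruning rule is just one plausible way to shrink the tree and is not claimed to share that property.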

Suggested Citation

  • Kirschstein, Thomas & Liebscher, Steffen & Becker, Claudia, 2013. "Robust estimation of location and scatter by pruning the minimum spanning tree," Journal of Multivariate Analysis, Elsevier, vol. 120(C), pages 173-184.
  • Handle: RePEc:eee:jmvana:v:120:y:2013:i:c:p:173-184
    DOI: 10.1016/j.jmva.2013.05.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X13000900
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.jmva.2013.05.004?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Joe, Harry, 2006. "Generating random correlation matrices based on partial correlations," Journal of Multivariate Analysis, Elsevier, vol. 97(10), pages 2177-2189, November.
    2. Marco Riani & Anthony C. Atkinson & Andrea Cerioli, 2009. "Finding an unknown number of multivariate outliers," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 447-466, April.
    3. Caroni, C. & Prescott, P., 1995. "On Rohlf's Method for the Detection of Outliers in Multivariate Data," Journal of Multivariate Analysis, Elsevier, vol. 52(2), pages 295-307, February.
    4. Becker, Claudia & Gather, Ursula, 2001. "The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier identification rules," Computational Statistics & Data Analysis, Elsevier, vol. 36(1), pages 119-127, March.
    5. M. J. R. Healy, 1968. "Multivariate Normal Plotting," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 17(2), pages 157-161, June.
    6. J. C. Gower & G. J. S. Ross, 1969. "Minimum Spanning Trees and Single Linkage Cluster Analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 18(1), pages 54-64, March.

    Citations



    Cited by:

    1. Mathias Kloss & Thomas Kirschstein & Steffen Liebscher & Martin Petrick, 2019. "Robust Productivity Analysis: An application to German FADN data," Papers 1902.00678, arXiv.org, revised Feb 2019.
    2. Steffen Liebscher & Thomas Kirschstein, 2015. "Efficiency of the pMST and RDELA location and scatter estimators," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 99(1), pages 63-82, January.
    3. Kirschstein, Thomas & Liebscher, Steffen & Pandolfo, Giuseppe & Porzio, Giovanni C. & Ragozini, Giancarlo, 2019. "On finite-sample robustness of directional location estimators," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 53-75.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Modarres, Reza, 2014. "On the interpoint distances of Bernoulli vectors," Statistics & Probability Letters, Elsevier, vol. 84(C), pages 215-222.
    2. Meltem Ekiz & O.Ufuk Ekiz, 2017. "Outlier detection with Mahalanobis square distance: incorporating small sample correction factor," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2444-2457, October.
    3. Brian Hartley, 2020. "Corridor stability of the Kaleckian growth model: a Markov-switching approach," Working Papers 2013, New School for Social Research, Department of Economics, revised Nov 2020.
    4. Jürgen Wellmann & Ursula Gather, 2003. "Identification of outliers in a one-way random effects model," Statistical Papers, Springer, vol. 44(3), pages 335-348, July.
    5. Anthony C. Atkinson & Marco Riani & Andrea Cerioli, 2018. "Cluster detection and clustering with random start forward searches," Journal of Applied Statistics, Taylor & Francis Journals, vol. 45(5), pages 777-798, April.
    6. Steffen Liebscher & Thomas Kirschstein, 2015. "Efficiency of the pMST and RDELA location and scatter estimators," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 99(1), pages 63-82, January.
    7. Søren Johansen & Bent Nielsen, 2014. "Optimal hedging with the cointegrated vector autoregressive model," Discussion Papers 14-23, University of Copenhagen. Department of Economics.
    8. Anthony C. Atkinson & Aldo Corbellini & Marco Riani, 2017. "Robust Bayesian regression with the forward search: theory and data analysis," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(4), pages 869-886, December.
    9. Torti, Francesca & Corbellini, Aldo & Atkinson, Anthony C., 2021. "fsdaSAS: a package for robust regression for very large datasets including the batch forward search," LSE Research Online Documents on Economics 109895, London School of Economics and Political Science, LSE Library.
    10. Beibei Zhang & Rong Chen, 2018. "Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic," Journal of Classification, Springer;The Classification Society, vol. 35(3), pages 394-421, October.
    11. Flórez, Alvaro J. & Molenberghs, Geert & Van der Elst, Wim & Alonso Abad, Ariel, 2022. "An efficient algorithm to assess multivariate surrogate endpoints in a causal inference framework," Computational Statistics & Data Analysis, Elsevier, vol. 172(C).
    12. Azamir, Bouchaib & Bennis, Driss & Michel, Bertrand, 2022. "A simplified algorithm for identifying abnormal changes in dynamic networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 607(C).
    13. Pokojovy, Michael & Jobe, J. Marcus, 2022. "A robust deterministic affine-equivariant algorithm for multivariate location and scatter," Computational Statistics & Data Analysis, Elsevier, vol. 172(C).
    14. Sung-Soo Kim & W. Krzanowski, 2007. "Detecting multiple outliers in linear regression using a cluster method combined with graphical visualization," Computational Statistics, Springer, vol. 22(1), pages 109-119, April.
    15. Anthony C. Atkinson & Marco Riani & Aldo Corbellini, 2020. "The analysis of transformations for profit‐and‐loss data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(2), pages 251-275, April.
    16. Arismendi, Juan C. & Broda, Simon, 2017. "Multivariate elliptical truncated moments," Journal of Multivariate Analysis, Elsevier, vol. 157(C), pages 29-44.
    17. Salvatore Ingrassia & Simona Minotti & Giorgio Vittadini, 2012. "Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions," Journal of Classification, Springer;The Classification Society, vol. 29(3), pages 363-401, October.
    18. Ahuja, Ravindra K., 1956-, 1992. "Applications of network optimization," Working papers 3458-92., Massachusetts Institute of Technology (MIT), Sloan School of Management.
    19. Trucíos, Carlos & Hotta, Luiz K. & Valls Pereira, Pedro L., 2019. "On the robustness of the principal volatility components," Journal of Empirical Finance, Elsevier, vol. 52(C), pages 201-219.
    20. Ilya Archakov & Peter Reinhard Hansen & Yiyao Luo, 2024. "A new method for generating random correlation matrices," The Econometrics Journal, Royal Economic Society, vol. 27(2), pages 188-212.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.