Improved Classification for Compositional Data Using the α-transformation

Authors

  • Michail Tsagris (University of Crete)
  • Simon Preston (University of Nottingham)
  • Andrew T. A. Wood (University of Nottingham)

Abstract

In compositional data analysis, an observation is a vector containing nonnegative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we investigate methods for classification of compositional data. Our approach centers on the idea of using the α-transformation to transform the data and then to classify the transformed data via regularized discriminant analysis and the k-nearest neighbors algorithm. Using the α-transformation generalizes two rival approaches in compositional data analysis, one (when α = 1) that treats the data as though they were Euclidean, ignoring the compositional constraint, and another (when α = 0) that employs Aitchison’s centered log-ratio transformation. A numerical study with several real datasets shows that whether using α = 1 or α = 0 gives better classification performance depends on the dataset, and moreover that using an intermediate value of α can sometimes give better performance than using either 1 or 0.
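
The idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formula, so the sketch assumes the α-transformation has the form z_i = (D · x_i^α / Σ_j x_j^α − 1) / α for a composition x with D parts, with the centered log-ratio used at α = 0 (its limit as α tends to 0). Any further projection step is omitted. The function names and the toy data are hypothetical, and plain k-nearest neighbors with Euclidean distance stands in for the classifiers studied in the paper.

```python
import math
from collections import Counter


def closure(x):
    """Rescale a vector of positive values so that it sums to one."""
    total = sum(x)
    return [v / total for v in x]


def clr(x):
    """Centered log-ratio: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [lg - mean_log for lg in logs]


def alpha_transform(x, alpha):
    """Assumed form of the alpha-transformation (see lead-in).

    alpha = 1 gives a linear map of the proportions (Euclidean treatment);
    alpha = 0 is defined as the centered log-ratio, the limiting case.
    """
    x = closure(x)
    if alpha == 0:
        return clr(x)
    d = len(x)
    powered = [v ** alpha for v in x]
    total = sum(powered)
    return [(d * p / total - 1.0) / alpha for p in powered]


def knn_predict(train, labels, query, alpha, k=3):
    """Classify `query` by majority vote among its k nearest training
    compositions, with distances measured after the transformation."""
    zq = alpha_transform(query, alpha)
    scored = []
    for x, y in zip(train, labels):
        z = alpha_transform(x, alpha)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z, zq)))
        scored.append((dist, y))
    scored.sort(key=lambda t: t[0])
    votes = Counter(y for _, y in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up compositions purely for illustration.
train = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.1, 0.3, 0.6]]
labels = ["a", "a", "b", "b"]
query = [0.65, 0.25, 0.10]
predictions = {a: knn_predict(train, labels, query, a) for a in (0, 0.5, 1)}
```

In practice α would be treated as a tuning parameter and chosen by cross-validated classification performance, in line with the abstract's finding that the best value depends on the dataset.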

Suggested Citation

  • Michail Tsagris & Simon Preston & Andrew T. A. Wood, 2016. "Improved Classification for Compositional Data Using the α-transformation," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 243-261, July.
  • Handle: RePEc:spr:jclass:v:33:y:2016:i:2:d:10.1007_s00357-016-9207-5
    DOI: 10.1007/s00357-016-9207-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-016-9207-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Ferdinand Österreicher & Igor Vajda, 2003. "A new class of metric divergences on probability spaces and its applicability in statistics," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 55(3), pages 639-653, September.
    2. Tsagris, Michail T. & Preston, Simon & Wood, Andrew T.A., 2011. "A data-based power transformation for compositional data," MPRA Paper 53068, University Library of Munich, Germany.
    3. Juan Manuel Larrosa, 2003. "A Compositional Statistical Analysis of Capital per Worker," Macroeconomics 0301006, University Library of Munich, Germany.
    4. J. L. Scealy & A. H. Welsh, 2011. "Regression for compositional data by using distributions defined on the hypersphere," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 73(3), pages 351-375, June.
    5. Greenacre, Michael, 2009. "Power transformations in correspondence analysis," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3107-3116, June.
    6. Gueorguieva, Ralitza & Rosenheck, Robert & Zelterman, Daniel, 2008. "Dirichlet component regression and its applications to psychiatric data," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5344-5355, August.
    7. Javier Palarea-Albaladejo & Josep Martín-Fernández & Jesús Soto, 2012. "Dealing with Distances and Transformations for Fuzzy C-Means Clustering of Compositional Data," Journal of Classification, Springer;The Classification Society, vol. 29(2), pages 144-169, July.
    8. Paulo Rodrigues & Ana Lima, 2009. "Analysis of an European union election using principal component analysis," Statistical Papers, Springer, vol. 50(4), pages 895-904, August.
    9. Michael Greenacre, 2008. "Measuring subcompositional incoherence," Economics Working Papers 1106, Department of Economics and Business, Universitat Pompeu Fabra, revised Jan 2011.
    10. Adam Butler & Chris Glasbey, 2008. "A latent Gaussian model for compositional data with zeros," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 57(5), pages 505-520, December.
    11. Jane Fry & Tim Fry & Keith McLaren, 2000. "Compositional data analysis and zeros in micro data," Applied Economics, Taylor & Francis Journals, vol. 32(8), pages 953-959.

    Citations

    Cited by:

    1. Wang, Dieter & Andrée, Bo Pieter Johannes & Chamorro, Andres Fernando & Spencer, Phoebe Girouard, 2022. "Transitions into and out of food insecurity: A probabilistic approach with panel data evidence from 15 countries," World Development, Elsevier, vol. 159(C).
    2. Yannis Pantazis & Michail Tsagris & Andrew T. A. Wood, 2019. "Gaussian Asymptotic Limits for the α-transformation in the Analysis of Compositional Data," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 81(1), pages 63-82, February.
    3. Wang, Dieter & Andree, Bo Pieter Johannes & Chamorro Elizondo, Andres Fernando & Spencer, Phoebe Girouard, 2020. "Stochastic Modeling of Food Insecurity," Policy Research Working Paper Series 9413, The World Bank.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tsagris, Michail & Preston, Simon & Wood, Andrew T.A., 2016. "Improved classification for compositional data using the α-transformation," MPRA Paper 67657, University Library of Munich, Germany.
    2. Tsagris, Michail, 2014. "The k-NN algorithm for compositional data: a revised approach with and without zero values present," MPRA Paper 65866, University Library of Munich, Germany.
    3. Tsagris, Michail & Preston, Simon & Wood, Andrew T.A., 2016. "Nonparametric hypothesis testing for equality of means on the simplex," MPRA Paper 72771, University Library of Munich, Germany.
    4. Tsagris, Michail, 2015. "A novel, divergence based, regression for compositional data," MPRA Paper 72769, University Library of Munich, Germany.
    5. Juan José Egozcue & Vera Pawlowsky-Glahn, 2019. "Compositional data: the sample space and its structure," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(3), pages 599-638, September.
    6. Tsagris, Michail, 2015. "Regression analysis with compositional data containing zero values," MPRA Paper 67868, University Library of Munich, Germany.
    7. Michael Greenacre, 2023. "The chi-square standardization, combined with Box-Cox transformation, is a valid alternative to transforming to logratios in compositional data analysis," Economics Working Papers 1857, Department of Economics and Business, Universitat Pompeu Fabra.
    8. Morais, Joanna & Simioni, Michel & Thomas-Agnan, Christine, 2016. "A tour of regression models for explaining shares," TSE Working Papers 16-742, Toulouse School of Economics (TSE).
    9. Michael Greenacre, 2024. "The chiPower transformation: a valid alternative to logratio transformations in compositional data analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(3), pages 769-796, September.
    10. M. Templ & K. Hron & P. Filzmoser, 2017. "Exploratory tools for outlier detection in compositional data with structural zeros," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(4), pages 734-752, March.
    11. Michail Tsagris, 2018. "Modelling Structural Zeros in Compositional Data," Working Papers 1803, University of Crete, Department of Economics.
    12. Nermina Mumic & Peter Filzmoser, 2021. "A multivariate test for detecting fraud based on Benford’s law, with application to music streaming data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(3), pages 819-840, September.
    13. Jiajia Chen & Xiaoqin Zhang & Shengjia Li, 2017. "Multiple linear regression with compositional response and covariates," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(12), pages 2270-2285, September.
    14. J. L. Scealy & Patrice de Caritat & Eric C. Grunsky & Michail T. Tsagris & A. H. Welsh, 2015. "Robust Principal Component Analysis for Power Transformed Compositional Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 136-148, March.
    15. Blasius, J. & Greenacre, M. & Groenen, P.J.F. & van de Velden, M., 2009. "Special issue on correspondence analysis and related methods," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3103-3106, June.
    16. Napoleón Vargas Jurado & Kent M. Eskridge & Stephen D. Kachman & Ronald M. Lewis, 2018. "Using a Bayesian Hierarchical Linear Mixing Model to Estimate Botanical Mixtures," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(2), pages 190-207, June.
    17. Melo, Tatiane F.N. & Vasconcellos, Klaus L.P. & Lemonte, Artur J., 2009. "Some restriction tests in a new class of regression models for proportions," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 3972-3979, October.
    18. Chai, Andreas & Stepanova, Elena & Moneta, Alessio, 2023. "Quantifying expenditure hierarchies and the expansion of global consumption diversity," Journal of Economic Behavior & Organization, Elsevier, vol. 214(C), pages 860-886.
    19. Jack Gregory & David I. Stern, 2012. "Fuel Choices in Rural Maharashtra," CCEP Working Papers 1207, Centre for Climate & Energy Policy, Crawford School of Public Policy, The Australian National University.
    20. Ida Camminatiello & Antonello D’Ambra & Luigi D’Ambra, 2022. "The association in two-way ordinal contingency tables through global odds ratios," METRON, Springer;Sapienza Università di Roma, vol. 80(1), pages 9-22, April.
