
The chiPower transformation: a valid alternative to logratio transformations in compositional data analysis

Author

  • Michael Greenacre (Universitat Pompeu Fabra)

Abstract

The approach to analysing compositional data has been dominated by the use of logratio transformations, to ensure exact subcompositional coherence and, in some situations, exact isometry as well. A problem with this approach is that data zeros, found in most applications, have to be replaced to allow the logarithmic transformation. An alternative new approach, called the ‘chiPower’ transformation, which allows data zeros, is to combine the standardization inherent in the chi-square distance in correspondence analysis, with the essential elements of the Box-Cox power transformation. The chiPower transformation is justified because it defines between-sample distances that tend to logratio distances for strictly positive data as the power parameter tends to zero, and are then equivalent to transforming to logratios. For data with zeros, a value of the power can be identified that brings the chiPower transformation as close as possible to a logratio transformation, without having to substitute the zeros. Especially in the area of high-dimensional data, this alternative approach can present such a high level of coherence and isometry as to be a valid approach to the analysis of compositional data. Furthermore, in a supervised learning context, if the compositional variables serve as predictors of a response in a modelling framework, for example generalized linear models, then the power can be used as a tuning parameter in optimizing the accuracy of prediction through cross-validation. The chiPower-transformed variables have a straightforward interpretation, since they are identified with single compositional parts, not ratios.
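The construction described in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of the steps: raise the parts to a power, re-close each sample to sum to 1, divide by the square roots of the column means (the chi-square standardization), and rescale by the reciprocal of the power (the Box-Cox element). The function names `chipower`, `clr` and `pairwise_dist`, the unweighted column means and the toy data are assumptions made for illustration; the exact definition and scaling used in the paper may differ.

```python
import numpy as np

def chipower(X, power):
    """Hypothetical helper: chiPower-style transform of compositions (rows of X).

    Assumes power > 0 and that every part is nonzero in at least one sample.
    """
    X = np.asarray(X, dtype=float)
    P = X ** power                          # power step; zeros stay zero
    P = P / P.sum(axis=1, keepdims=True)    # re-close each row to sum to 1
    col_means = P.mean(axis=0)              # column means for the chi-square step
    return P / np.sqrt(col_means) / power   # standardize, then rescale by 1/power

def clr(X):
    """Centred logratio transform, defined for strictly positive data only."""
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=1, keepdims=True)

def pairwise_dist(Z):
    """Euclidean distances between all pairs of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Strictly positive toy compositions: a small power should mimic logratio distances.
X_pos = np.array([[0.2, 0.3, 0.5],
                  [0.6, 0.1, 0.3],
                  [0.1, 0.7, 0.2]])
J = X_pos.shape[1]
# Under this sketch's scaling, distances tend to clr distances divided by sqrt(J).
d_chi = pairwise_dist(chipower(X_pos, 1e-4)) * np.sqrt(J)
d_clr = pairwise_dist(clr(X_pos))
max_gap = np.abs(d_chi - d_clr).max()

# Toy compositions with zeros: no zero replacement is needed for power > 0.
X_zero = np.array([[0.0, 0.4, 0.6],
                   [0.5, 0.0, 0.5],
                   [0.3, 0.3, 0.4]])
Z_zero = chipower(X_zero, 0.5)
```

In this sketch, `max_gap` measures how far the rescaled chiPower distances are from the clr (logratio) distances at a small power, and `Z_zero` shows that data zeros pass through as zeros rather than requiring substitution. In a supervised setting, `power` could be treated as a tuning parameter and chosen by cross-validated prediction accuracy, as the abstract suggests.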

Suggested Citation

  • Michael Greenacre, 2024. "The chiPower transformation: a valid alternative to logratio transformations in compositional data analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(3), pages 769-796, September.
  • Handle: RePEc:spr:advdac:v:18:y:2024:i:3:d:10.1007_s11634-024-00600-x
    DOI: 10.1007/s11634-024-00600-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-024-00600-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Greenacre, 2023. "The chi-square standardization, combined with Box-Cox transformation, is a valid alternative to transforming to logratios in compositional data analysis," Economics Working Papers 1857, Department of Economics and Business, Universitat Pompeu Fabra.
    2. Michael Greenacre & Paul Lewi, 2009. "Distributional Equivalence and Subcompositional Coherence in the Analysis of Compositional Data, Contingency Tables and Ratio-Scale Measurements," Journal of Classification, Springer;The Classification Society, vol. 26(1), pages 29-54, April.
    3. Lombardo, Rosaria & Camminatiello, Ida & D'Ambra, Antonello & Beh, Eric J., 2021. "Assessing the Italian tax courts system by weighted three-way log-ratio analysis," Socio-Economic Planning Sciences, Elsevier, vol. 73(C).
    4. Tsagris, Michail & Preston, Simon & Wood, Andrew T.A., 2016. "Improved classification for compositional data using the α-transformation," MPRA Paper 67657, University Library of Munich, Germany.
    5. Michail Tsagris & Simon Preston & Andrew T. A. Wood, 2016. "Improved Classification for Compositional Data Using the α-transformation," Journal of Classification, Springer;The Classification Society, vol. 33(2), pages 243-261, July.
    6. Juan José Egozcue & Vera Pawlowsky-Glahn, 2019. "Compositional data: the sample space and its structure," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(3), pages 599-638, September.
    7. B. Baris Alkan & Afsin Sahin, 2011. "Measuring inequalities in the distribution of health workers by bi-plot approach: The case of Turkey," Journal of Economics and Behavioral Studies, AMH International, vol. 2(2), pages 57-66.
    8. Javier Palarea-Albaladejo & Josep Martín-Fernández & Jesús Soto, 2012. "Dealing with Distances and Transformations for Fuzzy C-Means Clustering of Compositional Data," Journal of Classification, Springer;The Classification Society, vol. 29(2), pages 144-169, July.
    9. Michael Greenacre & Paul Lewi, 2005. "Distributional equivalence and subcompositional coherence in the analysis of contingency tables, ratio-scale measurements and compositional data," Economics Working Papers 908, Department of Economics and Business, Universitat Pompeu Fabra, revised Aug 2007.
    10. Anna Maria Fiori & Francesco Porro, 2023. "A compositional analysis of systemic risk in European financial institutions," Annals of Finance, Springer, vol. 19(3), pages 325-354, September.
    11. Blasius, J. & Greenacre, M. & Groenen, P.J.F. & van de Velden, M., 2009. "Special issue on correspondence analysis and related methods," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3103-3106, June.
    12. Germà Coenders & Núria Arimany Serrat, 2023. "Accounting statement analysis at industry level. A gentle introduction to the compositional approach," Papers 2305.16842, arXiv.org, revised Sep 2024.
    13. Pacheco, Joaquín & Casado, Silvia & Porras, Santiago, 2013. "Exact methods for variable selection in principal component analysis: Guide functions and pre-selection," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 95-111.
    14. Ida Camminatiello & Antonello D’Ambra & Luigi D’Ambra, 2022. "The association in two-way ordinal contingency tables through global odds ratios," METRON, Springer;Sapienza Università di Roma, vol. 80(1), pages 9-22, April.
    15. Greenacre, Michael, 2009. "Power transformations in correspondence analysis," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3107-3116, June.
    16. Jérome SARACCO & Marie CHAVENT & Vanessa KUENTZ, 2010. "Clustering of categorical variables around latent variables," Cahiers du GREThA (2007-2019) 2010-02, Groupe de Recherche en Economie Théorique et Appliquée (GREThA).
    17. Michael Greenacre, 2009. "Contribution biplots," Economics Working Papers 1162, Department of Economics and Business, Universitat Pompeu Fabra, revised Jan 2011.
    18. Huiwen Wang & Liying Shangguan & Rong Guan & Lynne Billard, 2015. "Principal component analysis for compositional data vectors," Computational Statistics, Springer, vol. 30(4), pages 1079-1096, December.
    19. repec:jss:jstsof:13:i05 is not listed on IDEAS
    20. Jan Skála & Radim Vácha & Pavel Čupr, 2018. "Which Compounds Contribute Most to Elevated Soil Pollution and the Corresponding Health Risks in Floodplains in the Headwater Areas of the Central European Watershed?," IJERPH, MDPI, vol. 15(6), pages 1-16, June.
    21. Eric J. Beh & Rosaria Lombardo, 2024. "Correspondence Analysis Using the Cressie–Read Family of Divergence Statistics," International Statistical Review, International Statistical Institute, vol. 92(1), pages 17-42, April.
