
Exploratory data analysis for interval compositional data

Author

Listed:
  • Karel Hron

    (Palacký University)

  • Paula Brito

    (Universidade do Porto)

  • Peter Filzmoser

    (Vienna University of Technology)

Abstract

Compositional data are considered as data where relative contributions of parts on a whole, conveyed by (log-)ratios between them, are essential for the analysis. In Symbolic Data Analysis (SDA), we are in the framework of interval data when elements are characterized by variables whose values are intervals on $\mathbb{R}$ representing inherent variability. In this paper, we address the special problem of the analysis of interval compositions, i.e., when the interval data are obtained by the aggregation of compositions. It is assumed that the interval information is represented by the respective midpoints and ranges, and both sources of information are considered as compositions. In this context, we introduce the representation of interval data as three-way data. In the framework of the log-ratio approach from compositional data analysis, it is outlined how interval compositions can be treated in an exploratory context. The goal of the analysis is to represent the compositions by coordinates which are interpretable in terms of the original compositional parts. This is achieved by summarizing all relative information (logratios) about each part into one coordinate from the coordinate system. Based on an example from the European Union Statistics on Income and Living Conditions (EU-SILC), several possibilities for an exploratory data analysis approach for interval compositions are outlined and investigated.
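
The ideas in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: the interval bounds are made-up numbers, the function names are hypothetical, and the coordinate is the pivot coordinate commonly used in the log-ratio literature, which may differ in detail from the construction in the paper. It shows how an interval composition might be split into a midpoint composition and a range composition, and how all log-ratios involving one part can be collected into a single coordinate.

```python
import math


def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]


def midpoints_and_ranges(lower, upper):
    """Split interval-valued parts into midpoints and ranges (upper - lower)."""
    mid = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    rng = [up - lo for lo, up in zip(lower, upper)]
    return mid, rng


def pivot_coordinate(parts, k=0):
    """Coordinate summarizing all log-ratios of part k to the other parts.

    Assumed form: sqrt((D-1)/D) * ln(x_k / geometric mean of the other parts).
    All parts must be strictly positive.
    """
    D = len(parts)
    others = [p for i, p in enumerate(parts) if i != k]
    log_gmean_others = sum(math.log(p) for p in others) / (D - 1)
    return math.sqrt((D - 1) / D) * (math.log(parts[k]) - log_gmean_others)


if __name__ == "__main__":
    # Hypothetical interval composition with three parts (illustrative numbers only).
    lower = [0.2, 0.3, 0.1]
    upper = [0.4, 0.5, 0.3]

    mid, rng = midpoints_and_ranges(lower, upper)

    # Midpoints and ranges are each treated as a composition in their own right.
    for name, comp in (("midpoints", mid), ("ranges", rng)):
        coords = [pivot_coordinate(comp, k) for k in range(len(comp))]
        print(name, closure(comp), coords)
```

Because the coordinate depends only on ratios between parts, rescaling a composition with `closure` leaves it unchanged. Ranges must be strictly positive for the logarithms to exist, so degenerate intervals would need separate handling.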

Suggested Citation

  • Karel Hron & Paula Brito & Peter Filzmoser, 2017. "Exploratory data analysis for interval compositional data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(2), pages 223-241, June.
  • Handle: RePEc:spr:advdac:v:11:y:2017:i:2:d:10.1007_s11634-016-0245-y
    DOI: 10.1007/s11634-016-0245-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-016-0245-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-016-0245-y?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. A. Pedro Duarte Silva & Peter Filzmoser & Paula Brito, 2018. "Outlier detection in interval data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 785-822, September.
    2. Sun, Yuying & Zhang, Xinyu & Wan, Alan T.K. & Wang, Shouyang, 2022. "Model averaging for interval-valued data," European Journal of Operational Research, Elsevier, vol. 301(2), pages 772-784.
    3. Eufrásio de A. Lima Neto & Ulisses U. dos Anjos, 2015. "Regression model for interval-valued variables based on copulas," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(9), pages 2010-2029, September.
    4. Sun, Yuying & Han, Ai & Hong, Yongmiao & Wang, Shouyang, 2018. "Threshold autoregressive models for interval-valued time series data," Journal of Econometrics, Elsevier, vol. 206(2), pages 414-446.
    5. Paolo Giordani, 2015. "Lasso-constrained regression analysis for interval-valued data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(1), pages 5-19, March.
    6. Hao, Peng & Guo, Junpeng, 2017. "Constrained center and range joint model for interval-valued symbolic data regression," Computational Statistics & Data Analysis, Elsevier, vol. 116(C), pages 106-138.
    7. Blanco-Fernández, Angela & Corral, Norberto & González-Rodríguez, Gil, 2011. "Estimation of a flexible simple linear model for interval data based on set arithmetic," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2568-2578, September.
    8. Maia, André Luis Santiago & de Carvalho, Francisco de A.T., 2011. "Holt's exponential smoothing and neural network models for forecasting interval-valued time series," International Journal of Forecasting, Elsevier, vol. 27(3), pages 740-759, July.
    9. Yan Sun & Guanghua Lian & Zudi Lu & Jennifer Loveland & Isaac Blackhurst, 2020. "Modeling the Variance of Return Intervals Toward Volatility Prediction," Journal of Time Series Analysis, Wiley Blackwell, vol. 41(4), pages 492-519, July.
    10. Maia, André Luis Santiago & de Carvalho, Francisco de A.T., 2011. "Holt’s exponential smoothing and neural network models for forecasting interval-valued time series," International Journal of Forecasting, Elsevier, vol. 27(3), pages 740-759.
    11. A. Silva & Paula Brito, 2015. "Discriminant Analysis of Interval Data: An Assessment of Parametric and Distance-Based Approaches," Journal of Classification, Springer;The Classification Society, vol. 32(3), pages 516-541, October.
    12. Lin, Wei & González-Rivera, Gloria, 2016. "Interval-valued time series models: Estimation based on order statistics exploring the Agriculture Marketing Service data," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 694-711.
    13. Cheolwoo Park & Yongho Jeon & Kee-Hoon Kang, 2016. "An exploratory data analysis in scale-space for interval-valued data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(14), pages 2643-2660, October.
    14. Antonio Calcagnì & Luigi Lombardi & Lorenzo Avanzi & Eduardo Pascali, 2020. "Multiple mediation analysis for interval-valued data," Statistical Papers, Springer, vol. 61(1), pages 347-369, February.
    15. Carlo Drago & Roberto Ricciuti, 2019. "An interval variables approach to address measurement uncertainty in governance indicators," Economics Bulletin, AccessEcon, vol. 39(1), pages 626-635.
    16. Dias, Sónia & Brito, Paula, 2017. "Off the beaten track: A new linear model for interval data," European Journal of Operational Research, Elsevier, vol. 258(3), pages 1118-1130.
    17. Chang, Meng-Shiuh & Ju, Peijie & Liu, Yilei & Hsueh, Shao-Chieh, 2022. "Determining hedges and safe havens for stocks using interval analysis," The North American Journal of Economics and Finance, Elsevier, vol. 61(C).
    18. Wei Yang & Ai Han & Yongmiao Hong & Shouyang Wang, 2016. "Analysis of crisis impact on crude oil prices: a new approach with interval time series modelling," Quantitative Finance, Taylor & Francis Journals, vol. 16(12), pages 1917-1928, December.
    19. Liang-Ching Lin & Hsiang-Lin Chien & Sangyeol Lee, 2021. "Symbolic interval-valued data analysis for time series based on auto-interval-regressive models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 295-315, March.
    20. Gloria Gonzalez-Rivera & Javier Arroyo & Carlos Mate, 2011. "Forecasting with Interval and Histogram Data. Some Financial Applications," Working Papers 201438, University of California at Riverside, Department of Economics.
