Visualizing Profiles of Large Datasets of Weighted and Mixed Data

Authors

  • Aurea Grané

    (Statistics Department, Universidad Carlos III de Madrid, 28903 Getafe, Spain)

  • Alpha A. Sow-Barry

    (Statistics Department, Universidad Carlos III de Madrid, 28903 Getafe, Spain)

Abstract

This work provides a procedure for constructing and visualizing profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques: multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known drawback of classical MDS in large datasets is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower’s interpolation formula is then used to project the remaining individuals onto this configuration. Throughout the process, Gower’s distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results indicate that the new proposal overcomes the high computational cost of classical MDS with low error.
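The abstract chains three computational steps: Gower's distance on mixed variables, classical MDS on a small random sample, and Gower's interpolation formula for the remaining individuals. The Python sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. Assumptions: the squared distance is taken as d² = 1 − s, where s is Gower's similarity with equal variable weights and no missing values; individual (survey) weights and the adapted k-prototypes clustering step are left out; the projection uses the usual form of Gower's add-a-point formula, y = ½ Λ⁻¹ X′(b − d²), where b is the diagonal of the double-centred matrix B. All function names (gower_sq_dist, jacobi_eigh, classical_mds, gower_interpolate) and the toy records are hypothetical, and a small Jacobi eigen-solver is used only to keep the example dependency-free.

```python
import math


def gower_sq_dist(a, b, is_num, ranges):
    """Squared Gower distance d^2 = 1 - s between two mixed-type records.

    Assumes equal variable weights and no missing values. Numeric variables
    contribute 1 - |x - y| / range (clipped at 0 for points outside the
    sample range, an assumption of this sketch); categorical variables
    contribute 1 on a match and 0 otherwise.
    """
    s = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        if is_num[k]:
            s += max(0.0, 1.0 - abs(x - y) / ranges[k]) if ranges[k] > 0 else 1.0
        else:
            s += 1.0 if x == y else 0.0
    return 1.0 - s / len(a)


def jacobi_eigh(A, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a small symmetric matrix."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def classical_mds(D2, dim):
    """Classical MDS from squared distances; returns coordinates X, eigenvalues, diag(B)."""
    n = len(D2)
    row = [sum(r) / n for r in D2]
    tot = sum(row) / n
    # Double centring: B = -1/2 * J D2 J.
    B = [[-0.5 * (D2[i][j] - row[i] - row[j] + tot) for j in range(n)] for i in range(n)]
    vals, V = jacobi_eigh(B)
    order = sorted(range(n), key=lambda k: vals[k], reverse=True)[:dim]
    X = [[V[i][k] * math.sqrt(max(vals[k], 0.0)) for k in order] for i in range(n)]
    return X, [vals[k] for k in order], [B[i][i] for i in range(n)]


def gower_interpolate(X, lam, bdiag, d2_new):
    """Project a new individual: y = 1/2 * Lambda^{-1} X^T (b - d2_new)."""
    return [sum(X[i][k] * (bdiag[i] - d2_new[i]) for i in range(len(X))) / (2.0 * lam[k])
            for k in range(len(lam))]


# Toy sample (hypothetical, not SHARE data): age, sex, self-rated health.
sample = [(65, "F", "good"), (72, "M", "fair"), (80, "F", "poor"),
          (68, "M", "good"), (75, "F", "fair")]
is_num = [True, False, False]
ranges = [max(r[k] for r in sample) - min(r[k] for r in sample) if is_num[k] else 0
          for k in range(len(is_num))]

D2 = [[gower_sq_dist(a, b, is_num, ranges) for b in sample] for a in sample]
X, lam, bdiag = classical_mds(D2, dim=2)

# Remaining individuals are projected without recomputing the eigendecomposition.
new_point = (70, "M", "poor")
y_new = gower_interpolate(X, lam, bdiag,
                          [gower_sq_dist(new_point, b, is_num, ranges) for b in sample])
print(y_new)
```

Interpolating one of the sampled individuals from its own row of squared distances returns its MDS coordinates, which is a quick sanity check of the formula. The eigendecomposition is done once on the n sampled individuals; each remaining individual then costs only n distance evaluations and an n × dim product, which is where the savings over full classical MDS come from.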

Suggested Citation

  • Aurea Grané & Alpha A. Sow-Barry, 2021. "Visualizing Profiles of Large Datasets of Weighted and Mixed Data," Mathematics, MDPI, vol. 9(8), pages 1-20, April.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:8:p:891-:d:537881

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/8/891/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/8/891/
    Download Restriction: no

    References listed on IDEAS

    1. Vinué, Guillermo & Epifanio, Irene & Alemany, Sandra, 2015. "Archetypoids: A new approach to define representative archetypal data," Computational Statistics & Data Analysis, Elsevier, vol. 87(C), pages 102-115.
    2. Grané, Aurea & Salini, Silvia & Verdolini, Elena, 2021. "Robust multivariate analysis for mixed-type data: Novel algorithm and its practical application in socio-economic research," Socio-Economic Planning Sciences, Elsevier, vol. 73(C).
    3. Ziqi Jia & Ling Song, 2020. "Weighted k-Prototypes Clustering Algorithm Based on the Hybrid Dissimilarity Coefficient," Mathematical Problems in Engineering, Hindawi, vol. 2020, pages 1-13, July.
    4. Alonso, Pablo J., 2011. "Profile identification via weighted related metric scaling : an application to dependent Spanish children," DES - Working Papers. Statistics and Econometrics. WS ws113628, Universidad Carlos III de Madrid. Departamento de Estadística.
    5. Boj, Eva & Delicado, Pedro & Fortiana, Josep, 2010. "Distance-based local linear regression for functional predictors," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 429-437, February.
    6. Irene Albarrán & Pablo Alonso & Aurea Grané, 2015. "Profile identification via weighted related metric scaling: an application to dependent Spanish children," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 178(3), pages 593-618, June.
    7. Louis Guttman, 1968. "A general nonmetric technique for finding the smallest coordinate space for a configuration of points," Psychometrika, Springer;The Psychometric Society, vol. 33(4), pages 469-506, December.
    8. Dray, Stéphane & Dufour, Anne-Béatrice, 2007. "The ade4 Package: Implementing the Duality Diagram for Ecologists," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 22(i04).
    9. de Leeuw, Jan & Mair, Patrick, 2009. "Multidimensional Scaling Using Majorization: SMACOF in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 31(i03).

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Aurea Grané & Giancarlo Manzi & Silvia Salini, 2021. "Smart Visualization of Mixed Data," Stats, MDPI, vol. 4(2), pages 1-14, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. repec:jss:jstsof:34:i10 is not listed on IDEAS
    2. Alonso González, Pablo J., 2017. "Estimating life expectancy free of dependency : group characterization through the proximity to the deepest dependency path," DES - Working Papers. Statistics and Econometrics. WS 24672, Universidad Carlos III de Madrid. Departamento de Estadística.
    3. Gruenhage, Gina & Opper, Manfred & Barthelme, Simon, 2016. "Visualizing the effects of a changing distance on data using continuous embeddings," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 51-65.
    4. Wenzel Kröber & Martin Böhnke & Erik Welk & Christian Wirth & Helge Bruelheide, 2012. "Leaf Trait-Environment Relationships in a Subtropical Broadleaved Forest in South-East China," PLOS ONE, Public Library of Science, vol. 7(4), pages 1-11, April.
    5. Milton Bloombaum, 1970. "Doing smallest space analysis," Journal of Conflict Resolution, Peace Science Society (International), vol. 14(3), pages 409-416, September.
    6. Samuel Shye, 2010. "The Motivation to Volunteer: A Systemic Quality of Life Theory," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 98(2), pages 183-200, September.
    7. Pengfei Song & Wen Qin & YanGan Huang & Lei Wang & Zhenyuan Cai & Tongzuo Zhang, 2020. "Grazing Management Influences Gut Microbial Diversity of Livestock in the Same Area," Sustainability, MDPI, vol. 12(10), pages 1-12, May.
    8. Patrick Groenen & Rudolf Mathar & Willem Heiser, 1995. "The majorization approach to multidimensional scaling for Minkowski distances," Journal of Classification, Springer;The Classification Society, vol. 12(1), pages 3-19, March.
    9. la Grange, Anthony & le Roux, Niël & Gardner-Lubbe, Sugnet, 2009. "BiplotGUI: Interactive Biplots in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 30(i12).
    10. Jonas Eberle & Renier Myburgh & Dirk Ahrens, 2014. "The Evolution of Morphospace in Phytophagous Scarab Chafers: No Competition - No Divergence?," PLOS ONE, Public Library of Science, vol. 9(5), pages 1-16, May.
    11. Liesbeth François & Katrien Wijnrocx & Frédéric G Colinet & Nicolas Gengler & Bettine Hulsegge & Jack J Windig & Nadine Buys & Steven Janssens, 2017. "Genomics of a revived breed: Case study of the Belgian campine cattle," PLOS ONE, Public Library of Science, vol. 12(4), pages 1-14, April.
    12. Venera Tomaselli, 1996. "Multivariate statistical techniques and sociological research," Quality & Quantity: International Journal of Methodology, Springer, vol. 30(3), pages 253-276, August.
    13. Gupta, Vipin & Hanges, Paul J. & Dorfman, Peter, 2002. "Cultural clusters: methodology and findings," Journal of World Business, Elsevier, vol. 37(1), pages 11-15, April.
    14. Zvi Maimon, 1978. "The choice of ordinal measures of association," Quality & Quantity: International Journal of Methodology, Springer, vol. 12(3), pages 255-264, September.
    15. Funk, Patrick & Davis, Alex & Vaishnav, Parth & Dewitt, Barry & Fuchs, Erica, 2020. "Individual inconsistency and aggregate rationality: Overcoming inconsistencies in expert judgment at the technical frontier," Technological Forecasting and Social Change, Elsevier, vol. 155(C).
    16. Michael J. Greenacre & Patrick J. F. Groenen, 2016. "Weighted Euclidean Biplots," Journal of Classification, Springer;The Classification Society, vol. 33(3), pages 442-459, October.
    17. Kennon M. Sheldon & Evgeny N. Osin & Tamara O. Gordeeva & Dmitry D. Suchkov & Vladislav V. Bobrov & Elena I. Rasskazova & Oleg A. Sychev, 2015. "Evaluating the Dimensionality of the Relative Autonomy Continuum in US and Russian Samples," HSE Working papers WP BRP 48/PSY/2015, National Research University Higher School of Economics.
    18. Roderick McDonald, 1976. "A note on monotone polygons fitted to bivariate data," Psychometrika, Springer;The Psychometric Society, vol. 41(4), pages 543-546, December.
    19. Yoshio Takane & Forrest Young & Jan de Leeuw, 1977. "Nonmetric individual differences multidimensional scaling: An alternating least squares method with optimal scaling features," Psychometrika, Springer;The Psychometric Society, vol. 42(1), pages 7-67, March.
    20. Krzysztof Bartczak & Stanisław Łobejko, 2022. "The Implementation Environment for a Digital Technology Platform of Renewable Energy Sources," Energies, MDPI, vol. 15(16), pages 1-16, August.
    21. Vivian Klaff, 1973. "Ethnic segregation in urban Israel," Demography, Springer;Population Association of America (PAA), vol. 10(2), pages 161-184, May.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.