IDEAS home Printed from https://ideas.repec.org/a/spr/compst/v37y2022i1d10.1007_s00180-021-01116-0.html

Reduced multidimensional scaling

Author

  • Emmanuel Paradis (University of Montpellier)

Abstract

Dimension reduction is a common problem when analysing large data sets. The present paper proposes a method called reduced multidimensional scaling, which is based on performing an initial standard multidimensional scaling on a reduced data set. This method faces the problem of finding a representative reduced sample. An algorithm is presented to perform this selection by alternating between sampling observations in outlier areas and observations in high-density areas. A space is then constructed from the selected reduced sample by standard multidimensional scaling using pairwise distances. The observations not included in the reduced sample are then projected onto the constructed space using Gower’s formula in order to obtain a final representation of the whole data set. The only requirement is the ability to compute distances among observations. A simulation study showed that the proposed algorithm performs well at detecting outliers. Evaluation of running times suggests that the proposed method could run in a few hours with data sets that would take more than one year to analyse with standard multidimensional scaling. An application is presented with a data set of 9547 DNA sequences of human immunodeficiency viruses.
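
The two-stage idea in the abstract (standard multidimensional scaling on a reduced sample, then projection of the remaining observations) can be illustrated with a minimal sketch. This is not the author's implementation: the function names are hypothetical, the projection uses Gower's formula in its usual classical-scaling form, and the choice of the reduced sample is left to the caller because the paper's alternating outlier/high-density selection algorithm is not described in enough detail here to reproduce.

```python
import numpy as np


def squared_distances(a, b):
    """Squared Euclidean distances between the rows of a and the rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return (diff ** 2).sum(axis=-1)


def classical_mds(d2_ref, k):
    """Classical MDS from an n x n matrix of squared distances.

    Returns the n x k coordinates and the k largest eigenvalues.
    """
    n = d2_ref.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2_ref @ centering   # double-centred matrix
    eigval, eigvec = np.linalg.eigh(b)          # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:k]          # indices of the k largest
    lam = eigval[top]
    # Assumes the k retained eigenvalues are strictly positive.
    coords = eigvec[:, top] * np.sqrt(lam)
    return coords, lam


def gower_project(coords, lam, d2_ref, d2_new):
    """Place new observations in an existing MDS space (Gower's formula).

    d2_new is m x n: squared distances from each new observation to the
    n observations of the reduced sample.
    """
    row_mean = d2_ref.mean(axis=1)
    return 0.5 * (row_mean - d2_new) @ coords / lam


def reduced_mds(data, sample_idx, k=2):
    """MDS on the rows listed in sample_idx, then projection of the others.

    The caller chooses sample_idx; any selection rule can be plugged in.
    """
    rest_idx = np.setdiff1d(np.arange(data.shape[0]), sample_idx)
    ref = data[sample_idx]
    d2_ref = squared_distances(ref, ref)
    coords, lam = classical_mds(d2_ref, k)
    out = np.empty((data.shape[0], k))
    out[sample_idx] = coords
    out[rest_idx] = gower_project(
        coords, lam, d2_ref, squared_distances(data[rest_idx], ref)
    )
    return out
```

Only an n x n distance matrix for the reduced sample and an m x n matrix of distances from the remaining observations to that sample are needed, so the full pairwise distance matrix of the whole data set is never formed. The sketch uses Euclidean distances for convenience; any distance could be substituted in `squared_distances`, provided the retained eigenvalues remain positive.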

Suggested Citation

  • Emmanuel Paradis, 2022. "Reduced multidimensional scaling," Computational Statistics, Springer, vol. 37(1), pages 91-105, March.
  • Handle: RePEc:spr:compst:v:37:y:2022:i:1:d:10.1007_s00180-021-01116-0
    DOI: 10.1007/s00180-021-01116-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-021-01116-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

