
Burning Sage: Reversing the Curse of Dimensionality in the Visualization of High-Dimensional Data

Authors
  • Ursula Laa
  • Dianne Cook
  • Stuart Lee

Abstract

In high-dimensional data analysis, the curse of dimensionality implies that points tend to lie far from the center of the distribution, near the edge of high-dimensional space. In contrast, projected data tends to clump at the center. This gives the impression that any structure near the center of the projection is obscured, whether or not this is true. This paper defines a transformation to reverse the curse, which applies radial transformations to the projected data. It is integrated seamlessly into the grand tour algorithm, and we have called it a burning sage tour to indicate that it reverses the curse. The work is implemented in the tourr package in R. Several case studies show how the sage visualizations enhance exploratory clustering and classification problems.
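The contrast described in the abstract can be illustrated with a small simulation. The sketch below is not the tourr implementation and is not taken from the paper. It assumes data drawn uniformly from the unit ball in p dimensions and projected onto the first two coordinates, and it uses one possible radial rescaling, derived from the fact that the projected radius r of such data has cumulative distribution 1 - (1 - r^2)^(p/2). The function names `sample_uniform_ball`, `radial_rescale`, and `demo` are hypothetical.

```python
import math
import random


def sample_uniform_ball(p, rng):
    """Draw one point uniformly from the unit ball in p dimensions."""
    # A normalized Gaussian vector gives a uniform direction on the sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in g))
    # A radius of U^(1/p) makes the point uniform within the ball's volume.
    radius = rng.random() ** (1.0 / p)
    return [radius * x / norm for x in g]


def radial_rescale(r, p):
    """Rescale a projected radius r in [0, 1].

    Assumption: the data are uniform in the unit p-ball, so the 2D projected
    radius has CDF 1 - (1 - r^2)^(p/2). Taking the square root of that CDF
    yields radii distributed as for a uniform unit disk.
    """
    inner = max(0.0, 1.0 - r * r)  # guard against rounding just above r = 1
    return math.sqrt(1.0 - inner ** (p / 2.0))


def demo(p=10, n=5000, seed=1):
    """Return the fraction of points with radius below 0.5 in each view."""
    rng = random.Random(seed)
    points = [sample_uniform_ball(p, rng) for _ in range(n)]
    full_r = [math.sqrt(sum(x * x for x in pt)) for pt in points]
    proj_r = [math.hypot(pt[0], pt[1]) for pt in points]  # first two coords
    rescaled_r = [radial_rescale(r, p) for r in proj_r]

    def frac_inside(radii):
        return sum(r < 0.5 for r in radii) / len(radii)

    return {
        "full": frac_inside(full_r),
        "projected": frac_inside(proj_r),
        "rescaled": frac_inside(rescaled_r),
    }


if __name__ == "__main__":
    print(demo())
```

Under these assumptions with p = 10, theory gives a fraction of 0.5^10 (about 0.001) inside radius 0.5 in the full space, about 0.76 in the raw 2D projection, and 0.25 after rescaling, the value expected for a uniform disk. The simulated fractions should fall close to these values.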

Suggested Citation

  • Ursula Laa & Dianne Cook & Stuart Lee, 2020. "Burning Sage: Reversing the Curse of Dimensionality in the Visualization of High-Dimensional Data," Monash Econometrics and Business Statistics Working Papers 36/20, Monash University, Department of Econometrics and Business Statistics.
  • Handle: RePEc:msh:ebswps:2020-36

    Download full text from publisher

    File URL: https://www.monash.edu/business/ebs/research/publications/ebs/wp36-2020.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Peter Hall & J. S. Marron & Amnon Neeman, 2005. "Geometric representation of high dimension, low sample size data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(3), pages 427-444, June.
    2. Wickham, Hadley & Cook, Dianne & Hofmann, Heike & Buja, Andreas, 2011. "tourr: An R Package for Exploring Multivariate Data with Projections," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i02).
    3. Jeongyoun Ahn & J. S. Marron, 2010. "The maximal data piling direction for discrimination," Biometrika, Biometrika Trust, vol. 97(1), pages 254-259.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yugo Nakayama & Kazuyoshi Yata & Makoto Aoshima, 2020. "Bias-corrected support vector machine with Gaussian kernel in high-dimension, low-sample-size settings," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(5), pages 1257-1286, October.
    2. Chung, Hee Cheol & Ahn, Jeongyoun, 2021. "Subspace rotations for high-dimensional outlier detection," Journal of Multivariate Analysis, Elsevier, vol. 183(C).
    3. Jung, Sungkyu, 2018. "Continuum directions for supervised dimension reduction," Computational Statistics & Data Analysis, Elsevier, vol. 125(C), pages 27-43.
    4. Bolivar-Cime, A. & Marron, J.S., 2013. "Comparison of binary discrimination methods for high dimension low sample size data," Journal of Multivariate Analysis, Elsevier, vol. 115(C), pages 108-121.
    5. Makoto Aoshima & Kazuyoshi Yata, 2019. "Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(3), pages 473-503, June.
    6. Niladri Roy Chowdhury & Dianne Cook & Heike Hofmann & Mahbubul Majumder & Eun-Kyung Lee & Amy Toth, 2015. "Using visual statistical inference to better understand random class separations in high dimension, low sample size data," Computational Statistics, Springer, vol. 30(2), pages 293-316, June.
    7. Lee, Myung Hee, 2012. "On the border of extreme and mild spiked models in the HDLSS framework," Journal of Multivariate Analysis, Elsevier, vol. 107(C), pages 162-168.
    8. Yata, Kazuyoshi & Aoshima, Makoto, 2013. "PCA consistency for the power spiked model in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 334-354.
    9. Jung, Sungkyu & Sen, Arusharka & Marron, J.S., 2012. "Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 190-203.
    10. Wang, Shao-Hsuan & Huang, Su-Yun, 2022. "Perturbation theory for cross data matrix-based PCA," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    11. Saha, Enakshi & Sarkar, Soham & Ghosh, Anil K., 2017. "Some high-dimensional one-sample tests based on functions of interpoint distances," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 83-95.
    12. Fischer, Daniel & Berro, Alain & Nordhausen, Klaus & Ruiz-Gazen, Anne, 2019. "REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit," TSE Working Papers 19-1001, Toulouse School of Economics (TSE).
    13. Mao, Guangyu, 2018. "Testing independence in high dimensions using Kendall’s tau," Computational Statistics & Data Analysis, Elsevier, vol. 117(C), pages 128-137.
    14. Shin-ichi Tsukada, 2019. "High dimensional two-sample test based on the inter-point distance," Computational Statistics, Springer, vol. 34(2), pages 599-615, June.
    15. Ursula Laa & Dianne Cook & Andreas Buja & German Valencia, 2020. "Hole or grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions," Monash Econometrics and Business Statistics Working Papers 17/20, Monash University, Department of Econometrics and Business Statistics.
    16. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    17. Fionn Murtagh, 2009. "The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 26(3), pages 249-277, December.
    18. Mao, Guangyu, 2015. "A note on testing complete independence for high dimensional data," Statistics & Probability Letters, Elsevier, vol. 106(C), pages 82-85.
    19. Gen Li & Sungkyu Jung, 2017. "Incorporating covariates into integrated factor analysis of multi‐view data," Biometrics, The International Biometric Society, vol. 73(4), pages 1433-1442, December.
    20. repec:jss:jstsof:47:i05 is not listed on IDEAS
    21. Mondal, Pronoy K. & Biswas, Munmun & Ghosh, Anil K., 2015. "On high dimensional two-sample tests based on nearest neighbors," Journal of Multivariate Analysis, Elsevier, vol. 141(C), pages 168-178.
