IDEAS home Printed from https://ideas.repec.org/p/msh/ebswps/2020-17.html

Hole or grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions

Author

Listed:
  • Ursula Laa
  • Dianne Cook
  • Andreas Buja
  • German Valencia

Abstract

Multivariate data is often visualized using linear projections, produced by techniques such as principal component analysis, linear discriminant analysis, and projection pursuit. A problem with projections is that they obscure low- and high-density regions near the center of the distribution. Sections, or slices, can help to reveal them. This paper develops a section pursuit method, building on the extensive work in projection pursuit, to search for interesting slices of the data. Linear projections are used to define sections of the parameter space, and to calculate interestingness by comparing the distribution of observations inside and outside a section. By optimizing this index, it is possible to reveal features such as holes (low density) or grains (high density). The optimization is incorporated into a guided tour so that the search for structure can be dynamic. The approach can be useful for problems where data distributions depart from uniform or normal, as in visually exploring nonlinear manifolds and functions in multivariate space. Two applications of section pursuit are shown: exploring decision boundaries from classification models, and exploring subspaces induced by complex inequality conditions from a multiple-parameter model. The new methods are available in R, in the tourr package.
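To make the idea of comparing observations inside and outside a section concrete, here is a minimal Python sketch. It is not the index defined in the paper and not the tourr implementation: the function names (`orthonormal_basis`, `slice_mask`, `toy_section_index`), the slice half-thickness `h`, the square binning, and the use of a total-variation distance are all assumptions chosen for illustration. A slice is taken to be the set of points whose orthogonal distance from a 2-D projection plane through the origin is below `h`.

```python
import numpy as np


def orthonormal_basis(p, d, rng):
    """Random p x d matrix with orthonormal columns (a projection plane)."""
    q, _ = np.linalg.qr(rng.standard_normal((p, d)))
    return q


def slice_mask(x, basis, h):
    """True for rows of x whose orthogonal distance to the plane spanned
    by `basis` (through the origin) is below h."""
    proj = x @ basis              # n x d coordinates in the plane
    resid = x - proj @ basis.T    # component orthogonal to the plane
    return np.linalg.norm(resid, axis=1) < h


def toy_section_index(x, basis, h, bins=5, lim=1.0):
    """Illustrative index (an assumption, not the paper's definition):
    total-variation distance between the binned 2-D projections of the
    points inside and outside the slice. Ranges from 0 to 1."""
    inside = slice_mask(x, basis, h)
    proj = x @ basis
    edges = np.linspace(-lim, lim, bins + 1)

    def relative_counts(points):
        counts, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                      bins=[edges, edges])
        total = counts.sum()
        return counts / total if total > 0 else counts

    diff = relative_counts(proj[inside]) - relative_counts(proj[~inside])
    return 0.5 * np.abs(diff).sum()


# Toy data: points uniform in a 4-D unit ball with the core removed,
# so there is a hole near the center that projections tend to hide.
rng = np.random.default_rng(0)
p, n = 4, 5000
directions = rng.standard_normal((n, p))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
radii = rng.uniform(size=n) ** (1.0 / p)
x = directions * radii[:, None]
x = x[np.linalg.norm(x, axis=1) > 0.5]

basis = orthonormal_basis(p, 2, rng)
value = toy_section_index(x, basis, h=0.3)
print(f"toy section index for a random plane: {value:.3f}")
```

In a section pursuit setting, a value like this would be maximized over projection planes. The sketch only evaluates it for a single random plane and leaves the optimization and the guided tour out.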

Suggested Citation

  • Ursula Laa & Dianne Cook & Andreas Buja & German Valencia, 2020. "Hole or grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions," Monash Econometrics and Business Statistics Working Papers 17/20, Monash University, Department of Econometrics and Business Statistics.
  • Handle: RePEc:msh:ebswps:2020-17

    Download full text from publisher

    File URL: https://www.monash.edu/business/ebs/research/publications/ebs/wp17-2020.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Wickham, Hadley & Cook, Dianne & Hofmann, Heike & Buja, Andreas, 2011. "tourr: An R Package for Exploring Multivariate Data with Projections," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i02).
    2. Ursula Laa & Dianne Cook, 2020. "Using tours to visually investigate properties of new projection pursuit indexes with application to problems in physics," Computational Statistics, Springer, vol. 35(3), pages 1171-1205, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fischer, Daniel & Berro, Alain & Nordhausen, Klaus & Ruiz-Gazen, Anne, 2019. "REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit," TSE Working Papers 19-1001, Toulouse School of Economics (TSE).
    2. Huang, Bei & Cook, Dianne & Wickham, Hadley, 2012. "tourrGui: A gWidgets GUI for the Tour to Explore High-Dimensional Data Using Low-Dimensional Projections," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 49(i06).
    3. Ursula Laa & Dianne Cook & Stuart Lee, 2020. "Burning Sage: Reversing the Curse of Dimensionality in the Visualization of High-Dimensional Data," Monash Econometrics and Business Statistics Working Papers 36/20, Monash University, Department of Econometrics and Business Statistics.
    4. Valero-Mora, Pedro M. & Ledesma, Ruben, 2012. "Graphical User Interfaces for R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 49(i01).
    5. Niladri Roy Chowdhury & Dianne Cook & Heike Hofmann & Mahbubul Majumder & Eun-Kyung Lee & Amy Toth, 2015. "Using visual statistical inference to better understand random class separations in high dimension, low sample size data," Computational Statistics, Springer, vol. 30(2), pages 293-316, June.
    6. Hlávka, Zdeněk & Hlubinka, Daniel & Koňasová, Kateřina, 2022. "Functional ANOVA based on empirical characteristic functionals," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    7. Nicola Loperfido, 2023. "Kurtosis removal for data pre-processing," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 239-267, March.
