
Interpretable dimension reduction

Authors

  • Hugh Chipman
  • Hong Gu

Abstract

The analysis of high-dimensional data often begins with the identification of lower dimensional subspaces. Principal component analysis is a dimension reduction technique that identifies linear combinations of variables along which most variation occurs or which best “reconstruct” the original variables. For example, many temperature readings may be taken in a production process when in fact there are just a few underlying variables driving the process. A problem with principal components is that the linear combinations can seem quite arbitrary. To make them more interpretable, we introduce two classes of constraints. In the first, coefficients are constrained to equal a small number of values (homogeneity constraint). The second constraint attempts to set as many coefficients to zero as possible (sparsity constraint). The resultant interpretable directions are either calculated to be close to the original principal component directions, or calculated in a stepwise manner that may make the components more orthogonal. A small dataset on characteristics of cars is used to introduce the techniques. A more substantial data mining application is also given, illustrating the ability of the procedure to scale to a very large number of variables.
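
The two constraint classes described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it simply thresholds an ordinary principal component direction, as a rough stand-in for the "close to the original principal component directions" variant. The function names, the threshold value, and the synthetic data (six readings driven by two latent variables) are all assumptions made for illustration.

```python
import numpy as np

def principal_directions(X, k):
    """Return the top-k principal component directions of X as columns."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are unit-norm directions ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def _unit(w):
    """Rescale w to unit length; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

def sparsify(v, threshold):
    """Illustrative sparsity constraint: zero out small coefficients."""
    return _unit(np.where(np.abs(v) >= threshold, v, 0.0))

def homogenize(v, threshold):
    """Illustrative homogeneity constraint: coefficients in {-c, 0, +c}."""
    return _unit(np.sign(v) * (np.abs(v) >= threshold))

def closeness(v, w):
    """Absolute cosine of the angle between two directions."""
    denom = np.linalg.norm(v) * np.linalg.norm(w)
    return abs(v @ w) / denom if denom > 0 else 0.0

# Hypothetical data: six variables driven by two latent variables,
# the first with a larger variance so it dominates the first component.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
X = latent @ loadings + 0.1 * rng.normal(size=(200, 6))

v = principal_directions(X, 2)[:, 0]   # first principal direction
v_sparse = sparsify(v, threshold=0.2)  # assumed threshold
v_homog = homogenize(v, threshold=0.2)

print("original   :", np.round(v, 3))
print("sparse     :", np.round(v_sparse, 3))
print("homogeneous:", np.round(v_homog, 3))
print("closeness (sparse, homogeneous):",
      round(closeness(v, v_sparse), 4), round(closeness(v, v_homog), 4))
```

The closeness score gives a rough sense of how much of the original direction survives simplification. Simple thresholding of this kind ignores orthogonality between components, which the stepwise variant mentioned in the abstract is meant to address.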

Suggested Citation

  • Hugh Chipman & Hong Gu, 2005. "Interpretable dimension reduction," Journal of Applied Statistics, Taylor & Francis Journals, vol. 32(9), pages 969-987.
  • Handle: RePEc:taf:japsta:v:32:y:2005:i:9:p:969-987
    DOI: 10.1080/02664760500168648

    References listed on IDEAS

    1. S. K. Vines, 2000. "Simple principal components," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 49(4), pages 441-451.

    Citations

    Cited by:

    1. Juan José Egozcue & Vera Pawlowsky-Glahn, 2019. "Compositional data: the sample space and its structure," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(3), pages 599-638, September.
    2. Trendafilov, Nickolay T. & Vines, Karen, 2009. "Simple and interpretable discrimination," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 979-989, February.
    3. Nickolay Trendafilov, 2014. "From simple structure to sparse components: a review," Computational Statistics, Springer, vol. 29(3), pages 431-454, June.
    4. Edoardo Saccenti & Johan A Westerhuis & Age K Smilde & Mariët J van der Werf & Jos A Hageman & Margriet M W B Hendriks, 2011. "Simplivariate Models: Uncovering the Underlying Biology in Functional Genomics Data," PLOS ONE, Public Library of Science, vol. 6(6), pages 1-13, June.
    5. Emre Alper & Michal Miktus, 2019. "Digital Connectivity in sub-Saharan Africa: A Comparative Perspective," IMF Working Papers 2019/210, International Monetary Fund.
    6. T. F. Cox & D. S. Arnold, 2018. "Simple components," Journal of Applied Statistics, Taylor & Francis Journals, vol. 45(1), pages 83-99, January.
    7. E. Raffinetti & I. Romeo, 2015. "Dealing with the biased effects issue when handling huge datasets: the case of INVALSI data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(12), pages 2554-2570, December.
    8. Lansangan, Joseph Ryan G. & Barrios, Erniel B., 2017. "Simultaneous dimension reduction and variable selection in modeling high dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 112(C), pages 242-256.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kim, Hyun Hak & Swanson, Norman R., 2018. "Mining big data using parsimonious factor, machine learning, variable selection and shrinkage methods," International Journal of Forecasting, Elsevier, vol. 34(2), pages 339-354.
    2. Norman R. Swanson, 2016. "Comment," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(3), pages 348-353, July.
    3. E. Raffinetti & I. Romeo, 2015. "Dealing with the biased effects issue when handling huge datasets: the case of INVALSI data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(12), pages 2554-2570, December.
    4. Trendafilov, Nickolay T. & Vines, Karen, 2009. "Simple and interpretable discrimination," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 979-989, February.
    5. Choulakian, V. & Allard, J. & Almhana, J., 2006. "Robust centroid method," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 737-746, November.
    6. T. F. Cox & D. S. Arnold, 2018. "Simple components," Journal of Applied Statistics, Taylor & Francis Journals, vol. 45(1), pages 83-99, January.
    7. José Fernando Romero Cañizares & Purificación Vicente Galindo & Yannis Phillis & Evangelos Grigoroudis, 2022. "Graphical sustainability analysis using disjoint biplots," Operational Research, Springer, vol. 22(2), pages 1575-1596, April.
    8. Carlo Cavicchia & Maurizio Vichi & Giorgia Zaccaria, 2023. "Hierarchical disjoint principal component analysis," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 107(3), pages 537-574, September.
    9. Antonello D’Ambra & Pietro Amenta, 2023. "An extension of correspondence analysis based on the multiple Taguchi’s index to evaluate the relationships between three categorical variables graphically: an application to the Italian football cham," Annals of Operations Research, Springer, vol. 325(1), pages 219-244, June.
    10. Luca Scrucca, 2006. "Subset selection in dimension reduction methods," Quaderni del Dipartimento di Economia, Finanza e Statistica 23/2006, Università di Perugia, Dipartimento Economia.
    11. Sabatier, Robert & Reynès, Christelle, 2008. "Extensions of simple component analysis and simple linear discriminant analysis using genetic algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4779-4789, June.
    12. Shen, Haipeng & Huang, Jianhua Z., 2008. "Sparse principal component analysis via regularized low rank matrix approximation," Journal of Multivariate Analysis, Elsevier, vol. 99(6), pages 1015-1034, July.
    13. Jolliffe, Ian, 2022. "A 50-year personal journey through time with principal component analysis," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    14. Hyun Hak Kim & Norman Swanson, 2013. "Mining Big Data Using Parsimonious Factor and Shrinkage Methods," Departmental Working Papers 201316, Rutgers University, Department of Economics.
    15. Nickolay Trendafilov, 2014. "From simple structure to sparse components: a review," Computational Statistics, Springer, vol. 29(3), pages 431-454, June.
    16. Adelaide Freitas & Eloísa Macedo & Maurizio Vichi, 2021. "An empirical comparison of two approaches for CDPCA in high-dimensional data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(3), pages 1007-1031, September.
