IDEAS home Printed from https://ideas.repec.org/a/taf/amstat/v72y2018i4p309-314.html

Optimal Whitening and Decorrelation

Author

Listed:
  • Agnan Kessy
  • Alex Lewin
  • Korbinian Strimmer

Abstract

Whitening, or sphering, is a common preprocessing step in statistical analysis to transform random variables to orthogonality. However, due to rotational freedom there are infinitely many possible whitening procedures. Consequently, there is a diverse range of sphering methods in use, for example, based on principal component analysis (PCA), Cholesky matrix decomposition, and zero-phase component analysis (ZCA), among others. Here, we provide an overview of the underlying theory and discuss five natural whitening procedures. Subsequently, we demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables allows us to break the rotational invariance and identify optimal whitening transformations. As a result, we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables.
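
The five procedures named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the names `whitening_matrix`, `cross_correlation`, `METHODS`, and the example covariance `SIGMA` are assumptions made for illustration, and the sign conventions for eigenvectors are not handled. Each method returns a matrix W with W Σ Wᵀ = I; the methods differ only in the rotation they apply.

```python
import numpy as np

METHODS = ("ZCA", "PCA", "Cholesky", "ZCA-cor", "PCA-cor")


def _eigh_desc(m):
    """Eigendecomposition of a symmetric matrix, eigenvalues in descending order."""
    values, vectors = np.linalg.eigh(m)
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]


def whitening_matrix(sigma, method="ZCA-cor"):
    """Return W with W @ sigma @ W.T = I for a positive-definite covariance sigma."""
    sigma = np.asarray(sigma, dtype=float)
    if method == "Cholesky":
        # W = L.T, where L @ L.T is the Cholesky factorization of the precision matrix.
        return np.linalg.cholesky(np.linalg.inv(sigma)).T
    if method in ("ZCA", "PCA"):
        # Based on the eigendecomposition of the covariance matrix.
        lam, u = _eigh_desc(sigma)
        w = np.diag(lam ** -0.5) @ u.T
        return u @ w if method == "ZCA" else w
    if method in ("ZCA-cor", "PCA-cor"):
        # Standardize first, then use the eigendecomposition of the correlation matrix.
        sd_inv = np.diag(np.diag(sigma) ** -0.5)
        corr = sd_inv @ sigma @ sd_inv
        theta, g = _eigh_desc(corr)
        w = np.diag(theta ** -0.5) @ g.T @ sd_inv
        return g @ w if method == "ZCA-cor" else w
    raise ValueError(f"unknown method: {method!r}")


def cross_correlation(sigma, w):
    """Correlation matrix between the whitened z = W x and the original x."""
    sd_inv = np.diag(np.diag(sigma) ** -0.5)
    return w @ sigma @ sd_inv


# Hypothetical 3x3 covariance matrix, chosen only for illustration.
SIGMA = np.array([[4.0, 2.0, 0.6],
                  [2.0, 2.0, 0.5],
                  [0.6, 0.5, 1.0]])

for name in METHODS:
    w = whitening_matrix(SIGMA, name)
    deviation = np.abs(w @ SIGMA @ w.T - np.eye(3)).max()
    trace = np.trace(cross_correlation(SIGMA, w))
    print(f"{name:9s} max|W S W' - I| = {deviation:.1e}  tr(cor(z, x)) = {trace:.3f}")
```

The printed trace of the cross-correlation matrix is one way to summarize how closely each whitened variable tracks its original counterpart; under that summary, ZCA-cor should come out at least as high as the other four methods.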

Suggested Citation

  • Agnan Kessy & Alex Lewin & Korbinian Strimmer, 2018. "Optimal Whitening and Decorrelation," The American Statistician, Taylor & Francis Journals, vol. 72(4), pages 309-314, October.
  • Handle: RePEc:taf:amstat:v:72:y:2018:i:4:p:309-314
    DOI: 10.1080/00031305.2016.1277159

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/00031305.2016.1277159
    Download Restriction: Access to full text is restricted to subscribers.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Dirk Roeder & Georgi Dimitroff, 2020. "Volatility model calibration with neural networks a comparison between direct and indirect methods," Papers 2007.03494, arXiv.org.
    2. Loperfido, Nicola, 2024. "The skewness of mean–variance normal mixtures," Journal of Multivariate Analysis, Elsevier, vol. 199(C).
    3. Jonathan Gillard & Emily O’Riordan & Anatoly Zhigljavsky, 2023. "Polynomial whitening for high-dimensional data," Computational Statistics, Springer, vol. 38(3), pages 1427-1461, September.
    4. Steen MAGNUSSEN, 2018. "An estimation strategy to protect against over-estimating precision in a LiDAR-based prediction of a stand mean," Journal of Forest Science, Czech Academy of Agricultural Sciences, vol. 64(12), pages 497-505.
    5. Nikita Moshkov & Michael Bornholdt & Santiago Benoit & Matthew Smith & Claire McQuin & Allen Goodman & Rebecca A. Senft & Yu Han & Mehrtash Babadi & Peter Horvath & Beth A. Cimini & Anne E. Carpenter, 2024. "Learning representations for image-based profiling of perturbations," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    6. Harold Doran, 2023. "A Collection of Numerical Recipes Useful for Building Scalable Psychometric Applications," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 37-69, February.
    7. Schosser, Josef, 2019. "Consistency between principal and agent with differing time horizons: Computing incentives under risk," European Journal of Operational Research, Elsevier, vol. 277(3), pages 1113-1123.
    8. Damiano Brigo & Xiaoshan Huang & Andrea Pallavicini & Haitz Saez de Ocariz Borde, 2021. "Interpretability in deep learning for finance: a case study for the Heston model," Papers 2104.09476, arXiv.org.
    9. Priddle, Jacob W. & Drovandi, Christopher, 2023. "Transformations in semi-parametric Bayesian synthetic likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    10. Stan Lipovetsky, 2022. "Canonical Concordance Correlation Analysis," Mathematics, MDPI, vol. 11(1), pages 1-12, December.
    11. Minati, Ludovico & Li, Chao & Bartels, Jim & Chakraborty, Parthojit & Li, Zixuan & Yoshimura, Natsue & Frasca, Mattia & Ito, Hiroyuki, 2023. "Accelerometer time series augmentation through externally driving a non-linear dynamical system," Chaos, Solitons & Fractals, Elsevier, vol. 168(C).
    12. Wong, William & Tsuchiya, Naotsugu, 2020. "Evidence accumulation clustering using combinations of features," OSF Preprints epb6t, Center for Open Science.
