Identifying outliers using multiple kernel canonical correlation analysis with application to imaging genetics

Authors
  • Alam, Md. Ashad
  • Calhoun, Vince D.
  • Wang, Yu-Ping

Abstract

Identifying significant outliers or atypical objects in multimodal datasets is an essential and challenging problem in biomedical research. This problem is addressed using the influence function of multiple kernel canonical correlation analysis. First, the influence functions (IFs) of the kernel mean element, the kernel covariance operator, the kernel cross-covariance operator, and kernel canonical correlation analysis (kernel CCA) are studied. Second, an IF of multiple kernel CCA is proposed, which can be applied to multimodal datasets. Third, a visualization method is proposed to detect influential observations in multiple sources of data, based on the IFs of kernel CCA and multiple kernel CCA. Finally, to validate the method, experiments are performed on both synthesized and imaging genetics data (e.g., SNP, fMRI, and DNA methylation). To examine the outliers, both the stem-and-leaf display and a distribution-based technique are used. The performance of the proposed approach is illustrated on 116 candidate regions of interest (ROIs) from the fMRI data of a schizophrenia study, with the aim of identifying significant ROIs. The proposed method and two state-of-the-art statistical methods identified 8, 34, and 10 ROIs, respectively. Based on an online database, the brain mappings of the 7 ROIs selected in common indicate irregular brain regions susceptible to schizophrenia. The results demonstrate that the proposed method is capable of analyzing outliers and the influence of observations, and that it is applicable to many other biomedical datasets, which are often high-dimensional and multimodal.
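
The idea of scoring each observation by its influence on kernel CCA can be illustrated with a minimal Python sketch. This is not the authors' closed-form influence function: it substitutes a simple leave-one-out (deletion) score, namely the absolute change in the first kernel canonical correlation when one observation is removed. The Gaussian kernel, the bandwidth `sigma`, the regularization `reg`, the particular regularized formulation, and all function names are assumptions made for illustration only.

```python
import numpy as np


def centered_gram(X, sigma):
    """Centered Gaussian Gram matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return H @ K @ H


def first_kernel_cc(X, Y, sigma=1.0, reg=0.1):
    """First canonical correlation under one common regularized kernel CCA form.

    Computed as the largest singular value of
    (Kx + rI)^{-1} Kx  @  (Ky + rI)^{-1} Ky, with r = n * reg.
    """
    n = X.shape[0]
    Kx = centered_gram(X, sigma)
    Ky = centered_gram(Y, sigma)
    r = n * reg
    A = np.linalg.solve(Kx + r * np.eye(n), Kx)
    B = np.linalg.solve(Ky + r * np.eye(n), Ky)
    return np.linalg.svd(A @ B, compute_uv=False)[0]


def deletion_influence(X, Y, sigma=1.0, reg=0.1):
    """Leave-one-out score per observation (a stand-in for the paper's IF)."""
    n = X.shape[0]
    full = first_kernel_cc(X, Y, sigma, reg)
    scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop observation i from both views
        scores[i] = abs(full - first_kernel_cc(X[keep], Y[keep], sigma, reg))
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))
    Y = X[:, :1] + 0.1 * rng.normal(size=(40, 1))
    Y[0] = 8.0  # planted atypical observation (synthetic, for illustration)
    scores = deletion_influence(X, Y)
    print("indices with the largest deletion scores:", np.argsort(scores)[-3:])
```

Observations with unusually large scores are candidates for closer inspection, for example with a stem-and-leaf display of the scores. Deletion scores require one refit per observation, so they are practical only for small samples; an analytic influence function avoids that cost.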

Suggested Citation

  • Alam, Md. Ashad & Calhoun, Vince D. & Wang, Yu-Ping, 2018. "Identifying outliers using multiple kernel canonical correlation analysis with application to imaging genetics," Computational Statistics & Data Analysis, Elsevier, vol. 125(C), pages 70-85.
  • Handle: RePEc:eee:csdana:v:125:y:2018:i:c:p:70-85
    DOI: 10.1016/j.csda.2018.03.013

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947318300732
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2018.03.013?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Debruyne, Michiel & Hubert, Mia & Van Horebeek, Johan, 2010. "Detecting influential observations in Kernel PCA," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3007-3019, December.
    2. Parkhomenko, Elena & Tritchler, David & Beyene, Joseph, 2009. "Sparse Canonical Correlation Analysis with Application to Genomic Data Integration," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-36, January.
    3. Kenji Fukumizu & Chenlei Leng, 2014. "Gradient-Based Kernel Dimension Reduction for Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 359-370, March.
    4. Filzmoser, Peter & Maronna, Ricardo & Werner, Mark, 2008. "Outlier identification in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 52(3), pages 1694-1711, January.
    5. Mario Romanazzi, 1992. "Influence in canonical correlation analysis," Psychometrika, Springer;The Psychometric Society, vol. 57(2), pages 237-259, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Melissa G Naylor & Xihong Lin & Scott T Weiss & Benjamin A Raby & Christoph Lange, 2010. "Using Canonical Correlation Analysis to Discover Genetic Regulatory Variants," PLOS ONE, Public Library of Science, vol. 5(5), pages 1-6, May.
    2. G. Zioutas & C. Chatzinakos & T. D. Nguyen & L. Pitsoulis, 2017. "Optimization techniques for multivariate least trimmed absolute deviation estimation," Journal of Combinatorial Optimization, Springer, vol. 34(3), pages 781-797, October.
    3. Szefer, Elena & Lu, Donghuan & Nathoo, Farouk & Beg, Mirza Faisal & Graham, Jinko, 2017. "Multivariate association between single-nucleotide polymorphisms in Alzgene linkage regions and structural changes in the brain: discovery, refinement and validation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(5-6), pages 367-386, December.
    4. Wang, Wenjia & Zhou, Yi-Hui, 2021. "Eigenvector-based sparse canonical correlation analysis: Fast computation for estimation of multiple canonical vectors," Journal of Multivariate Analysis, Elsevier, vol. 185(C).
    5. Alberto Roverato & F. Marta L. Di Lascio, 2011. "Wilks' Λ Dissimilarity Measures for Gene Clustering: An Approach Based on the Identification of Transcription Modules," Biometrics, The International Biometric Society, vol. 67(4), pages 1236-1248, December.
    6. Jose A Seoane & Colin Campbell & Ian N M Day & Juan P Casas & Tom R Gaunt, 2014. "Canonical Correlation Analysis for Gene-Based Pleiotropy Discovery," PLOS Computational Biology, Public Library of Science, vol. 10(10), pages 1-13, October.
    7. Thomas Triebs & Subal C. Kumbhakar, 2012. "Management Practice in Production," ifo Working Paper Series 129, ifo Institute - Leibniz Institute for Economic Research at the University of Munich.
    8. David E. Tyler & Frank Critchley & Lutz Dümbgen & Hannu Oja, 2009. "Invariant co‐ordinate selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 549-592, June.
    9. Coleman, Jacob & Replogle, Joseph & Chandler, Gabriel & Hardin, Johanna, 2016. "Resistant multiple sparse canonical correlation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(2), pages 123-138, April.
    10. Strobl, Eric V. & Visweswaran, Shyam, 2016. "Markov Boundary Discovery with Ridge Regularized Linear Models," Journal of Causal Inference, De Gruyter, vol. 4(1), pages 31-48, March.
    11. Cizek, Pavel & Sadikoglu, Serhan, 2022. "Nonseparable Panel Models with Index Structure and Correlated Random Effects," Discussion Paper 2022-009, Tilburg University, Center for Economic Research.
    12. Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
    13. Mehni, Moien Barkhori & Mehni, Mohammad Barkhori, 2023. "Reliability analysis with cross-entropy based adaptive Markov chain importance sampling and control variates," Reliability Engineering and System Safety, Elsevier, vol. 231(C).
    14. Junlong Zhao & Chao Liu & Lu Niu & Chenlei Leng, 2019. "Multiple influential point detection in high dimensional regression spaces," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 385-408, April.
    15. Van Aelst, S. & Vandervieren, E. & Willems, G., 2012. "A Stahel–Donoho estimator based on huberized outlyingness," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 531-542.
    16. C. Chatzinakos & L. Pitsoulis & G. Zioutas, 2016. "Optimization techniques for robust multivariate location and scatter estimation," Journal of Combinatorial Optimization, Springer, vol. 31(4), pages 1443-1460, May.
    17. Chung, Hee Cheol & Ahn, Jeongyoun, 2021. "Subspace rotations for high-dimensional outlier detection," Journal of Multivariate Analysis, Elsevier, vol. 183(C).
    18. Iaci, Ross & Sriram, T.N., 2013. "Robust multivariate association and dimension reduction using density divergences," Journal of Multivariate Analysis, Elsevier, vol. 117(C), pages 281-295.
    19. Shieh, Albert D. & Hung, Yeung Sam, 2009. "Detecting Outlier Samples in Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-26, February.
    20. Adrover, Jorge G. & Donato, Stella M., 2015. "A robust predictive approach for canonical correlation analysis," Journal of Multivariate Analysis, Elsevier, vol. 133(C), pages 356-376.
