
A U-classifier for high-dimensional data under non-normality

Author

Listed:
  • Rauf Ahmad, M.
  • Pavlenko, Tatjana

Abstract

A classifier for two or more samples is proposed when the data are high-dimensional and the distributions may be non-normal. The classifier is constructed as a linear combination of two easily computable and interpretable components, the U-component and the P-component. The U-component is a linear combination of U-statistics of bilinear forms of pairwise distinct vectors from independent samples. The P-component, the discriminant score, is a function of the projection of the U-component on the observation to be classified. Together, the two components constitute an inherently bias-adjusted classifier valid for high-dimensional data. The classifier is linear, but its linearity does not rest on the assumption of homoscedasticity. Properties of the classifier and its normal limit are given under mild conditions. Misclassification errors and asymptotic properties of their empirical counterparts are discussed. Simulation results are used to show the accuracy of the proposed classifier for small or moderate sample sizes and large dimensions. Applications involving real data sets are also included.
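
A minimal numerical sketch, assuming only the general idea described in the abstract: an unbiased U-statistic estimate of the squared mean norm, built from bilinear forms of pairwise distinct vectors, combined with the projection of the new observation onto each sample mean. The function names u_bilinear and u_classify, the two-term score, and the toy data are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def u_bilinear(X):
    """U-statistic of bilinear forms over pairwise distinct vectors:
    (1 / (n*(n-1))) * sum_{i != j} x_i' x_j, an unbiased estimate of ||mu||^2
    (no normality or homoscedasticity assumed)."""
    n = X.shape[0]
    G = X @ X.T                      # all pairwise inner products
    return (G.sum() - np.trace(G)) / (n * (n - 1))

def u_classify(x, samples):
    """Assign x to the class with the largest bias-adjusted linear score
    2 * x' xbar_k - U_k, which estimates ||x||^2 - ||x - mu_k||^2 without the
    tr(Sigma_k)/n_k bias of the naive plug-in distance rule."""
    scores = []
    for X in samples:                # each X is an (n_k x p) sample matrix
        xbar = X.mean(axis=0)
        scores.append(2.0 * x @ xbar - u_bilinear(X))
    return int(np.argmax(scores))

# Toy usage: two high-dimensional, non-normal (exponential) samples.
rng = np.random.default_rng(0)
p, n1, n2 = 500, 15, 20
X1 = rng.exponential(1.0, size=(n1, p))          # class 0, coordinate means 1
X2 = rng.exponential(1.0, size=(n2, p)) + 0.5    # class 1, shifted mean
x_new = rng.exponential(1.0, size=p) + 0.5       # drawn from class 1
print(u_classify(x_new, [X1, X2]))               # should print 1 for this well-separated example
```

The bias adjustment comes from excluding i = j terms: E[x_i' x_j] = mu' mu for i != j, so the U-statistic avoids the tr(Sigma_k)/n_k inflation of the plug-in squared norm of the sample mean, which is one sense in which such a rule can be called inherently bias-adjusted in high dimensions.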

Suggested Citation

  • Rauf Ahmad, M. & Pavlenko, Tatjana, 2018. "A U-classifier for high-dimensional data under non-normality," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 269-283.
  • Handle: RePEc:eee:jmvana:v:167:y:2018:i:c:p:269-283
    DOI: 10.1016/j.jmva.2018.05.008

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X17305821
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.jmva.2018.05.008?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Zhong, Ping-Shou & Chen, Song Xi, 2011. "Tests for High-Dimensional Regression Coefficients With Factorial Designs," Journal of the American Statistical Association, American Statistical Association, vol. 106(493), pages 260-274.
    2. Tae Kim & Zhi-Ming Luo & Chiho Kim, 2011. "The central limit theorem for degenerate variable U-statistics under dependence," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 23(3), pages 683-699.
    3. Makoto Aoshima & Kazuyoshi Yata, 2014. "A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 66(5), pages 983-1010, October.
    4. Dudoit S. & Fridlyand J. & Speed T. P, 2002. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 77-87, March.
    5. Ruiyan Luo & Xin Qi, 2017. "Asymptotic Optimality of Sparse Linear Discriminant Analysis with Arbitrary Number of Classes," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 44(3), pages 598-616, September.
    6. M. Rauf Ahmad, 2017. "Location-invariant Multi-sample U-tests for Covariance Matrices with Large Dimension," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 44(2), pages 500-523, June.
    7. Song Huang & Tiejun Tong & Hongyu Zhao, 2010. "Bias-Corrected Diagonal Discriminant Rules for High-Dimensional Classification," Biometrics, The International Biometric Society, vol. 66(4), pages 1096-1106, December.
    8. Ning Hao & Bin Dong & Jianqing Fan, 2015. "Sparsifying the Fisher linear discriminant by rotation," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 77(4), pages 827-851, September.
    9. Peter Hall & Yvonne Pittelkow & Malay Ghosh, 2008. "Theoretical measures of relative performance of classifiers for high dimensional data with small sample sizes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 159-173, February.
    10. Pinheiro, Aluísio & Sen, Pranab Kumar & Pinheiro, Hildete Prisco, 2009. "Decomposability of high-dimensional diversity measures: Quasi-U-statistics, martingales and nonstandard asymptotics," Journal of Multivariate Analysis, Elsevier, vol. 100(8), pages 1645-1656, September.
    11. Mikosch, T., 1993. "A Weak Invariance Principle for Weighted U-Statistics with Varying Kernels," Journal of Multivariate Analysis, Elsevier, vol. 47(1), pages 82-102, October.
    12. M. Ahmad, 2014. "A U-statistic approach for a high-dimensional two-sample mean testing problem under non-normality and Behrens–Fisher setting," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 66(1), pages 33-61, February.
    13. Yao-Ban Chan & Peter Hall, 2009. "Scale adjustments for classifiers in high-dimensional, low sample size settings," Biometrika, Biometrika Trust, vol. 96(2), pages 469-478.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Makoto Aoshima & Kazuyoshi Yata, 2014. "A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 66(5), pages 983-1010, October.
    2. Makoto Aoshima & Kazuyoshi Yata, 2019. "High-Dimensional Quadratic Classifiers in Non-sparse Settings," Methodology and Computing in Applied Probability, Springer, vol. 21(3), pages 663-682, September.
    3. Makoto Aoshima & Kazuyoshi Yata, 2019. "Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(3), pages 473-503, June.
    4. Yugo Nakayama & Kazuyoshi Yata & Makoto Aoshima, 2020. "Bias-corrected support vector machine with Gaussian kernel in high-dimension, low-sample-size settings," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(5), pages 1257-1286, October.
    5. Ishii, Aki & Yata, Kazuyoshi & Aoshima, Makoto, 2022. "Geometric classifiers for high-dimensional noisy data," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    6. Rauf Ahmad, M., 2019. "A significance test of the RV coefficient in high dimensions," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 116-130.
    7. Zongliang Hu & Tiejun Tong & Marc G. Genton, 2019. "Diagonal likelihood ratio test for equality of mean vectors in high‐dimensional data," Biometrics, The International Biometric Society, vol. 75(1), pages 256-267, March.
    8. Nakagawa, Tomoyuki & Watanabe, Hiroki & Hyodo, Masashi, 2021. "Kick-one-out-based variable selection method for Euclidean distance-based classifier in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    9. Xu, Kai & Hao, Xinxin, 2019. "A nonparametric test for block-diagonal covariance structure in high dimension and small samples," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 551-567.
    10. Long Feng & Changliang Zou & Zhaojun Wang, 2016. "Multivariate-Sign-Based High-Dimensional Tests for the Two-Sample Location Problem," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(514), pages 721-735, April.
    11. Ahmad, Rauf, 2022. "Tests for proportionality of matrices with large dimension," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    12. Kubokawa, Tatsuya & Srivastava, Muni S., 2008. "Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 99(9), pages 1906-1928, October.
    13. Egashira, Kento & Yata, Kazuyoshi & Aoshima, Makoto, 2024. "Asymptotic properties of hierarchical clustering in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 199(C).
    14. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    15. Luca Scrucca, 2014. "Graphical tools for model-based mixture discriminant analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(2), pages 147-165, June.
    16. Bilin Zeng & Xuerong Meggie Wen & Lixing Zhu, 2017. "A link-free sparse group variable selection method for single-index model," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(13), pages 2388-2400, October.
    17. Watanabe, Hiroki & Hyodo, Masashi & Seo, Takashi & Pavlenko, Tatjana, 2015. "Asymptotic properties of the misclassification rates for Euclidean Distance Discriminant rule in high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 140(C), pages 234-244.
    18. J. Burez & D. Van Den Poel, 2005. "CRM at a Pay-TV Company: Using Analytical Models to Reduce Customer Attrition by Targeted Marketing for Subscription Services," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 05/348, Ghent University, Faculty of Economics and Business Administration.
    19. Won, Joong-Ho & Lim, Johan & Yu, Donghyeon & Kim, Byung Soo & Kim, Kyunga, 2014. "Monotone false discovery rate," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 86-93.
    20. Jan, Budczies & Kosztyla, Daniel & von Törne, Christian & Stenzinger, Albrecht & Darb-Esfahani, Silvia & Dietel, Manfred & Denkert, Carsten, 2014. "cancerclass: An R Package for Development and Validation of Diagnostic Tests from High-Dimensional Molecular Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 59(i01).


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.