Printed from https://ideas.repec.org/a/bla/jorssb/v71y2009i4p783-803.html

Tilting methods for assessing the influence of components in a classifier

Authors

  • Peter Hall
  • D. M. Titterington
  • Jing‐Hao Xue

Abstract

Summary. Many contemporary classifiers are constructed to provide good performance for very high dimensional data. However, an issue that is at least as important as good classification is determining which of the many potential variables provide key information for good decisions. Responding to this issue can help us to determine which aspects of the data-generating mechanism (e.g. which genes in a genomic study) are of greatest importance in terms of distinguishing between populations. We introduce tilting methods for addressing this problem. We apply weights to the components of data vectors, rather than to the data vectors themselves (as is commonly the case in related work). In addition, we tilt in a way that is governed by L2‐distance between weight vectors, rather than by the more commonly used Kullback–Leibler distance. It is shown that this approach, together with the added constraint that the weights should be non‐negative, produces an algorithm which eliminates vector components that have little influence on the classification decision. In particular, use of the L2‐distance in this problem produces properties that are reminiscent of those that arise when L1‐penalties are employed to eliminate explanatory variables in very high dimensional prediction problems, e.g. those involving the lasso. We introduce techniques that can be implemented very rapidly, and we show how to use bootstrap methods to assess the accuracy of our variable ranking and variable elimination procedures.
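The abstract's key claim is that tilting under L2-distance with non-negative weights drives some component weights exactly to zero, whereas the usual Kullback–Leibler tilt does not. A toy objective illustrates this. The sketch below is not the authors' algorithm. Every name in it is a hypothetical illustration: a per-component score s_j (here the absolute standardized mean difference between two classes, computed by the helper `component_scores`), uniform baseline weights w0 = 1/p, and a tilting strength t. Maximizing s·w minus an L2 penalty over the probability simplex reduces, by completing the square, to a Euclidean projection onto the simplex, which sets small components exactly to zero. The KL-penalized version instead yields exponential tilting, whose weights are all strictly positive.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean (L2) projection of v onto {w : w >= 0, sum(w) = 1}.

    Sort-based algorithm: find the threshold theta such that
    max(v - theta, 0) sums to one. Entries below theta become exactly 0.
    """
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                      # values in decreasing order
    css = np.cumsum(u)
    j = np.arange(1, v.size + 1)
    rho = j[u - (css - 1.0) / j > 0][-1]      # number of components kept
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def component_scores(X, y):
    """Hypothetical per-component score: |difference of class means| / pooled SD."""
    X0, X1 = X[y == 0], X[y == 1]
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    pooled = np.sqrt((X0.var(axis=0, ddof=1) + X1.var(axis=0, ddof=1)) / 2.0)
    return np.abs(diff) / pooled


def l2_tilt(scores, t):
    """Maximize s.w - ||w - w0||^2 / (2t) over the simplex, with w0 uniform.

    Completing the square shows the maximizer is the L2 projection of w0 + t*s.
    """
    p = len(scores)
    w0 = np.full(p, 1.0 / p)
    return project_to_simplex(w0 + t * np.asarray(scores, dtype=float))


def kl_tilt(scores, t):
    """Maximize s.w - KL(w || w0) / t over the simplex, with w0 uniform.

    The maximizer is exponential tilting, w_j proportional to exp(t * s_j),
    so every weight stays strictly positive.
    """
    z = t * np.asarray(scores, dtype=float)
    e = np.exp(z - z.max())                   # subtract max for numerical stability
    return e / e.sum()


if __name__ == "__main__":
    scores = np.array([2.0, 1.0, 0.3, 0.1, 0.0])
    print(l2_tilt(scores, 0.5))   # roughly [0.75, 0.25, 0, 0, 0]
    print(kl_tilt(scores, 0.5))   # all five weights strictly positive
```

With these illustrative scores and t = 0.5, the L2 tilt keeps only the two highest-scoring components. The KL tilt spreads positive weight over all five. Larger values of t tend to leave fewer non-zero L2 weights, so t plays a role loosely similar to a lasso tuning parameter in controlling how many components survive.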

Suggested Citation

  • Peter Hall & D. M. Titterington & Jing‐Hao Xue, 2009. "Tilting methods for assessing the influence of components in a classifier," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(4), pages 783-803, September.
  • Handle: RePEc:bla:jorssb:v:71:y:2009:i:4:p:783-803
    DOI: 10.1111/j.1467-9868.2009.00701.x

    Download full text from publisher

    File URL: https://doi.org/10.1111/j.1467-9868.2009.00701.x
    Download Restriction: no

    File URL: https://libkey.io/10.1111/j.1467-9868.2009.00701.x?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Martin L. Hazelton & Berwin A. Turlach, 2007. "Reweighted kernel density estimation," Computational Statistics & Data Analysis, Elsevier, vol. 51(6), pages 3057-3069, March.
    2. Gordon K. Smyth, 2004. "Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 3(1), pages 1-28, February.
    3. Peter Hall & Qiwei Yao, 2003. "Data tilting for time series," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 425-442, May.
    4. Stephen M. S. Lee, 2003. "Prepivoting by weighted bootstrap iteration," Biometrika, Biometrika Trust, vol. 90(2), pages 393-410, June.
    5. J. Fan & R. Li, 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    6. Frank Critchley, 2004. "Data-informed influence analysis," Biometrika, Biometrika Trust, vol. 91(1), pages 125-140, March.
    7. Ngai Hang Chan & Shi-Jie Deng & Liang Peng & Zhendong Xia, 2007. "Interval estimation of value-at-risk based on GARCH models with heavy-tailed innovations," Journal of Econometrics, Elsevier, vol. 137(2), pages 556-576, April.
    8. Frank Critchley & Richard A. Atkinson & Guobing Lu & Elenice Biazi, 2001. "Influence analysis based on the case sensitivity function," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 63(2), pages 307-323.
    9. Rainer Opgen-Rhein & Korbinian Strimmer, 2007. "Accurate Ranking of Differentially Expressed Genes by a Distribution-Free Shrinkage Approach," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 6(1), pages 1-20, February.
    10. Peter Hall & Brett Presnell, 1999. "Biased Bootstrap Methods for Reducing the Effects of Contamination," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(3), pages 661-680.
    11. Peter Hall & D. M. Titterington & Jing-Hao Xue, 2009. "Median-Based Classifiers for High-Dimensional Data," Journal of the American Statistical Association, American Statistical Association, vol. 104(488), pages 1597-1608.
    12. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    13. Yoonkyung Lee & Yi Lin & Grace Wahba, 2004. "Multicategory Support Vector Machines: Theory and Application to the Classification of Microarray Data and Satellite Radiance Data," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 67-81, January.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Jing Zhang & Qihua Wang & Jian Kang, 2020. "Feature screening under missing indicator imputation with non-ignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    2. Peter Hall & Jing-Hao Xue, 2014. "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 694-708.
    3. Xiangyu Wang & Chenlei Leng, 2016. "High dimensional ordinary least squares projection for screening variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(3), pages 589-611, June.
    4. Qinqin Hu & Lu Lin, 2017. "Conditional sure independence screening by conditional marginal empirical likelihood," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(1), pages 63-96, February.
    5. Zhongkai Liu & Rui Song & Donglin Zeng & Jiajia Zhang, 2017. "Principal components adjusted variable screening," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 134-144.
    6. Xin-Bing Kong & Zhi Liu & Yuan Yao & Wang Zhou, 2017. "Sure screening by ranking the canonical correlations," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(1), pages 46-70, March.
    7. Marc G. Genton & Peter Hall, 2016. "A tilting approach to ranking influence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(1), pages 77-97, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marc G. Genton & Peter Hall, 2016. "A tilting approach to ranking influence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(1), pages 77-97, January.
    2. Meng An & Haixiang Zhang, 2023. "High-Dimensional Mediation Analysis for Time-to-Event Outcomes with Additive Hazards Model," Mathematics, MDPI, vol. 11(24), pages 1-11, December.
    3. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    4. Zhaoyu Xing & Yang Wan & Juan Wen & Wei Zhong, 2024. "GOLFS: feature selection via combining both global and local information for high dimensional clustering," Computational Statistics, Springer, vol. 39(5), pages 2651-2675, July.
    5. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    6. Shi Chen & Wolfgang Karl Härdle & Brenda López Cabrera, 2020. "Regularization Approach for Network Modeling of German Power Derivative Market," Papers 2009.09739, arXiv.org.
    7. Christina Dan Wang & Zhao Chen & Yimin Lian & Min Chen, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    8. Laurent Ferrara & Anna Simoni, 2023. "When are Google Data Useful to Nowcast GDP? An Approach via Preselection and Shrinkage," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 41(4), pages 1188-1202, October.
    9. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    10. Sangjin Kim & Jong-Min Kim, 2019. "Two-Stage Classification with SIS Using a New Filter Ranking Method in High Throughput Data," Mathematics, MDPI, vol. 7(6), pages 1-16, May.
    11. Anders Bredahl Kock, 2012. "On the Oracle Property of the Adaptive Lasso in Stationary and Nonstationary Autoregressions," CREATES Research Papers 2012-05, Department of Economics and Business Economics, Aarhus University.
    12. Yanlin Tang & Xinyuan Song & Huixia Judy Wang & Zhongyi Zhu, 2013. "Variable selection in high-dimensional quantile varying coefficient models," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 115-132.
    13. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    14. Xinyi Li & Li Wang & Dan Nettleton, 2019. "Sparse model identification and learning for ultra-high-dimensional additive partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 204-228.
    15. Peili Li & Yuling Jiao & Xiliang Lu & Lican Kang, 2022. "A data-driven line search rule for support recovery in high-dimensional data analysis," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    16. Jingyuan Liu & Runze Li & Rongling Wu, 2014. "Feature Selection for Varying Coefficient Models With Ultrahigh-Dimensional Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(505), pages 266-274, March.
    17. Jianqing Fan & Yang Feng & Jiancheng Jiang & Xin Tong, 2016. "Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 275-287, March.
    18. Ji Hyung Lee & Zhentao Shi & Zhan Gao, 2022. "On LASSO for predictive regression," Journal of Econometrics, Elsevier, vol. 229(2), pages 322-349.
    19. Ian W. McKeague & Min Qian, 2015. "An Adaptive Resampling Test for Detecting the Presence of Significant Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1422-1433, December.
    20. Victor Chernozhukov & Christian Hansen & Yuan Liao, 2015. "A lava attack on the recovery of sums of dense and sparse signals," CeMMAP working papers CWP56/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.