
On selecting interacting features from high-dimensional data

Author

Listed:
  • Hall, Peter
  • Xue, Jing-Hao

Abstract

For high-dimensional data, most feature-selection methods, such as SIS and the lasso, involve ranking and selecting features individually. These methods require few computational resources, but they ignore feature interactions. This paper investigates a simple recursive approach that also identifies interactions without requiring many more computational resources. The approach can lead to substantial improvements in the performance of classifiers, and can provide insight into the way in which features work together in a given population. It also enjoys attractive statistical properties.
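The contrast drawn in the abstract, between scoring features one at a time and selecting them recursively, can be illustrated with a small simulation. This is only a minimal sketch and not the procedure of Hall and Xue, whose details are not given on this page. The simulated data, the leave-one-out k-nearest-neighbour scoring rule, and the function names `marginal_scores`, `loo_knn_accuracy` and `recursive_select` are all assumptions made for illustration. Feature 1 is constructed to be independent of the class label on its own, so it matters only in combination with feature 0.

```python
import numpy as np

def marginal_scores(X, y):
    """Absolute two-sample t-like statistic, computed for each feature on its own."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def loo_knn_accuracy(X, y, cols, k=5):
    """Leave-one-out accuracy of a k-nearest-neighbour vote using only columns `cols`."""
    Z = X[:, cols]
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)            # a point may not vote for itself
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k closest other points
    votes = y[nn].mean(axis=1)             # share of neighbours in class 1
    return float(((votes > 0.5).astype(int) == y).mean())

def recursive_select(X, y, n_keep):
    """Greedy forward selection: each candidate is judged jointly with the features already kept."""
    chosen = []
    for _ in range(n_keep):
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        accs = [loo_knn_accuracy(X, y, chosen + [j]) for j in rest]
        chosen.append(rest[int(np.argmax(accs))])
    return chosen

# Hypothetical data: the class is the sign of feature 0, flipped whenever feature 1 exceeds 1.
rng = np.random.default_rng(0)
n, p = 400, 20
X = rng.standard_normal((n, p))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 1)).astype(int)

t = marginal_scores(X, y)
marginal_order = [int(j) for j in np.argsort(-t)]
selected = recursive_select(X, y, n_keep=2)

print("marginal scores of features 0 and 1:", t[0], t[1])
print("recursive selection:", selected)
```

In this toy setting the marginal score of feature 1 is no larger in expectation than that of a pure-noise feature, so individual ranking has no reason to prefer it, whereas the recursive step evaluates it together with feature 0 and can pick up the interaction. The cost is one extra pass over the remaining features per selected feature, not a search over all pairs.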

Suggested Citation

  • Hall, Peter & Xue, Jing-Hao, 2014. "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 694-708.
  • Handle: RePEc:eee:csdana:v:71:y:2014:i:c:p:694-708
    DOI: 10.1016/j.csda.2012.10.010

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016794731200360X
    Download Restriction: Full text for ScienceDirect subscribers only.



    References listed on IDEAS

    1. Dudoit S. & Fridlyand J. & Speed T. P., 2002. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 77-87, March.
    2. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    3. Peter Hall & D. M. Titterington & Jing‐Hao Xue, 2009. "Tilting methods for assessing the influence of components in a classifier," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(4), pages 783-803, September.

    Citations



    Cited by:

    1. Feng Li & Yajie Li & Sanying Feng, 2021. "Estimation for Varying Coefficient Models with Hierarchical Structure," Mathematics, MDPI, vol. 9(2), pages 1-18, January.
    2. Frénay, Benoît & Doquire, Gauthier & Verleysen, Michel, 2014. "Estimating mutual information for feature selection in the presence of label noise," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 832-848.
    3. Jun Lu & Dan Wang & Qinqin Hu, 2022. "Interaction screening via canonical correlation," Computational Statistics, Springer, vol. 37(5), pages 2637-2670, November.
    4. Xuewei Cheng & Gang Li & Hong Wang, 2024. "The concordance filter: an adaptive model-free feature screening procedure," Computational Statistics, Springer, vol. 39(5), pages 2413-2436, July.
    5. Xiong, Wei & Chen, Yaxian & Ma, Shuangge, 2023. "Unified model-free interaction screening via CV-entropy filter," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    6. Timothy I. Cannings & Richard J. Samworth, 2017. "Random-projection ensemble classification," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 959-1035, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jianqing Fan & Yang Feng & Jiancheng Jiang & Xin Tong, 2016. "Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 275-287, March.
    2. Zhang, Jing & Wang, Qihua & Kang, Jian, 2020. "Feature screening under missing indicator imputation with non-ignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    3. Dawit G. Tadesse & Mark Carpenter, 2019. "A method for selecting the relevant dimensions for high-dimensional classification in singular vector spaces," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(2), pages 405-426, June.
    4. Xin-Bing Kong & Zhi Liu & Yuan Yao & Wang Zhou, 2017. "Sure screening by ranking the canonical correlations," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(1), pages 46-70, March.
    5. Qinqin Hu & Lu Lin, 2017. "Conditional sure independence screening by conditional marginal empirical likelihood," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(1), pages 63-96, February.
    6. Liu, Zhongkai & Song, Rui & Zeng, Donglin & Zhang, Jiajia, 2017. "Principal components adjusted variable screening," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 134-144.
    7. Zeyu Diao & Lili Yue & Fanrong Zhao & Gaorong Li, 2022. "High-Dimensional Regression Adjustment Estimation for Average Treatment Effect with Highly Correlated Covariates," Mathematics, MDPI, vol. 10(24), pages 1-18, December.
    8. Wang, Cheng & Cao, Longbing & Miao, Baiqi, 2013. "Optimal feature selection for sparse linear discriminant analysis and its applications in gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 66(C), pages 140-149.
    9. Xiangyu Wang & Chenlei Leng, 2016. "High dimensional ordinary least squares projection for screening variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(3), pages 589-611, June.
    10. Meng An & Haixiang Zhang, 2023. "High-Dimensional Mediation Analysis for Time-to-Event Outcomes with Additive Hazards Model," Mathematics, MDPI, vol. 11(24), pages 1-11, December.
    11. Tomohiro Ando & Ruey S. Tsay, 2009. "Model selection for generalized linear models with factor‐augmented predictors," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 25(3), pages 207-235, May.
    12. Kubokawa, Tatsuya & Srivastava, Muni S., 2008. "Estimation of the precision matrix of a singular Wishart distribution and its application in high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 99(9), pages 1906-1928, October.
    13. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    14. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    15. Luca Scrucca, 2014. "Graphical tools for model-based mixture discriminant analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 8(2), pages 147-165, June.
    16. Jing Zhang & Qihua Wang & Xuan Wang, 2022. "Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(2), pages 379-397, April.
    17. Sauvenier, Mathieu & Van Bellegem, Sébastien, 2023. "Direction Identification and Minimax Estimation by Generalized Eigenvalue Problem in High Dimensional Sparse Regression," LIDAM Discussion Papers CORE 2023005, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    18. Jie-Huei Wang & Cheng-Yu Liu & You-Ruei Min & Zih-Han Wu & Po-Lin Hou, 2024. "Cancer Diagnosis by Gene-Environment Interactions via Combination of SMOTE-Tomek and Overlapped Group Screening Approaches with Application to Imbalanced TCGA Clinical and Genomic Data," Mathematics, MDPI, vol. 12(14), pages 1-24, July.
    19. Zhaoyu Xing & Yang Wan & Juan Wen & Wei Zhong, 2024. "GOLFS: feature selection via combining both global and local information for high dimensional clustering," Computational Statistics, Springer, vol. 39(5), pages 2651-2675, July.
    20. Ahmed Ismaïl & Hartikainen Anna-Liisa & Järvelin Marjo-Riitta & Richardson Sylvia, 2011. "False Discovery Rate Estimation for Stability Selection: Application to Genome-Wide Association Studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-20, November.
