
A novel feature selection scheme for high-dimensional data sets: four-Staged Feature Selection

Author

  • Ayça Çakmak Pehlivanlı

Abstract

Classification of high-dimensional data sets is a major challenge for statistical learning and data mining algorithms. To apply classification methods effectively to such data, feature selection is an indispensable pre-processing step in the learning process. In this study, we consider the problem of constructing an effective feature selection and classification scheme for data sets with a small sample size and a large number of features. A novel feature selection approach, named four-Staged Feature Selection, is proposed to address the high-dimensional classification problem by selecting informative features. The proposed method first selects candidate features with several filter methods based on different metrics, and then applies semi-wrapper, union and voting stages, in that order, to obtain the final feature subsets. Several statistical learning and data mining methods are used to verify the effectiveness of the selected features. To test the adequacy of the proposed method, 10 different microarray data sets are employed, since they combine a large number of features with a small sample size.
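The abstract names the four stages but not their exact rules. The following is a rough illustration only, not the author's implementation: a minimal Python sketch for a two-class problem in which every concrete choice is an assumption. The filter metrics (Welch t-statistic, signal-to-noise ratio, Fisher score), the "semi-wrapper" rule (keep the shortest top-m prefix of each filter's ranking, m <= k, that maximizes leave-one-out accuracy of a nearest-centroid classifier), the voting rule (keep features chosen by at least `min_votes` filters) and the function names `rank_features`, `semi_wrapper`, `union_and_vote` and `four_stage_sketch` are all hypothetical stand-ins.

```python
import math
from statistics import mean, stdev

EPS = 1e-12  # guards against division by zero when a feature is constant within both classes


def _split(column, labels):
    """Split one feature column into class-0 and class-1 values."""
    a = [v for v, lab in zip(column, labels) if lab == 0]
    b = [v for v, lab in zip(column, labels) if lab == 1]
    return a, b


def t_score(a, b):
    """Hypothetical filter 1: absolute Welch t-statistic."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) / (se + EPS)


def snr_score(a, b):
    """Hypothetical filter 2: signal-to-noise ratio |mean_a - mean_b| / (sd_a + sd_b)."""
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b) + EPS)


def fisher_score(a, b):
    """Hypothetical filter 3: between-class scatter over within-class scatter."""
    mu = mean(a + b)
    between = len(a) * (mean(a) - mu) ** 2 + len(b) * (mean(b) - mu) ** 2
    within = (len(a) - 1) * stdev(a) ** 2 + (len(b) - 1) * stdev(b) ** 2
    return between / (within + EPS)


FILTERS = {"t": t_score, "snr": snr_score, "fisher": fisher_score}


def rank_features(X, y, metric):
    """Stage 1 (filter): feature indices sorted by decreasing score."""
    scores = [metric(*_split(col, y)) for col in zip(*X)]
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `feats`."""
    correct = 0
    for i in range(len(X)):
        dists = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [mean(r[j] for r in rows) for j in feats]
            dists[label] = sum((X[i][j] - c) ** 2 for j, c in zip(feats, centroid))
        correct += min(dists, key=dists.get) == y[i]
    return correct / len(X)


def semi_wrapper(X, y, ranking, k):
    """Stage 2 (hypothetical rule): shortest top-m prefix, m <= k, with the best LOO accuracy."""
    best, best_acc = ranking[:1], -1.0
    for m in range(1, k + 1):
        acc = loo_accuracy(X, y, ranking[:m])
        if acc > best_acc:  # strict '>' keeps the shortest prefix on ties
            best, best_acc = ranking[:m], acc
    return best


def union_and_vote(subsets, min_votes):
    """Stages 3-4 (hypothetical rule): union of all subsets, then keep
    features that appear in at least `min_votes` of them."""
    pool = set().union(*subsets)
    return sorted(j for j in pool if sum(j in s for s in subsets) >= min_votes)


def four_stage_sketch(X, y, k=3, min_votes=2):
    rankings = [rank_features(X, y, f) for f in FILTERS.values()]
    subsets = [semi_wrapper(X, y, r, k) for r in rankings]
    return union_and_vote(subsets, min_votes)


# Toy usage: feature 0 separates the classes, feature 1 weakly, features 2-3 are noise.
X = [[0.0, 1.0, 3.0, 0.5], [0.1, 2.0, 1.0, 0.7], [0.2, 1.5, 2.0, 0.6],
     [5.0, 2.5, 2.0, 0.6], [5.1, 3.0, 3.0, 0.5], [5.2, 2.0, 1.0, 0.7]]
y = [0, 0, 0, 1, 1, 1]
print(four_stage_sketch(X, y))  # prints [0]
```

In a sketch like this the cheap ranking filters do the bulk reduction, so the costlier wrapper-style search only ever evaluates at most `k` features per filter, which matters when the input has thousands of microarray genes. Each class needs at least two samples, because `statistics.stdev` and the leave-one-out centroids both require them.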

Suggested Citation

  • Ayça Çakmak Pehlivanlı, 2016. "A novel feature selection scheme for high-dimensional data sets: four-Staged Feature Selection," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(6), pages 1140-1154, May.
  • Handle: RePEc:taf:japsta:v:43:y:2016:i:6:p:1140-1154
    DOI: 10.1080/02664763.2015.1092112

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1080/02664763.2015.1092112
    Download Restriction: Access to full text is restricted to subscribers.

    File URL: https://libkey.io/10.1080/02664763.2015.1092112?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Scott L. Pomeroy & Pablo Tamayo & Michelle Gaasenbeek & Lisa M. Sturla & Michael Angelo & Margaret E. McLaughlin & John Y. H. Kim & Liliana C. Goumnerova & Peter M. Black & Ching Lau & Jeffrey C. Alle, 2002. "Prediction of central nervous system embryonal tumour outcome based on gene expression," Nature, Nature, vol. 415(6870), pages 436-442, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Outi Ruusunen & Marja Jalli & Lauri Jauhiainen & Mika Ruusunen & Kauko Leiviskä, 2022. "Identification of Optimal Starting Time Instance to Forecast Net Blotch Density in Spring Barley with Meteorological Data in Finland," Agriculture, MDPI, vol. 12(11), pages 1-16, November.
    2. Dong, Kai & Pang, Herbert & Tong, Tiejun & Genton, Marc G., 2016. "Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 127-142.
    3. Michelle M. Kameda-Smith & Helen Zhu & En-Ching Luo & Yujin Suk & Agata Xella & Brian Yee & Chirayu Chokshi & Sansi Xing & Frederick Tan & Raymond G. Fox & Ashley A. Adile & David Bakhshinyan & Kevin , 2022. "Characterization of an RNA binding protein interactome reveals a context-specific post-transcriptional landscape of MYC-amplified medulloblastoma," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    4. Ghosh, Santu & Ayyala, Deepak Nag & Hellebuyck, Rafael, 2021. "Two-sample high dimensional mean test based on prepivots," Computational Statistics & Data Analysis, Elsevier, vol. 163(C).
    5. Tianming Zhu & Jin-Ting Zhang, 2022. "Linear hypothesis testing in high-dimensional one-way MANOVA: a new normal reference approach," Computational Statistics, Springer, vol. 37(1), pages 1-27, March.
    6. Allison A. Appleton & Kevin C. Kiley & Lawrence M. Schell & Elizabeth A. Holdsworth & Anuoluwapo Akinsanya & Catherine Beecher, 2021. "Prenatal Lead and Depression Exposures Jointly Influence Birth Outcomes and NR3C1 DNA Methylation," IJERPH, MDPI, vol. 18(22), pages 1-15, November.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.