
A Rough Based Hybrid Binary PSO Algorithm for Flat Feature Selection and Classification in Gene Expression Data

Author

Listed:
  • Suresh Dara

    (B.V. Raju Institute of Technology)

  • Haider Banka

    (Indian Institute of Technology (ISM))

  • Chandra Sekhara Rao Annavarapu

    (Indian Institute of Technology (ISM))

Abstract

Feature selection in high-dimensional data, particularly gene expression data, is one of the challenging tasks in bioinformatics due to the curse of dimensionality, data redundancy and noisy values. In gene expression data, insignificant features cause poor classification; feature selection reduces the feature subset and thereby improves classification accuracy. Existing feature selection algorithms for gene expression data (such as filter-based, wrapper-based and hybrid methods) often yield poor accuracy, whereas a few methods take too long to converge to an acceptable result. For example, NSGA-II takes, on average, over 10,000 generations to converge in the search space, which incurs increased computational time. The proposed rough-based hybrid binary PSO algorithm uses a heuristic-based fast processing strategy to reduce the crude domain features by statistical elimination of redundant features; the remaining features are then discretized into a binary table, known in rough set theory as the distinction table. This distinction table is later used as input to evaluate and optimize the objective functions, i.e., to generate reducts in rough set theory. The proposed hybrid binary PSO is then used to tune the objective functions and to choose the most important features (i.e., the reduct). The fitness function is designed so that it reduces the cardinality of the feature subset and, at the same time, improves classification performance. Results on three existing benchmark datasets from the literature (colon cancer, lymphoma and leukemia) demonstrate the effectiveness of the proposed method.
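
The pipeline described in the abstract (distinction table, fitness function, binary PSO search) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the function names, the weighted-sum form of the fitness, the weight w_size, the inertia and acceleration constants, and the velocity clamp. The statistical pre-filtering and discretization steps are skipped, and the input is assumed to be already binary.

```python
import math
import random


def distinction_table(X, y):
    """One row per pair of objects from different classes.
    Entry j is 1 when the two objects differ on binary feature j."""
    rows = []
    for a in range(len(X)):
        for b in range(a + 1, len(X)):
            if y[a] != y[b]:
                rows.append([int(p != q) for p, q in zip(X[a], X[b])])
    return rows


def fitness(mask, table, w_size=0.1):
    """Assumed weighted sum of two objectives: the fraction of features
    dropped and the fraction of object pairs the kept features discern."""
    n = len(mask)
    dropped = (n - sum(mask)) / n
    covered = sum(any(m and r for m, r in zip(mask, row)) for row in table)
    return w_size * dropped + (1 - w_size) * covered / len(table)


def binary_pso(table, n_particles=10, n_iter=30, seed=0):
    """Binary PSO: real-valued velocities pass through a sigmoid to give
    the probability that each bit of a particle's position is set to 1."""
    rng = random.Random(seed)
    n = len(table[0])
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=lambda m: fitness(m, table))[:]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n):
                # Inertia 0.7 and acceleration 1.5 are illustrative choices.
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # keep the sigmoid unsaturated
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            if fitness(pos[i], table) > fitness(pbest[i], table):
                pbest[i] = pos[i][:]
                if fitness(pbest[i], table) > fitness(gbest, table):
                    gbest = pbest[i][:]
    return gbest


# Toy data: 4 samples, 4 binary features, 2 classes (made up for the example).
X = [[0, 1, 0, 1], [0, 1, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1]]
y = [0, 0, 1, 1]
table = distinction_table(X, y)
best = binary_pso(table)
print("selected feature mask:", best, "fitness:", fitness(best, table))
```

In this toy data the first feature alone separates the two classes, so a mask that keeps only that feature discerns every pair and scores higher than the mask that keeps all four features. This is the trade-off between subset size and discernibility that the abstract's fitness function is described as balancing.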

Suggested Citation

  • Suresh Dara & Haider Banka & Chandra Sekhara Rao Annavarapu, 2017. "A Rough Based Hybrid Binary PSO Algorithm for Flat Feature Selection and Classification in Gene Expression Data," Annals of Data Science, Springer, vol. 4(3), pages 341-360, September.
  • Handle: RePEc:spr:aodasc:v:4:y:2017:i:3:d:10.1007_s40745-017-0106-3
    DOI: 10.1007/s40745-017-0106-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40745-017-0106-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



