Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach

Authors

  • Daniel Peralta
  • Sara del Río
  • Sergio Ramírez-Gallego
  • Isaac Triguero
  • Jose M. Benitez
  • Francisco Herrera

Abstract

Many disciplines now have to deal with big datasets that also involve a large number of features. Feature selection methods aim to eliminate noisy, redundant, or irrelevant features that may degrade classification performance. However, traditional methods do not scale well enough to cope with datasets of millions of instances and produce useful results within a limited time. This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. The algorithm decomposes the original dataset into blocks of instances and learns from them in the map phase; the reduce phase then merges the partial results into a final vector of feature weights, which allows the feature selection procedure to be applied flexibly, using a threshold to determine the selected subset of features. The method is evaluated with three well-known classifiers (SVM, Logistic Regression, and Naive Bayes) implemented within the Spark framework to address big data problems. In the experiments, datasets of up to 67 million instances and up to 2000 attributes were handled, showing that this is a suitable framework for evolutionary feature selection, improving both classification accuracy and runtime when dealing with big data problems.
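
The map, reduce, and threshold steps described in the abstract can be illustrated with a minimal single-machine sketch in plain Python. It is not the authors' implementation. Several parts are assumptions made for illustration only: the per-block search is a simple (1+1) bit-flip evolutionary loop, the fitness is leave-one-out 1-NN accuracy with a small penalty per selected feature, and the reduce step averages the binary masks into weights. The function names (`map_block`, `reduce_masks`, `select_features`, `demo`) and the synthetic data are hypothetical.

```python
import random


def loo_1nn_accuracy(block, mask):
    """Leave-one-out 1-NN accuracy on a block, using only features whose mask bit is 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    hits = 0
    for i, (xi, yi) in enumerate(block):
        best_dist, best_label = float("inf"), None
        for k, (xk, yk) in enumerate(block):
            if k == i:
                continue
            dist = sum((xi[j] - xk[j]) ** 2 for j in idx)
            if dist < best_dist:
                best_dist, best_label = dist, yk
        hits += best_label == yi
    return hits / len(block)


def map_block(block, n_features, rng, generations=40):
    """Map phase (assumed stand-in): evolve one binary feature mask for one block."""
    def fitness(mask):
        # Assumption: reward accuracy, lightly penalise the number of selected features.
        return loo_1nn_accuracy(block, mask) - 0.01 * sum(mask)

    best = [rng.randint(0, 1) for _ in range(n_features)]
    best_fit = fitness(best)
    for _ in range(generations):
        # Flip each bit independently with probability 1 / n_features.
        child = [bit ^ (rng.random() < 1.0 / n_features) for bit in best]
        child_fit = fitness(child)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best


def reduce_masks(masks):
    """Reduce phase (assumed): average the binary masks into one weight per feature."""
    n = len(masks)
    return [sum(column) / n for column in zip(*masks)]


def select_features(weights, threshold):
    """Keep the indices of features whose weight reaches the threshold."""
    return [j for j, w in enumerate(weights) if w >= threshold]


def demo(seed=0, n_instances=120, block_size=30):
    """Run the pipeline on synthetic data where only feature 0 carries the class signal."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_instances):
        label = rng.randint(0, 1)
        features = [label + rng.gauss(0, 0.3), rng.random(), rng.random(), rng.random()]
        data.append((features, label))
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    masks = [map_block(block, 4, rng) for block in blocks]  # map phase
    return reduce_masks(masks)                              # reduce phase
```

Calling `select_features(demo(), 0.5)` returns the indices whose weight is at least 0.5. Because the weights are computed once, different thresholds can be tried without repeating the per-block search, which is the flexibility the abstract refers to.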

Suggested Citation

  • Daniel Peralta & Sara del Río & Sergio Ramírez-Gallego & Isaac Triguero & Jose M. Benitez & Francisco Herrera, 2015. "Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach," Mathematical Problems in Engineering, Hindawi, vol. 2015, pages 1-11, October.
  • Handle: RePEc:hin:jnlmpe:246139
    DOI: 10.1155/2015/246139

    Download full text from publisher

    File URL: http://downloads.hindawi.com/journals/MPE/2015/246139.pdf
    Download Restriction: no

    File URL: http://downloads.hindawi.com/journals/MPE/2015/246139.xml
    Download Restriction: no

    File URL: https://libkey.io/10.1155/2015/246139?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
