
Ensembles of Classifiers for Parallel Categorization of Large Number of Text Documents Expressing Opinions

Author

Listed:
  • Frantisek Darena

    (Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemedelska 1, 613 00 Brno, Czech Republic)

  • Jan Zizka

    (Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemedelska 1, 613 00 Brno, Czech Republic)

Abstract

Opinions provided by people who have used services or purchased goods are a rich source of knowledge. Opinion classification, which relies mostly on supervised classifiers, is one of the essential tasks in processing them. The technological capabilities of computers remain a major obstacle, especially when huge volumes of data are processed. This study proposes and experimentally evaluates the application of parallelism to the classification of a very large number of contrary opinions expressed as freely written text reviews. Instead of training a single classifier on the entire data set, an ensemble of classifiers is trained on disjoint subsets of the data, and a group decision is used to classify unlabelled items. The main assessment criteria are computational efficiency and error rate, combined into a single measure so that ensembles of different sizes can be compared. Support vector machines, artificial neural networks, and decision trees, all frequently used classification methods, were examined. The paper demonstrates the viability of the suggested method when the number of text reviews leads to a computational complexity beyond the capabilities of a contemporary common PC. Classification accuracy and the values of other classification performance measures (Precision, Recall, F-measure) did not decrease, which is a positive finding.
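
The scheme described in the abstract (disjoint training subsets, one classifier per subset, a group decision for unlabelled items) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the base learner `WordVoteClassifier` is a hypothetical word-count stand-in for the SVMs, neural networks, and decision trees examined in the paper, the round-robin split and the simple majority vote are assumed choices, and the six reviews are made-up sample data. Threads are used only to show the structure; CPU-bound training would more plausibly run in separate processes or on separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_disjoint(items, k):
    """Partition items into k disjoint subsets (round-robin; an assumed scheme)."""
    return [items[i::k] for i in range(k)]


class WordVoteClassifier:
    """Hypothetical toy base learner: per-label word counts, not a real SVM/ANN/tree."""

    def fit(self, samples):
        self.counts = {}
        for text, label in samples:
            self.counts.setdefault(label, Counter()).update(text.lower().split())
        return self

    def predict(self, text):
        words = text.lower().split()
        scores = {label: sum(c[w] for w in words) for label, c in self.counts.items()}
        # Ties are resolved by alphabetical label order to stay deterministic.
        return max(sorted(scores), key=scores.get)


def train_ensemble(samples, k):
    """Train one classifier per disjoint subset, concurrently (assumes k <= len(samples))."""
    parts = split_disjoint(samples, k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        return list(pool.map(lambda part: WordVoteClassifier().fit(part), parts))


def majority_vote(ensemble, text):
    """Group decision: the label predicted by most ensemble members."""
    votes = Counter(member.predict(text) for member in ensemble)
    return votes.most_common(1)[0][0]


# Made-up sample reviews, ordered so that each subset sees both classes.
reviews = [
    ("great hotel friendly staff", "pos"),
    ("excellent clean room great view", "pos"),
    ("great breakfast and nice location", "pos"),
    ("terrible noisy room rude staff", "neg"),
    ("dirty bathroom terrible service", "neg"),
    ("terrible smell and bad food", "neg"),
]

ensemble = train_ensemble(reviews, k=3)  # odd k avoids tied votes with two classes
print(majority_vote(ensemble, "great location"))
print(majority_vote(ensemble, "terrible food"))
```

Because each member sees only its own subset, the training cost per member shrinks as the ensemble grows, which is the trade-off against error rate that the abstract says is folded into a single comparison measure; that measure is not specified on this page, so it is not reproduced here.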

Suggested Citation

  • Frantisek Darena & Jan Zizka, 2016. "Ensembles of Classifiers for Parallel Categorization of Large Number of Text Documents Expressing Opinions," MENDELU Working Papers in Business and Economics 2016-65, Mendel University in Brno, Faculty of Business and Economics.
  • Handle: RePEc:men:wpaper:65_2016

    Download full text from publisher

    File URL: http://ftp.mendelu.cz/RePEc/men/wpaper/65_2016.pdf
    File Function: Full text
    Download Restriction: no



      More about this item

      Keywords

      text documents; natural language; classification; parallel processing; ensembles of classifiers; machine learning;

      JEL classification:

      • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
      • C89 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other

