
Application of an Improved CHI Feature Selection Algorithm

Author

Listed:
  • Liang-jing Cai
  • Shu Lv
  • Kai-bo Shi
  • Zi-Peng Wang

Abstract

Text classification is a core task in machine learning and is widely applied in information filtering, sentiment analysis, and text review. Improving classification accuracy has been a main research goal in this field in recent years. Feature selection plays an important role in text classification: it eliminates irrelevant features, reduces dimensionality, and improves classification accuracy. This paper studies the CHI (chi-square) feature selection algorithm, and its main work and innovations are as follows. First, the paper analyzes the flaws of the CHI algorithm, identifies the introduction of new parameters as the direction for improvement, and proposes a new algorithm based on variance and the coefficient of variation. Second, experiments are designed to verify the effectiveness of the new algorithm. In terms of language, the experiments cover two text classification systems, Chinese and English. In terms of classifiers, two algorithms are used: the KNN classifier and the Naive Bayes classifier. In terms of data distribution, both balanced and unbalanced datasets are used. Finally, the paper conducts three comparative experiments and analyzes the results of each. The authors report that the results of all three experiments improve significantly over those obtained before the improvement.
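
For readers unfamiliar with CHI feature selection, the following is a minimal Python sketch of the classical chi-square score that the paper takes as its starting point. It does not reproduce the improved algorithm, because the abstract does not give its formula. The function names (`chi_square`, `chi_scores`, `top_k`) and the document representation (each document as a list of tokens) are assumptions made for illustration only.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Classical CHI score for one (term, category) pair.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document).
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi_scores(docs, labels, category):
    """Return {term: CHI score} for one category.

    docs is a list of token lists; labels holds one label per document.
    Counts are document frequencies, so repeated tokens count once.
    """
    n_docs = len(docs)
    n_cat = sum(1 for y in labels if y == category)
    df_in_cat = Counter()
    df_total = Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df_total[term] += 1
            if y == category:
                df_in_cat[term] += 1

    scores = {}
    for term, df in df_total.items():
        a = df_in_cat[term]
        b = df - a
        c = n_cat - a
        d = n_docs - n_cat - b
        scores[term] = chi_square(a, b, c, d)
    return scores


def top_k(scores, k):
    """Keep the k highest-scoring terms as the selected features."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because the score squares the term `a * d - b * c`, a term that never appears in a category can score as high as one that appears only in that category. This is a well-known property of the classical statistic; the abstract does not state which specific flaws the paper addresses.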

Suggested Citation

  • Liang-jing Cai & Shu Lv & Kai-bo Shi & Zi-Peng Wang, 2021. "Application of an Improved CHI Feature Selection Algorithm," Discrete Dynamics in Nature and Society, Hindawi, vol. 2021, pages 1-8, May.
  • Handle: RePEc:hin:jnddns:9963382
    DOI: 10.1155/2021/9963382

    Download full text from publisher

    File URL: http://downloads.hindawi.com/journals/ddns/2021/9963382.pdf
    Download Restriction: no

    File URL: http://downloads.hindawi.com/journals/ddns/2021/9963382.xml
    Download Restriction: no

