
Word2vec convolutional neural networks for classification of news articles and tweets

Author

Listed:
  • Beakcheol Jang
  • Inhwan Kim
  • Jong Wook Kim

Abstract

Large-scale web data from sources such as online news and Twitter are a good resource for deep learning research. However, collected news articles and tweets almost certainly contain items that are irrelevant to the learning task, and these degrade the accuracy of the trained model. This paper explores how well word2vec Convolutional Neural Networks (CNNs) classify news articles and tweets as related or unrelated. Using the two word embedding algorithms of word2vec, Continuous Bag-of-Words (CBOW) and Skip-gram, we constructed a CNN with the CBOW model and a CNN with the Skip-gram model. We measured the classification accuracy of the CNN with CBOW, the CNN with Skip-gram, and a CNN without word2vec on real news articles and tweets. The experimental results indicated that word2vec significantly improved the accuracy of the classification model. Overall, the accuracy of the CBOW model was higher and more stable than that of the Skip-gram model. The CBOW model performed better on news articles, and the Skip-gram model performed better on tweets. In addition, the CNNs with word2vec were more effective on news articles than on tweets, because news articles are typically more uniform than tweets.
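
The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`cbow_pairs`, `skipgram_pairs`, `conv_max_pool`), the window size, and the single-filter convolution with ReLU and max-over-time pooling are all assumptions chosen to show the general technique. The first two functions show how CBOW and Skip-gram frame the same sentence as different prediction tasks. The third shows how one convolutional filter could turn a sequence of word vectors into one feature.

```python
from typing import List, Sequence, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW view: predict the center word from its surrounding context words."""
    pairs = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # skip one-word inputs, which have no context
            pairs.append((context, center))
    return pairs


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram view: predict each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def conv_max_pool(
    embedded: Sequence[Sequence[float]],
    kernel: Sequence[Sequence[float]],
    bias: float = 0.0,
) -> float:
    """Slide one filter over consecutive word vectors, apply ReLU, keep the max.

    `embedded` is a sentence as a list of word vectors (e.g. from word2vec);
    `kernel` spans len(kernel) consecutive words. Both are illustrative inputs.
    """
    height = len(kernel)
    activations = []
    for start in range(len(embedded) - height + 1):
        total = bias
        for k in range(height):
            total += sum(w * x for w, x in zip(kernel[k], embedded[start + k]))
        activations.append(max(0.0, total))  # ReLU
    # Max-over-time pooling; 0.0 if the sentence is shorter than the filter.
    return max(activations) if activations else 0.0
```

In a full classifier of this kind, many such filters would typically run in parallel and their pooled outputs would feed a final layer that separates related from unrelated documents. The number of filters, their widths, and the training procedure used in the paper are not stated in this abstract.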

Suggested Citation

  • Beakcheol Jang & Inhwan Kim & Jong Wook Kim, 2019. "Word2vec convolutional neural networks for classification of news articles and tweets," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-20, August.
  • Handle: RePEc:plo:pone00:0220976
    DOI: 10.1371/journal.pone.0220976

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0220976
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0220976&type=printable
    Download Restriction: no


    Citations



    Cited by:

    1. Manal Mohammed & Nazlia Omar, 2020. "Question classification based on Bloom’s taxonomy cognitive domain using modified TF-IDF and word2vec," PLOS ONE, Public Library of Science, vol. 15(3), pages 1-21, March.
    2. Guberney Muñetón-Santa & Daniel Escobar-Grisales & Felipe Orlando López-Pabón & Paula Andrea Pérez-Toro & Juan Rafael Orozco-Arroyave, 2022. "Classification of Poverty Condition Using Natural Language Processing," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 162(3), pages 1413-1435, August.
    3. Yasheng Chen & Xian Huang & Zhuojun Wu, 2023. "From natural language to accounting entries using a natural language processing method," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 63(4), pages 3781-3795, December.
    4. Ma, Yuanyuan & Zhang, Pingping & Duan, Shaodong & Zhang, Tianjie, 2023. "Credit default prediction of Chinese real estate listed companies based on explainable machine learning," Finance Research Letters, Elsevier, vol. 58(PA).
    5. Jorge A. V. Tohalino & Thiago C. Silva & Diego R. Amancio, 2024. "Using word embedding to detect keywords in texts modeled as complex networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(7), pages 3599-3623, July.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.