Printed from https://ideas.repec.org/a/plo/pone00/0155036.html

Multilingual Twitter Sentiment Classification: The Role of Human Annotators

Authors

  • Igor Mozetič
  • Miha Grčar
  • Jasmina Smailović

Abstract

What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of training data than on the type of the model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
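The comparison the abstract describes, model performance measured against inter-annotator agreement, can be illustrated with a minimal sketch. The abstract does not name the agreement measures used, so Cohen's kappa is an assumption chosen here for illustration; the function name `cohen_kappa` and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equally long label sequences.

    Assumption: Cohen's kappa stands in for the (unnamed) agreement
    measures mentioned in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which both label sequences agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each sequence's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both sequences use one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (not taken from the paper's datasets).
annotator_1 = ["neg", "neu", "pos", "pos", "neu", "neg", "pos", "neu"]
annotator_2 = ["neg", "neu", "pos", "neu", "neu", "neg", "pos", "pos"]
model = ["neg", "neu", "pos", "pos", "neu", "neu", "pos", "neu"]

# Agreement between the two human annotators.
inter_annotator = cohen_kappa(annotator_1, annotator_2)
# Agreement between the model and one annotator, on the same scale.
model_vs_annotator = cohen_kappa(model, annotator_1)

print(f"inter-annotator kappa: {inter_annotator:.3f}")
print(f"model vs annotator kappa: {model_vs_annotator:.3f}")
```

Putting both numbers on the same scale is what makes the comparison meaningful. Note that kappa treats the three classes as unordered; a measure that respects the ordering negative < neutral < positive would penalize a negative/positive confusion more than a negative/neutral one.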

Suggested Citation

  • Igor Mozetič & Miha Grčar & Jasmina Smailović, 2016. "Multilingual Twitter Sentiment Classification: The Role of Human Annotators," PLOS ONE, Public Library of Science, vol. 11(5), pages 1-26, May.
  • Handle: RePEc:plo:pone00:0155036
    DOI: 10.1371/journal.pone.0155036

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0155036
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0155036&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Fabiana Zollo & Petra Kralj Novak & Michela Del Vicario & Alessandro Bessi & Igor Mozetič & Antonio Scala & Guido Caldarelli & Walter Quattrociocchi, 2015. "Emotional Dynamics in the Age of Misinformation," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-22, September.
    2. Gabriele Ranco & Darko Aleksovski & Guido Caldarelli & Miha Grčar & Igor Mozetič, 2015. "The Effects of Twitter Sentiment on Stock Price Returns," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-21, September.

    Citations

    Cited by:

    1. Vuk Batanović & Miloš Cvetanović & Boško Nikolić, 2020. "A versatile framework for resource-limited sentiment articulation, annotation, and analysis of short texts," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-30, November.
    2. Peter Gabrovšek & Darko Aleksovski & Igor Mozetič & Miha Grčar, 2017. "Twitter sentiment around the Earnings Announcement events," PLOS ONE, Public Library of Science, vol. 12(2), pages 1-21, February.
    3. Chansiri, Karikarn & Wei, Xinyu & Chor, Ka Ho Brian, 2024. "Using natural language processing approaches to characterize professional experiences of child welfare workers," Children and Youth Services Review, Elsevier, vol. 166(C).
    4. Paweł Matuszewski, 2023. "How to prepare data for the automatic classification of politically related beliefs expressed on Twitter? The consequences of researchers’ decisions on the number of coders, the algorithm learning pro," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(1), pages 301-321, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Darko Cherepnalkoski & Andreas Karpf & Igor Mozetič & Miha Grčar, 2016. "Cohesion and Coalition Formation in the European Parliament: Roll-Call Votes and Twitter Activities," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-27, November.
    2. Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Digital Finance, Springer, vol. 2(1), pages 1-13, September.
    3. Paola Cerchiello & Giancarlo Nicola, 2018. "Assessing News Contagion in Finance," Econometrics, MDPI, vol. 6(1), pages 1-19, February.
    4. Gabriele Ranco & Darko Aleksovski & Guido Caldarelli & Miha Grčar & Igor Mozetič, 2015. "The Effects of Twitter Sentiment on Stock Price Returns," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-21, September.
    5. Javier Gil-Bazo & Juan F. Imbet, 2022. "Tweeting for money: Social media and mutual fund flows," Economics Working Papers 1846, Department of Economics and Business, Universitat Pompeu Fabra.
    6. Sakariyahu, Rilwan & Lawal, Rodiat & Adigun, Rasheed & Paterson, Audrey & Johan, Sofia, 2024. "One crash, too many: Global uncertainty, sentiment factors and cryptocurrency market," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 94(C).
    7. Matteo Iacopini & Carlo R.M.A. Santagiustina, 2021. "Filtering the intensity of public concern from social media count data with jumps," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(4), pages 1283-1302, October.
    8. Chen, Long & Huang, Jiahui & Jing, Peng & Wang, Bichen & Yu, Xiaozhou & Zha, Ye & Jiang, Chengxi, 2023. "Changing or unchanging Chinese attitudes toward ride-hailing? A social media analytics perspective from 2018 to 2021," Transportation Research Part A: Policy and Practice, Elsevier, vol. 178(C).
    9. Frank Z. Xing & Erik Cambria & Lorenzo Malandri & Carlo Vercellis, 2018. "Discovering Bayesian Market Views for Intelligent Asset Allocation," Papers 1802.09911, arXiv.org, revised Jun 2018.
    10. Bessi, Alessandro, 2017. "On the statistical properties of viral misinformation in online social media," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 469(C), pages 459-470.
    11. Renault, Thomas, 2017. "Intraday online investor sentiment and return patterns in the U.S. stock market," Journal of Banking & Finance, Elsevier, vol. 84(C), pages 25-40.
    12. Kang, Le & Jiang, Han & Nie, Ziye Zoe & Zhou, Hui, 2024. "Can old sin make new shame? Stock market reactions to the release of movies re-exposing past corporate scandals," Finance Research Letters, Elsevier, vol. 67(PA).
    13. Paul A. Griffin & Mohammedi Padaria, 2017. "Is Financial Analysis Doomed? The Birth of “Reactive Valuation” Analysis," Accounting and Finance Research, Sciedu Press, vol. 6(3), pages 1-39, August.
    14. Muhammad Kamran Khan & Jian-Zhou Teng & Muhammad Imran Khan, 2019. "Asymmetric impact of oil prices on stock returns in Shanghai stock exchange: Evidence from asymmetric ARDL model," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-14, June.
    15. Stefan Claus & Massimo Stella, 2022. "Natural Language Processing and Cognitive Networks Identify UK Insurers’ Trends in Investor Day Transcripts," Future Internet, MDPI, vol. 14(10), pages 1-18, October.
    16. Klaus, Jürgen & Koser, Christoph, 2021. "Measuring Trump: The Volfefe Index and its impact on European financial markets," Finance Research Letters, Elsevier, vol. 38(C).
    17. Piñeiro-Chousa, Juan & López-Cabarcos, M.Ángeles & Caby, Jérôme & Šević, Aleksandar, 2021. "The influence of investor sentiment on the green bond market," Technological Forecasting and Social Change, Elsevier, vol. 162(C).
    18. Ana Fernández Vilas & Rebeca Díaz Redondo & Antón Lorenzo García, 2023. "The irruption of cryptocurrencies into Twitter cashtags: a classifying solution," Papers 2312.11531, arXiv.org.
    19. Ankur Sinha & Satishwar Kedas & Rishu Kumar & Pekka Malo, 2022. "SEntFiN 1.0: Entity‐aware sentiment analysis for financial news," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(9), pages 1314-1335, September.
    20. Jiang, Zhe & Zhang, Lin & Zhang, Lingling & Wen, Bo, 2022. "Investor sentiment and machine learning: Predicting the price of China's crude oil futures market," Energy, Elsevier, vol. 247(C).
