
Greasing the wheels for comparative communication research: Supervised text classification for multilingual corpora

Author

Listed:
  • Lind, Fabienne
  • Heidenreich, Tobias
  • Kralj, Christoph
  • Boomgaarden, Hajo G.

Abstract

Employing supervised machine learning for text classification is already a resource-intensive endeavor in a monolingual setting. However, facing the challenge to classify a multilingual corpus, the cost of producing the required annotated documents quickly exceeds even generous time and financial constraints. We show how tools like automated annotation and machine translation can not only efficiently but also effectively be employed for the classification of a multilingual corpus with supervised machine learning. Our findings demonstrate that good results can already be achieved with the machine translation of about 250 to 350 documents per category class and language and a dictionary in just one language, which we perceive as a realistic scenario for many projects. The methodological strategy is applied to study migration frames in seven languages (news discourse in seven European countries) and discussed and evaluated for its usability in comparative communication research.
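
The strategy outlined in the abstract (machine-translate documents into one language, annotate them automatically with a dictionary in that language, then train a supervised classifier) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the dictionary terms, the example sentences (standing in for already machine-translated text), the single "economy" frame, and the choice of a small multinomial Naive Bayes model, since the abstract does not name a specific classifier. It is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical one-language (English) dictionary for a single frame category.
ECONOMY_TERMS = {"jobs", "labour", "wages", "economy"}
PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    tokens = (raw.strip(PUNCT).lower() for raw in text.split())
    return [tok for tok in tokens if tok]

def dictionary_label(text, terms=ECONOMY_TERMS):
    """Automated annotation: 1 if any dictionary term occurs, else 0."""
    return int(any(tok in terms for tok in tokenize(text)))

def train_naive_bayes(docs, labels):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for doc, y in zip(docs, labels):
        word_counts[y].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the class with the highest log-probability (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for y, n_docs in class_counts.items():
        denom = sum(word_counts[y].values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[y][tok] + 1) / denom)
        scores[y] = score
    return max(scores, key=scores.get)

# Stand-ins for documents already machine-translated into English.
translated_docs = [
    "Migrants fill jobs in the labour market",
    "New wages rules affect migrant workers and the economy",
    "Border police report more crossings this year",
    "Court rules on asylum appeals at the border",
]
labels = [dictionary_label(doc) for doc in translated_docs]
model = train_naive_bayes(translated_docs, labels)
```

In this sketch the classifier is trained on the dictionary's labels, so it can generalize to words that co-occur with dictionary terms (for example "workers") even when no dictionary term is present in a new document. A real project would use far more documents (the abstract reports roughly 250 to 350 per category class and language) and would validate the automated labels against human annotation.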

Suggested Citation

  • Lind, Fabienne & Heidenreich, Tobias & Kralj, Christoph & Boomgaarden, Hajo G., 2021. "Greasing the wheels for comparative communication research: Supervised text classification for multilingual corpora," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 3(3), pages 1-30.
  • Handle: RePEc:zbw:espost:250905
    DOI: 10.5117/CCR2021.3.001.LIND

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/250905/1/Full-text-article-Lind-et-al-Greasing-the-wheels.pdf
    Download Restriction: no


    Citations

    Cited by:

    1. Hauke Licht & Ronja Sczepanski & Moritz Laurer & Ayjeren Bekmuratovna, 2024. "No More Cost in Translation: Validating Open-Source Machine Translation for Quantitative Text Analysis," ECONtribute Discussion Papers Series 276, University of Bonn and University of Cologne, Germany.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eyal Eckhaus & Zachary Sheaffer, 2018. "Managerial hubris detection: the case of Enron," Risk Management, Palgrave Macmillan, vol. 20(4), pages 304-325, November.
    2. A. E. Opperhuizen & K. Schouten, 2021. "Dynamics and tipping point of issue attention in newspapers: quantitative and qualitative content analysis at sentence level in a longitudinal study using supervised machine learning and big data," Quality & Quantity: International Journal of Methodology, Springer, vol. 55(1), pages 19-37, February.
    3. Junke Chen & Yifan Liu & Qigang Zhu, 2022. "Enterprise Profitability and Financial Evaluation Model Based on Statistical Modeling: Taking Tencent Music as an Example," Mathematics, MDPI, vol. 10(12), pages 1-17, June.
    4. Yu Lim Lee & Minji Jung & Robert Jeyakumar Nathan & Jae-Eun Chung, 2020. "Cross-National Study on the Perception of the Korean Wave and Cultural Hybridity in Indonesia and Malaysia Using Discourse on Social Media," Sustainability, MDPI, vol. 12(15), pages 1-33, July.
    5. Miklos Sebők & Zoltán Kacsuk & Ákos Máté, 2022. "The (real) need for a human touch: testing a human–machine hybrid topic classification workflow on a New York Times corpus," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(5), pages 3621-3643, October.
    6. Triss Ashton & Nicholas Evangelopoulos & Victor Prybutok, 2015. "Quantitative quality control from qualitative data: control charts with latent semantic analysis," Quality & Quantity: International Journal of Methodology, Springer, vol. 49(3), pages 1081-1099, May.
    7. Damien Spry & Tim Dwyer, 2017. "Representations of Australia in South Korean online news: a qualitative and quantitative approach utilizing Leximancer and Korean keywords in context," Quality & Quantity: International Journal of Methodology, Springer, vol. 51(3), pages 1045-1064, May.
    8. Anton Oleinik, 2024. "A Bayesian index of association: comparison with other measures and performance," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(1), pages 277-305, February.
    9. Olessia Y. Koltsova & Sergei V. Pashakhin, 2017. "Agenda Divergence in a Developing Conflict: A Quantitative Evidence from a Ukrainian and a Russian TV Newsfeeds," HSE Working papers WP BRP 79/SOC/2017, National Research University Higher School of Economics.
