
A Natural Language Processing Analysis of Newspapers Coverage of Hong Kong Protests Between 1998 and 2020

Author

Listed:
  • Giovanna Maria Dora Dore

    (Johns Hopkins University)

Abstract

This article investigates how the SCMP, the China Daily, and western-based newspapers cover protests in Hong Kong in an effort to identify changes in journalistic practices between 1998 and 2020. It combines natural language processing (NLP) with a qualitative investigation of a novel corpus of newspaper articles spanning 22 years. It enlists topic modeling to contrast the treatment of protests in Hong Kong diachronically and across news sources. Through comparison of lexical frequency and lexical usage, it showcases preferences and discrepancies in the use of protest-relevant keywords in the newspapers’ articles. Embedding neighborhood comparisons strengthen our understanding of how words are used differently between the SCMP, the China Daily, and western-based newspapers, and also how the context of protest-related keywords may differ across news sources over time. Finally, computational sentiment analysis measures the tone and connotations of articles. The article fills a gap in the literature on Hong Kong media, and its methodology broadens the application of NLP techniques to the social sciences.
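As an illustration of the lexical-frequency comparison mentioned in the abstract, the following is a minimal sketch and not the author's actual pipeline. It assumes a simple lowercase word tokenizer and normalizes keyword counts per 1,000 tokens so that sources of different sizes can be compared. The function name `keyword_rates`, the source labels, and the example sentences are all hypothetical placeholders and are not taken from the article's corpus.

```python
import re
from collections import Counter


def keyword_rates(texts, keywords, per=1000):
    """Return occurrences of each keyword per `per` tokens across `texts`."""
    # Assumption: a crude lowercase alphabetic tokenizer is good enough here.
    tokens = [t for text in texts for t in re.findall(r"[a-z]+", text.lower())]
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {k: 0.0 for k in keywords}
    return {k: per * counts[k] / total for k in keywords}


# Hypothetical placeholder sentences, NOT drawn from the article's corpus.
toy_corpus = {
    "source_a": [
        "Protesters marched peacefully through the city.",
        "The protesters demanded reform.",
    ],
    "source_b": [
        "Rioters blocked the road.",
        "Police dispersed rioters and protesters.",
    ],
}
keywords = ["protesters", "rioters"]

# Normalized keyword rates per source, ready for side-by-side comparison.
rates = {name: keyword_rates(texts, keywords) for name, texts in toy_corpus.items()}
for name, source_rates in rates.items():
    print(name, {k: round(v, 1) for k, v in source_rates.items()})
```

Normalizing by token count matters because raw counts would favor whichever source contributes more text. A real analysis would likely also group articles by year to support the diachronic comparison the abstract describes.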

Suggested Citation

  • Giovanna Maria Dora Dore, 2023. "A Natural Language Processing Analysis of Newspapers Coverage of Hong Kong Protests Between 1998 and 2020," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 169(1), pages 143-166, September.
  • Handle: RePEc:spr:soinre:v:169:y:2023:i:1:d:10.1007_s11205-023-03147-0
    DOI: 10.1007/s11205-023-03147-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11205-023-03147-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. van Loon, Austin, 2022. "Three Families of Automated Text Analysis," SocArXiv htnej, Center for Open Science.
    2. Matthew Gentzkow & Bryan T. Kelly & Matt Taddy, 2017. "Text as Data," NBER Working Papers 23276, National Bureau of Economic Research, Inc.
    3. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    4. Mohamed M. Mostafa, 2023. "A one-hundred-year structural topic modeling analysis of the knowledge structure of international management research," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3905-3935, August.
    5. Yang Bao & Anindya Datta, 2014. "Simultaneously Discovering and Quantifying Risk Types from Textual Risk Disclosures," Management Science, INFORMS, vol. 60(6), pages 1371-1391, June.
    6. Dehler-Holland, Joris & Okoh, Marvin & Keles, Dogan, 2022. "Assessing technology legitimacy with topic models and sentiment analysis – The case of wind power in Germany," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
    7. Gadat, Sébastien & Villeneuve, Stéphane, 2023. "Parsimonious Wasserstein Text-mining," TSE Working Papers 23-1471, Toulouse School of Economics (TSE).
    8. Lino Wehrheim, 2019. "Economic history goes digital: topic modeling the Journal of Economic History," Cliometrica, Springer;Cliometric Society (Association Francaise de Cliométrie), vol. 13(1), pages 83-125, January.
    9. D. Thorleuchter & D. Van Den Poel, 2013. "Weak Signal Identification with Semantic Web Mining," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 13/860, Ghent University, Faculty of Economics and Business Administration.
    10. Lehotský, Lukáš & Černoch, Filip & Osička, Jan & Ocelík, Petr, 2019. "When climate change is missing: Media discourse on coal mining in the Czech Republic," Energy Policy, Elsevier, vol. 129(C), pages 774-786.
    11. Elliott Ash & Germain Gauthier & Philine Widmer, 2021. "RELATIO: Text Semantics Capture Political and Economic Narratives," Papers 2108.01720, arXiv.org, revised Apr 2022.
    12. Born, Andreas & Janssen, Aljoscha, 2020. "Does a District-Vote Matter for the Behavior of Politicians? A Textual Analysis of Parliamentary Speeches," Working Paper Series 1320, Research Institute of Industrial Economics.
    13. Sanders James & Lisi Giulio & Schonhardt-Bailey Cheryl, 2017. "Themes and Topics in Parliamentary Oversight Hearings: A New Direction in Textual Data Analysis," Statistics, Politics and Policy, De Gruyter, vol. 8(2), pages 153-194, December.
    14. Diego Kozlowski & Viktoriya Semeshenko & Andrea Molinari, 2021. "Latent Dirichlet allocation model for world trade analysis," PLOS ONE, Public Library of Science, vol. 16(2), pages 1-18, February.
    15. Keith Carlson & Michael A. Livermore & Daniel N. Rockmore, 2020. "The Problem of Data Bias in the Pool of Published U.S. Appellate Court Opinions," Journal of Empirical Legal Studies, John Wiley & Sons, vol. 17(2), pages 224-261, June.
    16. Romain Gauchon & Stéphane Loisel & Jean-Louis Rullière, 2020. "Health-policyholder clustering using health consumption," Post-Print hal-02156058, HAL.
    17. Justyna Klejdysz & Robin L. Lumsdaine, 2023. "Shifts in ECB Communication: A Textual Analysis of the Press Conference," International Journal of Central Banking, International Journal of Central Banking, vol. 19(2), pages 473-542, June.
    18. Oleg S. Nagornyy & Olessia Y. Koltsova, 2017. "Mining Media Topics Perceived as Social Problems by Online Audiences: Use of a Data Mining Approach in Sociology," HSE Working papers WP BRP 74/SOC/2017, National Research University Higher School of Economics.
    19. Fabienne Kiener & Ann-Sophie Gnehm & Simon Clematide & Uschi Backes-Gellner, 2019. "IT skills in vocational training curricula and labour market outcomes," Economics of Education Working Paper Series 0159, University of Zurich, Department of Business Administration (IBW), revised Sep 2022.
    20. LIM Jaehwan & ITO Asei & ZHANG Hongyong, 2023. "Policy Agenda and Trajectory of the Xi Jinping Administration: Textual Evidence from 2012 to 2022," Policy Discussion Papers 23008, Research Institute of Economy, Trade and Industry (RIETI).
