Printed from https://ideas.repec.org/a/spr/topjnl/v32y2024i3d10.1007_s11750-024-00666-y.html

Predicting the demographics of Twitter users with programmatic weak supervision

Author

Listed:
  • Jonathan Tonglet

    (KU Leuven)

  • Astrid Jehoul

    (Datashift)

  • Manon Reusens

    (KU Leuven)

  • Michael Reusens

    (Statistics Flanders)

  • Bart Baesens

    (KU Leuven; University of Southampton)

Abstract

Predicting the demographics of Twitter users has become a problem of great interest in the computational social sciences. However, the limited number of public datasets with ground-truth labels and the high cost of hand-labeling make this task particularly challenging. Recently, programmatic weak supervision has emerged as a framework for training classifiers on noisy data with minimal human labeling effort. In this paper, demographic prediction is framed for the first time as a programmatic weak supervision problem. A new three-step methodology for gender, age category, and location prediction is proposed; it outperforms traditional programmatic weak supervision and is competitive with the state-of-the-art deep learning model. The study is performed in Flanders, a small Dutch-speaking European region characterized by a limited number of user profiles and tweets. An evaluation on an independent hand-labeled test set shows that the proposed methodology can be generalized to unseen users within the geographic area of interest.
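
For readers unfamiliar with the term, the general idea of programmatic weak supervision can be sketched as follows. This is a minimal illustration and not the three-step methodology of the paper: the labeling functions, the name and keyword lists, and the user dictionary layout are all hypothetical, and a simple majority vote stands in for the learned label model that weak supervision frameworks typically use.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote


def lf_first_name(user):
    """Vote from a tiny first-name lookup (hypothetical lists, illustration only)."""
    parts = user.get("name", "").lower().split()
    first = parts[0] if parts else ""
    if first in {"emma", "marie"}:
        return "female"
    if first in {"lucas", "jan"}:
        return "male"
    return ABSTAIN


def lf_bio_keyword(user):
    """Vote from Dutch kinship words in the profile bio (hypothetical keywords)."""
    words = set(user.get("bio", "").lower().split())
    if words & {"mama", "moeder"}:
        return "female"
    if words & {"papa", "vader"}:
        return "male"
    return ABSTAIN


LABELING_FUNCTIONS = [lf_first_name, lf_bio_keyword]


def weak_label(user):
    """Aggregate the noisy votes by majority; abstain on ties or when no function votes."""
    votes = [v for v in (lf(user) for lf in LABELING_FUNCTIONS) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


users = [
    {"name": "Emma Peeters", "bio": "trotse mama van twee"},
    {"name": "Jan Janssens", "bio": ""},
    {"name": "Emma Claes", "bio": "papa"},           # conflicting votes
    {"name": "Nieuwsdienst", "bio": "lokaal nieuws"},  # no function fires
]
weak_labels = [weak_label(u) for u in users]
```

The users that receive a non-abstain weak label would then serve as a noisy training set for an ordinary classifier, which is how hand-labeling effort is kept small.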

Suggested Citation

  • Jonathan Tonglet & Astrid Jehoul & Manon Reusens & Michael Reusens & Bart Baesens, 2024. "Predicting the demographics of Twitter users with programmatic weak supervision," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(3), pages 354-390, October.
  • Handle: RePEc:spr:topjnl:v:32:y:2024:i:3:d:10.1007_s11750-024-00666-y
    DOI: 10.1007/s11750-024-00666-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11750-024-00666-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations


    Cited by:

    1. Emilio Carrizosa & Dolores Romero Morales, 2024. "Guest editorial to the Special Issue on Machine Learning and Mathematical Optimization in TOP-Transactions in Operations Research," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(3), pages 351-353, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sandra C Matz & Jochen I Menges & David J Stillwell & H Andrew Schwartz, 2019. "Predicting individual-level income from Facebook profiles," PLOS ONE, Public Library of Science, vol. 14(3), pages 1-13, March.
    2. Yadi Zhu & Feng Chen & Ming Li & Zijia Wang, 2018. "Inferring the Economic Attributes of Urban Rail Transit Passengers Based on Individual Mobility Using Multisource Data," Sustainability, MDPI, vol. 10(11), pages 1-17, November.
    3. Min Liu & Sajid Anwar, 2024. "Analyzing horizontal integration and market efficiency in platform enterprises: A case study of exchanges," Economics and Politics, Wiley Blackwell, vol. 36(2), pages 1076-1089, July.
    4. Binfeng Shi, 2024. "Transmission mechanism of public concern in waste-sorting policy: Evidence from text mining," Energy & Environment, vol. 35(3), pages 1616-1636, May.
    5. Andrea Bonaccorsi & Filippo Chiarello & Gualtiero Fantoni, 2021. "Impact for whom? Mapping the users of public research with lexicon-based text mining," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1745-1774, February.
    6. John Brandt & Kathleen Buckingham & Cody Buntain & Will Anderson & Sabin Ray & John-Rob Pool & Natasha Ferrari, 2020. "Identifying social media user demographics and topic diversity with computational social science: a case study of a major international policy forum," Journal of Computational Social Science, Springer, vol. 3(1), pages 167-188, April.
    7. Jordan Carpenter & Daniel Preotiuc-Pietro & Jenna Clark & Lucie Flekova & Laura Smith & Margaret L. Kern & Anneke Buffone & Lyle Ungar & Martin Seligman, 2018. "The impact of actively open-minded thinking on social media communication," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 13(6), pages 562-574, November.
    8. Bidur Devkota & Hiroyuki Miyazaki & Apichon Witayangkurn & Sohee Minsun Kim, 2019. "Using Volunteered Geographic Information and Nighttime Light Remote Sensing Data to Identify Tourism Areas of Interest," Sustainability, MDPI, vol. 11(17), pages 1-29, August.
    9. Jacob Levy Abitbol & Eric Fleury & Márton Karsai, 2019. "Optimal Proxy Selection for Socioeconomic Status Inference on Twitter," Complexity, Hindawi, vol. 2019, pages 1-15, May.
    10. Erik Hermann, 2022. "Leveraging Artificial Intelligence in Marketing for Social Good—An Ethical Perspective," Journal of Business Ethics, Springer, vol. 179(1), pages 43-61, August.
    11. Yuh-Jen Chen & Yuh-Min Chen & Yu-Jen Hsu & Jyun-Han Wu, 2019. "Predicting Consumers’ Decision-Making Styles by Analyzing Digital Footprints on Facebook," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(02), pages 601-627, March.
