IDEAS home Printed from https://ideas.repec.org/a/hin/complx/6059673.html

Optimal Proxy Selection for Socioeconomic Status Inference on Twitter

Authors

Listed:
  • Jacob Levy Abitbol
  • Eric Fleury
  • Márton Karsai

Abstract

Individual socioeconomic status inference from online traces is a remarkably difficult task. While current methods commonly train predictive models on incomplete data by appending socioeconomic information of residential areas or professional occupation profiles, little attention has been paid to how well this information serves as a proxy for the individual demographic trait of interest when fed to a learning model. Here we address this question by proposing three different data collection and combination methods to first estimate and, in turn, infer the socioeconomic status of French Twitter users from their online semantics. We assess the validity of each proxy measure by analyzing the performance of our prediction pipeline when trained on these datasets. Despite having to rely on different user sets, we find that training our model on professional occupation provides better predictive performance than open census data or remotely sensed expert annotation of habitual environments. Furthermore, we release the tools we developed in the hope that they will provide a generalizable framework to estimate the socioeconomic status of large numbers of Twitter users and contribute to the scientific discussion on social stratification and inequalities.

Suggested Citation

  • Jacob Levy Abitbol & Eric Fleury & Márton Karsai, 2019. "Optimal Proxy Selection for Socioeconomic Status Inference on Twitter," Complexity, Hindawi, vol. 2019, pages 1-15, May.
  • Handle: RePEc:hin:complx:6059673
    DOI: 10.1155/2019/6059673

    Download full text from publisher

    File URL: http://downloads.hindawi.com/journals/8503/2019/6059673.pdf
    Download Restriction: no

    File URL: http://downloads.hindawi.com/journals/8503/2019/6059673.xml
    Download Restriction: no

    File URL: https://libkey.io/10.1155/2019/6059673?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Alejandro Llorente & Manuel Garcia-Herranz & Manuel Cebrian & Esteban Moro, 2015. "Social Media Fingerprints of Unemployment," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-13, May.
    2. Daniel Preoţiuc-Pietro & Svitlana Volkova & Vasileios Lampos & Yoram Bachrach & Nikolaos Aletras, 2015. "Studying User Income through Language, Behaviour and Affect in Social Media," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-17, September.
    3. Shaojun Luo & Flaviano Morone & Carlos Sarraute & Matías Travizano & Hernán A. Makse, 2017. "Inferring personal economic status from social network location," Nature Communications, Nature, vol. 8(1), pages 1-7, August.
    4. H Andrew Schwartz & Johannes C Eichstaedt & Margaret L Kern & Lukasz Dziurzynski & Stephanie M Ramones & Megha Agrawal & Achal Shah & Michal Kosinski & David Stillwell & Martin E P Seligman & Lyle H U, 2013. "Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach," PLOS ONE, Public Library of Science, vol. 8(9), pages 1-16, September.

    Citations

    Cited by:

    1. Trynos Gumbo & Thembani Moyo, 2020. "Exploring the Interoperability of Public Transport Systems for Sustainable Mobility in Developing Cities: Lessons from Johannesburg Metropolitan City, South Africa," Sustainability, MDPI, vol. 12(15), pages 1-16, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sandra C Matz & Jochen I Menges & David J Stillwell & H Andrew Schwartz, 2019. "Predicting individual-level income from Facebook profiles," PLOS ONE, Public Library of Science, vol. 14(3), pages 1-13, March.
    2. Yadi Zhu & Feng Chen & Ming Li & Zijia Wang, 2018. "Inferring the Economic Attributes of Urban Rail Transit Passengers Based on Individual Mobility Using Multisource Data," Sustainability, MDPI, vol. 10(11), pages 1-17, November.
    3. Jean M. Twenge & Hannah VanLandingham & W. Keith Campbell, 2017. "The Seven Words You Can Never Say on Television: Increases in the Use of Swear Words in American Books, 1950-2008," SAGE Open, vol. 7(3), pages 21582440177, August.
    4. Pulkit Sharma & Achut Manandhar & Patrick Thomson & Jacob Katuva & Robert Hope & David A. Clifton, 2019. "Combining Multi-Modal Statistics for Welfare Prediction Using Deep Learning," Sustainability, MDPI, vol. 11(22), pages 1-15, November.
    5. Samara Ahmed & Adil E. Rajput & Akila Sarirete & Asma Aljaberi & Ohoud Alghanem & Abrar Alsheraigi, 2020. "Studying Unemployment Effects on Mental Health: Social Media versus the Traditional Approach," Sustainability, MDPI, vol. 12(19), pages 1-14, October.
    6. Gallus, Jana & Bhatia, Sudeep, 2020. "Gender, power and emotions in the collaborative production of knowledge: A large-scale analysis of Wikipedia editor conversations," Organizational Behavior and Human Decision Processes, Elsevier, vol. 160(C), pages 115-130.
    7. Liang Xu & Min Xu & Zehua Jiang & Xin Wen & Yishan Liu & Zaoyi Sun & Hongting Li & Xiuying Qian, 2023. "How have music emotions been described in Google books? Historical trends and corpus differences," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-11, December.
    8. Jordan Carpenter & Daniel Preotiuc-Pietro & Jenna Clark & Lucie Flekova & Laura Smith & Margaret L. Kern & Anneke Buffone & Lyle Ungar & Martin Seligman, 2018. "The impact of actively open-minded thinking on social media communication," Judgment and Decision Making, Society for Judgment and Decision Making, vol. 13(6), pages 562-574, November.
    9. Indaco, Agustín, 2020. "From twitter to GDP: Estimating economic activity from social media," Regional Science and Urban Economics, Elsevier, vol. 85(C).
    10. Karol Król & Dariusz Zdonek, 2021. "Most Often Motivated by Social Media: The Who, the What, and the How Much—Experience from Poland," Sustainability, MDPI, vol. 13(20), pages 1-20, October.
    11. Ola Hall & Francis Dompae & Ibrahim Wahab & Fred Mawunyo Dzanku, 2023. "A review of machine learning and satellite imagery for poverty prediction: Implications for development research and applications," Journal of International Development, John Wiley & Sons, Ltd., vol. 35(7), pages 1753-1768, October.
    12. Francis Rathinam & Sayak Khatua & Zeba Siddiqui & Manya Malik & Pallavi Duggal & Samantha Watson & Xavier Vollenweider, 2021. "Using big data for evaluating development outcomes: A systematic map," Campbell Systematic Reviews, John Wiley & Sons, vol. 17(3), September.
    13. Bidur Devkota & Hiroyuki Miyazaki & Apichon Witayangkurn & Sohee Minsun Kim, 2019. "Using Volunteered Geographic Information and Nighttime Light Remote Sensing Data to Identify Tourism Areas of Interest," Sustainability, MDPI, vol. 11(17), pages 1-29, August.
    14. Vitalis, Kyriacos & Stefanidis, Dimosthenis & Pallis, George & Dikaiakos, Marios & Nicolaou, Nicos & Nicolaides, Christos, 2024. "Quantifying the impact of online social networks on the success of entrepreneurs," OSF Preprints x6vda, Center for Open Science.
    15. Vivek Kulkarni & Margaret L Kern & David Stillwell & Michal Kosinski & Sandra Matz & Lyle Ungar & Steven Skiena & H Andrew Schwartz, 2018. "Latent human traits in the language of social media: An open-vocabulary approach," PLOS ONE, Public Library of Science, vol. 13(11), pages 1-18, November.
    16. Grazia Biorci & Antonella Emina & Michelangelo Puliga & Lisa Sella & Gianna Vivaldo, 2016. "Tweet-tales: moods of socio-economic crisis?," Working Papers 04/2016, IMT School for Advanced Studies Lucca, revised Jul 2016.
    17. Cem Çağrı Dönmez & Abdulkadir Atalan, 2019. "Developing Statistical Optimization Models for Urban Competitiveness Index: Under the Boundaries of Econophysics Approach," Complexity, Hindawi, vol. 2019, pages 1-11, November.
    18. Jong Hwan Suh, 2022. "Machine-Learning-Based Gender Distribution Prediction from Anonymous News Comments: The Case of Korean News Portal," Sustainability, MDPI, vol. 14(16), pages 1-17, August.
    19. Karel Hrazdil & Jiri Novak & Rafael Rogo & Christine Wiedman & Ray Zhang, 2020. "Measuring executive personality using machine‐learning algorithms: A new approach and audit fee‐based validation tests," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 47(3-4), pages 519-544, March.
    20. Luo, Shuli & He, Sylvia Y., 2021. "Understanding gender difference in perceptions toward transit services across space and time: A social media mining approach," Transport Policy, Elsevier, vol. 111(C), pages 63-73.
