Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data

Authors

  • Luke Sloan
  • Jeffrey Morgan
  • Pete Burnap
  • Matthew Williams

Abstract

This paper specifies, designs and critically evaluates two tools for the automated identification of demographic data (age, occupation and social class) from the profile descriptions of Twitter users in the United Kingdom (UK). Meta-data routinely collected through the Collaborative Social Media Observatory (COSMOS: http://www.cosmosproject.net/) relating to UK Twitter users is matched with the occupational lookup tables between job and social class provided by the Office for National Statistics (ONS) using SOC2010. Using expert human validation, the validity and reliability of the automated matching process are critically assessed and a prospective class distribution of UK Twitter users is offered with 2011 Census baseline comparisons. The pattern matching rules for identifying age are explained and enacted following a discussion of how to minimise false positives. The age distribution of Twitter users, as identified using the tool, is presented alongside the age distribution of the UK population from the 2011 Census. The automated occupation detection tool reliably identifies certain occupational groups, such as professionals, whose job titles cannot be confused with hobbies and are not used in common parlance in other contexts. An alternative explanation for the prevalence of hobbies is that the creative sector is overrepresented on Twitter compared to 2011 Census data. The age detection tool illustrates the youthfulness of Twitter users compared to the general UK population as of the 2011 Census according to proportions, but projections demonstrate that there is still potentially a large number of older platform users. It is possible to detect “signatures” of both occupation and age from Twitter meta-data with varying degrees of accuracy (particularly dependent on occupational groups) but further confirmatory work is needed.
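
As a rough illustration of the two ideas the abstract describes (pattern matching for age, and matching job titles in a profile description against an occupational lookup table), a minimal Python sketch might look like the following. Everything specific in it is an assumption made for this example and not taken from the paper: the regular expressions, the 13 to 90 plausibility range, the `reference_year` default, the function names, and the three-entry `TOY_SOC2010` table, which merely stands in for the much larger ONS SOC2010 lookup.

```python
import re
from typing import Optional, Tuple

# Hypothetical age rules (not the paper's): an explicit "NN years old"
# statement, or a "born in YYYY" statement.
AGE_STATED = re.compile(r"\b(\d{1,2})\s*(?:years?\s*old|yrs?\s*old|y/o)\b", re.IGNORECASE)
BORN_IN = re.compile(r"\bborn\s+in\s+((?:19|20)\d{2})\b", re.IGNORECASE)

# Toy stand-in for the ONS lookup: job title -> SOC2010 major group.
# Only three illustrative entries; the real table is far larger.
TOY_SOC2010 = {
    "solicitor": 2,  # major group 2: professional occupations
    "teacher": 2,
    "plumber": 5,    # major group 5: skilled trades occupations
}


def detect_age(description: str, reference_year: int = 2015) -> Optional[int]:
    """Return an age found in a profile description, or None.

    reference_year is an assumed default used only to turn a birth year
    into an age.
    """
    match = AGE_STATED.search(description)
    if match:
        age = int(match.group(1))
    else:
        match = BORN_IN.search(description)
        if match is None:
            return None
        age = reference_year - int(match.group(1))
    # Crude false-positive guard (an assumption): discard implausible ages,
    # e.g. "mum of a 2 year old".
    return age if 13 <= age <= 90 else None


def detect_occupation(description: str) -> Optional[Tuple[str, int]]:
    """Return (job title, SOC2010 major group) for the first known title."""
    for word in re.findall(r"[a-z]+", description.lower()):
        if word in TOY_SOC2010:
            return word, TOY_SOC2010[word]
    return None
```

Under these assumed rules, a description such as "10 years experience in marketing" yields no age because the pattern requires the word "old", while "mum of a 2 year old" is rejected by the plausibility range. A description mentioning a 15 year old child would still be misread as the user's own age, which is the kind of false positive the abstract says the rules must be designed to minimise. Single-word title matching also cannot tell a profession from a hobby, the limitation the abstract notes for occupation detection.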

Suggested Citation

  • Luke Sloan & Jeffrey Morgan & Pete Burnap & Matthew Williams, 2015. "Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-Data," PLOS ONE, Public Library of Science, vol. 10(3), pages 1-20, March.
  • Handle: RePEc:plo:pone00:0115545
    DOI: 10.1371/journal.pone.0115545

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115545
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0115545&type=printable
    Download Restriction: no

    References listed on IDEAS

    1. Márton Mestyán & Taha Yasseri & János Kertész, 2013. "Early Prediction of Movie Box Office Success Based on Wikipedia Activity Big Data," PLOS ONE, Public Library of Science, vol. 8(8), pages 1-8, August.
    2. Jeremy Ginsberg & Matthew H. Mohebbi & Rajan S. Patel & Lynnette Brammer & Mark S. Smolinski & Larry Brilliant, 2009. "Detecting influenza epidemics using search engine query data," Nature, Nature, vol. 457(7232), pages 1012-1014, February.
    3. Panagiotis Papaioannou & Lucia Russo & George Papaioannou & Constantinos Siettos, 2013. "Can social microblogging be used to forecast intraday exchange rates?," Papers 1310.5306, arXiv.org.
    4. Luke Sloan & Jeffrey Morgan & William Housley & Matthew Williams & Adam Edwards & Pete Burnap & Omer Rana, 2013. "Knowing the Tweeters: Deriving Sociologically Relevant Demographics from Twitter," Sociological Research Online, vol. 18(3), pages 74-84, August.
    5. H Andrew Schwartz & Johannes C Eichstaedt & Margaret L Kern & Lukasz Dziurzynski & Stephanie M Ramones & Megha Agrawal & Achal Shah & Michal Kosinski & David Stillwell & Martin E P Seligman & Lyle H Ungar, 2013. "Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach," PLOS ONE, Public Library of Science, vol. 8(9), pages 1-16, September.

    Citations

    Cited by:

    1. Marco Schmitt & Robert Jäschke, 2017. "What do computer scientists tweet? Analyzing the link-sharing practice on Twitter," PLOS ONE, Public Library of Science, vol. 12(6), pages 1-28, June.
    2. Fahrettin Kayan & Yasemin Bilişli & Mehmet Kayakuş & Fatma Yiğit Açıkgöz & Agah Başdeğirmen & Meltem Güler, 2025. "Analysing Sustainability and Green Energy with Artificial Intelligence: A Turkish English Social Media Perspective," Sustainability, MDPI, vol. 17(5), pages 1-23, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daniele Barchiesi & Helen Susannah Moat & Christian Alis & Steven Bishop & Tobias Preis, 2015. "Quantifying International Travel Flows Using Flickr," PLOS ONE, Public Library of Science, vol. 10(7), pages 1-8, July.
    2. Shota Saito & Yoshito Hirata & Kazutoshi Sasahara & Hideyuki Suzuki, 2015. "Tracking Time Evolution of Collective Attention Clusters in Twitter: Time Evolving Nonnegative Matrix Factorisation," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-17, September.
    3. Philip ME Garboden, 2019. "Sources and Types of Big Data for Macroeconomic Forecasting," Working Papers 2019-3, University of Hawaii Economic Research Organization, University of Hawaii at Manoa.
    4. Salvatore Giorgi & David B. Yaden & Johannes C. Eichstaedt & Robert D. Ashford & Anneke E.K. Buffone & H. Andrew Schwartz & Lyle H. Ungar & Brenda Curtis, 2020. "Cultural Differences in Tweeting about Drinking Across the US," IJERPH, MDPI, vol. 17(4), pages 1-14, February.
    5. H. Andrew Schwartz & Lyle H. Ungar, 2015. "Data-Driven Content Analysis of Social Media," The ANNALS of the American Academy of Political and Social Science, vol. 659(1), pages 78-94, May.
    6. Brenda Curtis & Salvatore Giorgi & Anneke E K Buffone & Lyle H Ungar & Robert D Ashford & Jessie Hemmons & Dan Summers & Casey Hamilton & H Andrew Schwartz, 2018. "Can Twitter be used to predict county excessive alcohol consumption rates?," PLOS ONE, Public Library of Science, vol. 13(4), pages 1-16, April.
    7. David H Chae & Sean Clouston & Mark L Hatzenbuehler & Michael R Kramer & Hannah L F Cooper & Sacoby M Wilson & Seth I Stephens-Davidowitz & Robert S Gold & Bruce G Link, 2015. "Association between an Internet-Based Measure of Area Racism and Black Mortality," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-12, April.
    8. Jean M. Twenge & Hannah VanLandingham & W. Keith Campbell, 2017. "The Seven Words You Can Never Say on Television: Increases in the Use of Swear Words in American Books, 1950-2008," SAGE Open, vol. 7(3), pages 21582440177, August.
    9. Xiaoli Wang & Shuangsheng Wu & C Raina MacIntyre & Hongbin Zhang & Weixian Shi & Xiaomin Peng & Wei Duan & Peng Yang & Yi Zhang & Quanyi Wang, 2015. "Using an Adjusted Serfling Regression Model to Improve the Early Warning at the Arrival of Peak Timing of Influenza in Beijing," PLOS ONE, Public Library of Science, vol. 10(3), pages 1-14, March.
    10. Ishani Chaudhuri & Parthajit Kayal, 2022. "Predicting Power of Ticker Search Volume in Indian Stock Market," Working Papers 2022-214, Madras School of Economics, Chennai, India.
    11. Yang, Xin & Pan, Bing & Evans, James A. & Lv, Benfu, 2015. "Forecasting Chinese tourist volume with search engine data," Tourism Management, Elsevier, vol. 46(C), pages 386-397.
    12. Kuchler, Theresa & Russel, Dominic & Stroebel, Johannes, 2022. "JUE Insight: The geographic spread of COVID-19 correlates with the structure of social networks as measured by Facebook," Journal of Urban Economics, Elsevier, vol. 127(C).
    13. Markowitz, Sara & Nesson, Erik & Robinson, Joshua J., 2019. "The effects of employment on influenza rates," Economics & Human Biology, Elsevier, vol. 34(C), pages 286-295.
    14. Bentzen, Jeanet Sinding, 2021. "In crisis, we pray: Religiosity and the COVID-19 pandemic," Journal of Economic Behavior & Organization, Elsevier, vol. 192(C), pages 541-583.
    15. Jesse T. Richman & Ryan J. Roberts, 2023. "Assessing Spurious Correlations in Big Search Data," Forecasting, MDPI, vol. 5(1), pages 1-12, February.
    16. Linus Schiöler & Marianne Frisén, 2012. "Multivariate outbreak detection," Journal of Applied Statistics, Taylor & Francis Journals, vol. 39(2), pages 223-242, April.
    17. Sasikiran Kandula & Jeffrey Shaman, 2019. "Reappraising the utility of Google Flu Trends," PLOS Computational Biology, Public Library of Science, vol. 15(8), pages 1-16, August.
    18. Justin R Ortiz & Hong Zhou & David K Shay & Kathleen M Neuzil & Ashley L Fowlkes & Christopher H Goss, 2011. "Monitoring Influenza Activity in the United States: A Comparison of Traditional Surveillance Systems with Google Flu Trends," PLOS ONE, Public Library of Science, vol. 6(4), pages 1-9, April.
    19. Daniel E. O'Leary, 2024. "Toward an extended framework of exhaust data for predictive analytics: An empirical approach," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 31(2), June.
    20. Hyekyung Woo & Youngtae Cho & Eunyoung Shim & Kihwang Lee & Gilyoung Song, 2015. "Public Trauma after the Sewol Ferry Disaster: The Role of Social Media in Understanding the Public Mood," IJERPH, MDPI, vol. 12(9), pages 1-10, September.
