Author
Listed:
- Golder, Su
- Stevens, Robin
- O'Connor, Karen
- James, Richard
- Gonzalez-Hernandez, Graciela
Abstract
Background: A growing amount of health research uses social media data. Critics of social media research often argue that it may be unrepresentative of the population. Identifying the demographics of social media users enables us to measure that representativeness. Extracting race or ethnicity from social media data can be difficult, and researchers may choose from a multitude of different approaches.
Methods: We present a scoping review to identify the methods used to extract race or ethnicity from Twitter datasets. We searched 16 electronic databases and carried out reference checking to identify relevant articles. Each record was sifted independently by at least two researchers, with any disagreements discussed. The research could be grouped by the methods applied to extract race or ethnicity.
Results: From 1093 records we identified 56 that met our inclusion criteria. The majority focus on Twitter users based in the US. A range of data types were used, including Twitter profile pictures, bios, and/or location, and the content of the tweets themselves. The methods used were wide ranging and included manual inference, linkage to census data, commercial software, language/dialect recognition, and machine learning. Not all studies evaluated their methods. Those that did found accuracy to vary from 45% to 93%, with significantly lower accuracy in identifying non-white race categories. Some of the methods used, particularly those relying on photos or dialect, may raise ethical questions as well as questions about accuracy.
Conclusion: There is no standard approach or set of guidelines for extracting race or ethnicity from Twitter or other social media. Social media researchers must interpret race or ethnicity carefully and not over-promise what can be achieved, as even manual screening is a subjective, imperfect method. Future research should establish the accuracy of methods to inform evidence-based best practice guidelines for social media researchers, and be guided by concerns of equity and social justice.
Suggested Citation
Golder, Su & Stevens, Robin & O'Connor, Karen & James, Richard & Gonzalez-Hernandez, Graciela, 2021.
"Who is Tweeting? A Scoping Review of Methods to Establish Race and Ethnicity from Twitter Datasets,"
SocArXiv
wru5q, Center for Open Science.
Handle:
RePEc:osf:socarx:wru5q
DOI: 10.31219/osf.io/wru5q