
Gender Stereotypes in User-Generated Content

Authors
  • Anna Kerkhof
  • Valentin Reich

Abstract

Gender stereotypes pose an important hurdle on the way to gender equality. The problem is difficult to quantify, though, as stereotypical beliefs are often subconscious or not openly expressed. User-generated content (UGC) opens up novel opportunities to overcome such challenges, as the anonymity of users may eliminate social pressures. This paper leverages over a million anonymous comments from a major German online discussion forum to study the prevalence and development of gender stereotypes over almost a decade. To that end, we develop an innovative and widely applicable text analysis procedure that overcomes the conceptual challenges that arise whenever two variables in the training data are correlated and changes in that correlation in the prediction sample are themselves the subject of examination. Here, we apply the procedure to study the correlation between gender (i.e., whether a comment discusses women or men) and gender-stereotypical topics (e.g., work or family) in our comments, where we interpret a strong correlation as the presence of gender stereotypes. We find that men are indeed discussed relatively more often in the context of stereotypically male topics such as work and money, and that women are discussed relatively more often in the context of stereotypically female topics such as family, home, and physical appearance. While the prevalence of gender stereotypes related to stereotypically male topics diminishes over time, gender stereotypes related to female topics mostly persist.
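
The quantity the abstract describes, a correlation between a gender indicator and a topic indicator across comments, can be illustrated with a minimal Python sketch. This is not the authors' text analysis procedure, which this record does not detail. It assumes each comment has already been labeled with two hypothetical binary flags (`about_women`, `topic_family`), uses made-up toy data, and computes the phi coefficient (the Pearson correlation of two binary variables) separately for each year.

```python
from math import sqrt


def phi_coefficient(pairs):
    """Pearson correlation between two binary indicators (phi coefficient).

    `pairs` is a list of (about_women, topic_family) tuples with values 0 or 1.
    Both indicator names are hypothetical labels chosen for this illustration.
    """
    n11 = sum(1 for g, t in pairs if g == 1 and t == 1)
    n10 = sum(1 for g, t in pairs if g == 1 and t == 0)
    n01 = sum(1 for g, t in pairs if g == 0 and t == 1)
    n00 = sum(1 for g, t in pairs if g == 0 and t == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One indicator is constant, so the correlation is undefined;
        # returning 0.0 is a convention adopted for this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Toy data (invented for illustration, not taken from the paper):
# each tuple is one comment, keyed by the year it was posted.
toy_comments = {
    2010: [(1, 1), (1, 1), (0, 0), (0, 0)],
    2018: [(1, 1), (1, 0), (0, 1), (0, 0)],
}

phi_by_year = {year: phi_coefficient(c) for year, c in toy_comments.items()}

for year, phi in sorted(phi_by_year.items()):
    print(year, round(phi, 3))
```

Under the abstract's interpretation, a correlation near 1 in a given year would indicate a strong association between the gender discussed and the topic, and a decline toward 0 over time would indicate a weakening stereotype.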

Suggested Citation

  • Anna Kerkhof & Valentin Reich, 2023. "Gender Stereotypes in User-Generated Content," CESifo Working Paper Series 10578, CESifo.
  • Handle: RePEc:ces:ceswps:_10578

    Download full text from publisher

    File URL: https://www.cesifo.org/DocDL/cesifo1_wp10578.pdf
    Download Restriction: no

    More about this item

    Keywords

    gender bias; gender stereotypes; natural language processing; machine learning; user-generated content; word embeddings

    JEL classification:

    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • J16 - Labor and Demographic Economics - - Demographic Economics - - - Economics of Gender; Non-labor Discrimination
