Printed from https://ideas.repec.org/a/sae/somere/v53y2024i3p1534-1587.html

Image Clustering: An Unsupervised Approach to Categorize Visual Data in Social Science Research

Author

  • Han Zhang
  • Yilang Peng

Abstract

Automated image analysis has received increasing attention in social scientific research, yet existing scholarship has mostly covered the application of supervised learning to classify images into predefined categories. This study focuses on the task of unsupervised image clustering, which aims to automatically discover categories from unlabelled image data. We first review the steps to perform image clustering and then focus on one key challenge in this task—finding intermediate representations of images. We present several methods of extracting intermediate image representations, including the bag-of-visual-words model, self-supervised learning, and transfer learning (in particular, feature extraction with pretrained models). We compare these methods using various visual datasets, including images related to protests in China from Weibo, images about climate change on Instagram, and profile images of the Russian Internet Research Agency on Twitter. In addition, we propose a systematic way to interpret and validate clustering solutions. Results show that transfer learning significantly outperforms the other methods. The dataset used in the pretrained model critically determines what categories the algorithms can discover.
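
The pipeline summarized in the abstract (obtain one feature vector per image, then group the vectors into clusters) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the vectors are synthetic stand-ins for the embeddings a pretrained model would produce, k-means is one common clustering choice that the abstract does not name, and the helper names (`toy_features`, `farthest_point_init`, `kmeans`) are hypothetical.

```python
import math
import random


def toy_features(n_per_group=20, dim=4, seed=1):
    """Synthetic stand-in for image embeddings: two well-separated groups.

    In a real workflow these vectors would come from a feature extractor
    (for example, a pretrained network); that step is assumed, not shown.
    """
    rng = random.Random(seed)
    centers = [[0.0] * dim, [10.0] * dim]
    feats = []
    for center in centers:
        for _ in range(n_per_group):
            feats.append([x + rng.gauss(0, 0.5) for x in center])
    return feats


def farthest_point_init(features, k):
    """Deterministic start: the first vector, then repeatedly the vector
    farthest from every centroid chosen so far."""
    centroids = [list(features[0])]
    while len(centroids) < k:
        farthest = max(
            features, key=lambda v: min(math.dist(v, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(features, k, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until the
    labels stop changing or max_iter is reached."""
    centroids = farthest_point_init(features, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in features
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(features, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


features = toy_features()
labels, centroids = kmeans(features, k=2)
print({c: labels.count(c) for c in range(2)})  # cluster sizes
```

On real images, the interpretation and validation step the abstract mentions would follow the clustering; this sketch stops at the cluster labels.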

Suggested Citation

  • Han Zhang & Yilang Peng, 2024. "Image Clustering: An Unsupervised Approach to Categorize Visual Data in Social Science Research," Sociological Methods & Research, vol. 53(3), pages 1534-1587, August.
  • Handle: RePEc:sae:somere:v:53:y:2024:i:3:p:1534-1587
    DOI: 10.1177/00491241221082603

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/00491241221082603
    Download Restriction: no

