Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i16p3519-d1217520.html

Multi-Corpus Learning for Audio–Visual Emotions and Sentiment Recognition

Author

Listed:
  • Elena Ryumina

    (St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), 199178 St. Petersburg, Russia
    These authors contributed equally to this work.)

  • Maxim Markitantov

    (St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), 199178 St. Petersburg, Russia
    These authors contributed equally to this work.)

  • Alexey Karpov

    (St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS), 199178 St. Petersburg, Russia)

Abstract

Recognition of emotions and sentiment (affective states) from human audio–visual information is widely used in healthcare, education, entertainment, and other fields, and has therefore become a highly active research area. Because the many available corpora contain heterogeneous data, single-corpus approaches to affective state recognition trained on one corpus are often less effective on another. In this article, we propose an audio–visual approach to emotion and sentiment recognition based on multi-corpus learning. It extracts mid-level features at the segment level using two multi-corpus temporal models (a pre-trained transformer with GRU layers for the audio modality and a pre-trained 3D CNN with a BiLSTM-Former for the video modality) and predicts affective states using two single-corpus cross-modal gated self-attention fusion (CMGSAF) models. The proposed approach was tested on the RAMAS and CMU-MOSEI corpora. To date, it has outperformed state-of-the-art audio–visual approaches to emotion recognition by 18.2% (78.1% vs. 59.9%) on the CMU-MOSEI corpus in terms of Weighted Accuracy and by 0.7% (82.8% vs. 82.1%) on the RAMAS corpus in terms of Unweighted Average Recall.
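The abstract names a cross-modal gated self-attention fusion (CMGSAF) mechanism but gives no implementation details. The following is a minimal illustrative sketch of one standard way such gated cross-modal attention is often formulated: one modality's segment features attend to the other's, and a sigmoid gate controls the mixture. All function names, dimensions, and the exact gating formula are assumptions for illustration, not the authors' actual model.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_gated_fusion(audio, video):
    """Toy sketch (hypothetical, not the paper's CMGSAF): audio
    segments attend to video segments, and a sigmoid gate decides
    how much cross-modal information to mix back into the audio
    stream."""
    d = audio.shape[-1]
    # cross-modal attention: audio as queries, video as keys/values
    attn = softmax(audio @ video.T / np.sqrt(d))  # (segments, segments)
    attended = attn @ video                       # (segments, d)
    # gate computed from both streams controls the mixing proportion
    gate = sigmoid(audio + attended)
    return gate * attended + (1.0 - gate) * audio

rng = np.random.default_rng(0)
audio = rng.standard_normal((10, 64))  # 10 segments of 64-dim audio features
video = rng.standard_normal((10, 64))  # matching 64-dim video features
fused = cross_modal_gated_fusion(audio, video)
print(fused.shape)  # (10, 64)
```

In a trained model the queries, keys, values, and gate would each pass through learned projections; this sketch omits them to show only the attention-then-gate data flow at the segment level described in the abstract.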

Suggested Citation

  • Elena Ryumina & Maxim Markitantov & Alexey Karpov, 2023. "Multi-Corpus Learning for Audio–Visual Emotions and Sentiment Recognition," Mathematics, MDPI, vol. 11(16), pages 1-22, August.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:16:p:3519-:d:1217520

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/16/3519/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/16/3519/
    Download Restriction: no


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.