
A Neural Network Architecture for Children’s Audio–Visual Emotion Recognition

Authors

Listed:
  • Anton Matveev

    (Child Speech Research Group, Department of Higher Nervous Activity and Psychophysiology, St. Petersburg University, St. Petersburg 199034, Russia)

  • Yuri Matveev

    (Child Speech Research Group, Department of Higher Nervous Activity and Psychophysiology, St. Petersburg University, St. Petersburg 199034, Russia)

  • Olga Frolova

    (Child Speech Research Group, Department of Higher Nervous Activity and Psychophysiology, St. Petersburg University, St. Petersburg 199034, Russia)

  • Aleksandr Nikolaev

    (Child Speech Research Group, Department of Higher Nervous Activity and Psychophysiology, St. Petersburg University, St. Petersburg 199034, Russia)

  • Elena Lyakso

    (Child Speech Research Group, Department of Higher Nervous Activity and Psychophysiology, St. Petersburg University, St. Petersburg 199034, Russia)

Abstract

Detecting and understanding emotions are critical for our daily activities. As emotion recognition (ER) systems mature, we can move beyond acted adult audio–visual speech to more difficult cases. In this work, we investigate the automatic classification of children’s audio–visual emotional speech, which presents several challenges, including the lack of publicly available annotated datasets and the low performance of state-of-the-art audio–visual ER systems. We present a new corpus of children’s audio–visual emotional speech that we collected. We then propose a neural network solution that improves the utilization of the temporal relationships between the audio and video modalities in cross-modal fusion for children’s audio–visual emotion recognition. We select a state-of-the-art neural network architecture as a baseline and present several modifications focused on deeper learning of the cross-modal temporal relationships using attention. In experiments with the proposed approach and the selected baseline model, we observe a relative performance improvement of 2%. Finally, we conclude that a stronger focus on cross-modal temporal relationships may be beneficial for building ER systems for child–machine communication and for environments where qualified professionals work with children.
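
To make the fusion idea concrete, below is a minimal sketch of cross-modal attention fusion, in which each modality’s frame sequence attends to the other’s so that audio–video temporal relationships are learned directly. This is an illustration under stated assumptions, not the authors’ implementation: the module name CrossModalAttentionFusion, the feature dimension of 256, and the four emotion classes are hypothetical (PyTorch).

    import torch
    import torch.nn as nn

    class CrossModalAttentionFusion(nn.Module):
        # Hypothetical cross-modal fusion block; not the paper's architecture.
        def __init__(self, dim=256, num_heads=4, num_classes=4):
            super().__init__()
            # Each modality queries the other, so the attention weights capture
            # the temporal alignment between audio frames and video frames.
            self.audio_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.video_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.classifier = nn.Linear(2 * dim, num_classes)

        def forward(self, audio, video):
            # audio: (batch, T_audio, dim); video: (batch, T_video, dim)
            a_attends_v, _ = self.audio_to_video(audio, video, video)
            v_attends_a, _ = self.video_to_audio(video, audio, audio)
            # Pool each attended stream over time, then fuse by concatenation.
            fused = torch.cat([a_attends_v.mean(dim=1), v_attends_a.mean(dim=1)], dim=-1)
            return self.classifier(fused)

    # Usage with random tensors standing in for per-frame encoder features.
    model = CrossModalAttentionFusion()
    audio = torch.randn(8, 100, 256)  # e.g., 100 audio frames per clip
    video = torch.randn(8, 25, 256)   # e.g., 25 video frames per clip
    logits = model(audio, video)      # shape: (8, num_classes)

The design choice worth noting is that attention here runs across modalities rather than within each one, which is one plausible way to deepen the learning of the cross-modal temporal relationships the abstract refers to.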

Suggested Citation

  • Anton Matveev & Yuri Matveev & Olga Frolova & Aleksandr Nikolaev & Elena Lyakso, 2023. "A Neural Network Architecture for Children’s Audio–Visual Emotion Recognition," Mathematics, MDPI, vol. 11(22), pages 1-17, November.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:22:p:4573-:d:1275830

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/22/4573/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/22/4573/
    Download Restriction: no

    References listed on IDEAS

    1. Hemanta Kumar Palo & Mihir Narayan Mohanty & Mahesh Chandra, 2018. "Speech Emotion Analysis of Different Age Groups Using Clustering Techniques," International Journal of Information Retrieval Research (IJIRR), IGI Global, vol. 8(1), pages 69-85, January.
    2. Mathilde Marie Duville & Luz María Alonso-Valerdi & David I. Ibarra-Zarate, 2021. "Mexican Emotional Speech Database Based on Semantic, Frequency, Familiarity, Concreteness, and Cultural Shaping of Affective Prosody," Data, MDPI, vol. 6(12), pages 1-34, December.
    Full references (including those not matched with items on IDEAS)


      Corrections

      All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:11:y:2023:i:22:p:4573-:d:1275830. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to register here. This allows your profile to be linked to this item and also lets you accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

      For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

      Please note that corrections may take a couple of weeks to filter through the various RePEc services.
