IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i17p3751-d1230049.html

A Simple Framework for Scene Graph Reasoning with Semantic Understanding of Complex Sentence Structure

Author

Listed:
  • Yoonseok Heo

    (Department of Computer Science and Engineering, Sogang University, Seoul 04107, Republic of Korea)

  • Sangwoo Kang

    (School of Computing, Gachon University, Seongnam 13120, Republic of Korea)

Abstract

The rapid expansion of multimedia environments in recent years has sharply increased demand for multimodal systems that can communicate with humans in various ways. Although the convergence of vision and language intelligence has achieved remarkable success over the last few years, one caveat remains: it is unknown whether such models truly understand the semantics of an image. More specifically, how they capture relationships between the objects represented in an image is still regarded as a black box. To test whether such relationships are well understood, this work focuses on the Graph-structured visual Question Answering (GQA) task, which evaluates image understanding by reasoning over a scene graph that describes the structural characteristics of an image in natural language, together with the image itself. Unlike existing approaches, which rely on an additional encoder for scene graphs, we propose a simple yet effective framework that uses pre-trained multimodal transformers for scene graph reasoning. Inspired by the fact that a scene graph can be regarded as a set of sentences, each describing two related objects and the relationship between them, we fuse these sentences into the framework separately from the question. In addition, we propose a multi-task learning method that uses the evaluation of a question's grammatical validity as an auxiliary task, so that questions with complex structures are better understood. This auxiliary task uses the semantic role labels of the question to randomly shuffle its sentence structure. We have conducted extensive experiments to evaluate the effectiveness of the framework in terms of task capabilities, ablation studies, and generalization.
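The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the sentence template, the span segmentation, or the labeling scheme, so the function names, the "subject relation object." template, the hand-written spans standing in for the output of a semantic role labeler, and the 1/0 validity labels are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# A scene-graph edge: (subject, relation, object). Hypothetical representation.
Triple = Tuple[str, str, str]


def triples_to_sentences(triples: List[Triple]) -> List[str]:
    """Render each scene-graph edge as one short sentence (assumed template)."""
    return [f"{subj} {rel} {obj}." for subj, rel, obj in triples]


def make_validity_example(
    spans: List[str], rng: random.Random, corrupt: bool
) -> Tuple[str, int]:
    """Build one example for a grammatical-validity auxiliary task.

    `spans` are the question's segments, assumed here to come from a
    semantic role labeler. Returns (question text, label), where
    label 1 means the original order and 0 means a shuffled order.
    """
    # Nothing to corrupt if not requested or if every span is identical.
    if not corrupt or len(set(spans)) < 2:
        return " ".join(spans), 1
    shuffled = list(spans)
    # Reshuffle until the order actually differs from the original.
    while shuffled == list(spans):
        rng.shuffle(shuffled)
    return " ".join(shuffled), 0


if __name__ == "__main__":
    graph = [("man", "holding", "umbrella"), ("umbrella", "to the left of", "car")]
    print(triples_to_sentences(graph))

    question_spans = ["What color", "is", "the umbrella", "that the man is holding"]
    print(make_validity_example(question_spans, random.Random(0), corrupt=True))
```

In this sketch the scene-graph sentences would be passed to the multimodal transformer as a text input separate from the question, and the (text, label) pairs would supervise a binary classifier trained jointly with the answer prediction objective; both of those choices are likewise assumptions rather than details stated in the abstract.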

Suggested Citation

  • Yoonseok Heo & Sangwoo Kang, 2023. "A Simple Framework for Scene Graph Reasoning with Semantic Understanding of Complex Sentence Structure," Mathematics, MDPI, vol. 11(17), pages 1-15, August.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:17:p:3751-:d:1230049

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/17/3751/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/17/3751/
    Download Restriction: no


    Citations



    Cited by:

    1. Jae Hyun Yoon & Jong Won Jung & Seok Bong Yoo, 2024. "Auxcoformer: Auxiliary and Contrastive Transformer for Robust Crack Detection in Adverse Weather Conditions," Mathematics, MDPI, vol. 12(5), pages 1-20, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jagjeet Singh & Lakshmi Babu Saheer & Oliver Faust, 2023. "Speech Emotion Recognition Using Attention Model," IJERPH, MDPI, vol. 20(6), pages 1-21, March.
