IDEAS home Printed from https://ideas.repec.org/a/gam/jmathe/v10y2022i18p3244-d908669.html

Path-Wise Attention Memory Network for Visual Question Answering

Author

Listed:
  • Yingxin Xiang

    (School of Computer Science and Engineering, Central South University, Changsha 410083, China)

  • Chengyuan Zhang

    (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China)

  • Zhichao Han

    (College of Science and Technology, Xiangsihu College Guangxi University for Nationalities, Nanning 530008, China)

  • Hao Yu

    (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China)

  • Jiaye Li

    (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China)

  • Lei Zhu

    (College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China)

Abstract

Visual question answering (VQA) is regarded as a multi-modal fine-grained feature fusion task, which requires the construction of multi-level and omnidirectional relations between nodes. One main solution is the composite attention model, which is composed of co-attention (CA) and self-attention (SA). However, existing composite models only stack single attention blocks and lack path-wise historical memory and overall adjustment. We propose a path attention memory network (PAM) to construct a more robust composite attention model. After each single-hop attention block (SA or CA), the importance of the cumulative nodes is used to calibrate the signal strength of the nodes’ features. Four memorized single-hop attention matrices are used to obtain the path-wise co-attention matrix of path-wise attention (PA); the PA block is therefore able to synthesize and strengthen the learning effect over the whole path. Moreover, we use guard gates of the target modality to check the source modality’s values in CA, and conditioning gates of the other modality to guide the query and key of the current modality in SA. The proposed PAM helps construct a robust multi-hop neighborhood relationship between the visual and language modalities and achieves excellent performance on both the VQA2.0 and VQA-CP v2 datasets.
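The pipeline the abstract describes — single-hop attention blocks whose attention matrices are memorized and later combined into a path-wise matrix, with node features calibrated by cumulative importance — can be sketched in miniature. The pure-Python fragment below is an illustrative sketch only, not the authors' implementation: the function names (`attention_matrix`, `path_wise_attention`, `calibrate`) and the element-wise averaging of memorized hop matrices are assumptions made for exposition; the paper's actual PA block and gating mechanisms are more elaborate.

```python
import math

def softmax(row):
    """Numerically stable softmax over one score row."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention_matrix(queries, keys):
    """Row-stochastic scaled dot-product attention weights
    (one single-hop SA or CA block, depending on where Q and K come from)."""
    d = len(keys[0])
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
               for k in keys] for q in queries]
    return [softmax(row) for row in scores]

def apply_attention(attn, values):
    """Weighted sum of value vectors under the attention weights."""
    return [[sum(a * v[j] for a, v in zip(row, values))
             for j in range(len(values[0]))] for row in attn]

def calibrate(features, importance):
    """Scale each node's feature vector by a sigmoid of its cumulative
    importance -- one reading of the per-hop calibration step."""
    return [[x / (1.0 + math.exp(-w)) for x in f]
            for f, w in zip(features, importance)]

def path_wise_attention(hop_matrices):
    """Combine the memorized single-hop attention matrices into a single
    path-wise matrix; element-wise averaging is an assumed placeholder
    for the paper's PA synthesis."""
    n = len(hop_matrices)
    rows, cols = len(hop_matrices[0]), len(hop_matrices[0][0])
    return [[sum(m[i][j] for m in hop_matrices) / n
             for j in range(cols)] for i in range(rows)]
```

Because each hop matrix is row-stochastic, their average is row-stochastic as well, so the path-wise matrix can be applied to value vectors exactly like a single-hop matrix.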

Suggested Citation

  • Yingxin Xiang & Chengyuan Zhang & Zhichao Han & Hao Yu & Jiaye Li & Lei Zhu, 2022. "Path-Wise Attention Memory Network for Visual Question Answering," Mathematics, MDPI, vol. 10(18), pages 1-19, September.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:18:p:3244-:d:908669

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/18/3244/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/18/3244/
    Download Restriction: no

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:10:y:2022:i:18:p:3244-:d:908669. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.