
SlowFast Multimodality Compensation Fusion Swin Transformer Networks for RGB-D Action Recognition

Authors

Listed:
  • Xiongjiang Xiao

    (School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China)

  • Ziliang Ren

    (School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China)

  • Huan Li

    (School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China)

  • Wenhong Wei

    (School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523820, China)

  • Zhiyong Yang

    (School of Artificial Intelligence, Yantai Institute of Technology, Yantai 264003, China)

  • Huaide Yang

    (School of Electronic Information, Dongguan Polytechnic, Dongguan 523109, China)

Abstract

RGB-D-based technology combines the advantages of RGB and depth sequences, which makes it possible to recognize human actions effectively in different environments. However, it is difficult for the two modalities to learn spatio-temporal information from each other effectively. To enhance the exchange of information between modalities, we introduce a SlowFast multimodality compensation block (SFMCB) designed to extract compensation features. Concretely, the SFMCB fuses features from two independent pathways with different frame rates into a single convolutional neural network, yielding performance gains for the model. Furthermore, we explore two fusion schemes for combining the features of the two pathways. To facilitate the learning of features from multiple independent pathways, multiple loss functions are used for joint optimization. To evaluate the effectiveness of the proposed architecture, we conducted experiments on four challenging datasets: NTU RGB+D 60, NTU RGB+D 120, THU-READ, and PKU-MMD. The experimental results demonstrate the effectiveness of the proposed model, which uses the SFMCB mechanism to capture complementary features of the multimodal inputs.
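
For readers who want a concrete picture of the two-pathway idea described in the abstract, the following is a minimal PyTorch sketch of SlowFast-style fusion with a lateral compensation block. The module names (CompensationBlock, TwoPathwayFusion), channel sizes, and temporal stride are illustrative assumptions; they do not reproduce the authors' exact SFMCB or the Swin Transformer backbones used in the paper.

import torch
import torch.nn as nn


class CompensationBlock(nn.Module):
    """Fuses fast-pathway features into the slow pathway via a strided
    temporal convolution followed by channel-wise concatenation.
    (Illustrative sketch, not the paper's SFMCB.)"""

    def __init__(self, fast_channels: int, slow_channels: int, alpha: int = 4):
        super().__init__()
        # Temporal stride `alpha` aligns the fast pathway's denser frame
        # sampling with the slow pathway before fusion.
        self.lateral = nn.Conv3d(
            fast_channels, slow_channels,
            kernel_size=(5, 1, 1), stride=(alpha, 1, 1), padding=(2, 0, 0),
        )

    def forward(self, slow_feat: torch.Tensor, fast_feat: torch.Tensor) -> torch.Tensor:
        # slow_feat: (B, C_s, T, H, W); fast_feat: (B, C_f, alpha*T, H, W)
        compensated = self.lateral(fast_feat)
        return torch.cat([slow_feat, compensated], dim=1)


class TwoPathwayFusion(nn.Module):
    """Toy two-pathway network: independent 3D-conv stems, a compensation
    block, and a classifier head. A real model would replace the stems with
    stronger backbones."""

    def __init__(self, num_classes: int = 60, alpha: int = 4):
        super().__init__()
        self.alpha = alpha
        self.slow_stem = nn.Conv3d(3, 64, kernel_size=3, padding=1)
        self.fast_stem = nn.Conv3d(3, 8, kernel_size=3, padding=1)
        self.compensate = CompensationBlock(fast_channels=8, slow_channels=64, alpha=alpha)
        self.head = nn.Linear(64 + 64, num_classes)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, 3, T, H, W); the slow pathway keeps every alpha-th frame.
        slow = self.slow_stem(video[:, :, :: self.alpha])
        fast = self.fast_stem(video)
        fused = self.compensate(slow, fast)      # (B, 128, T/alpha, H, W)
        pooled = fused.mean(dim=(2, 3, 4))       # global average pooling
        return self.head(pooled)


if __name__ == "__main__":
    model = TwoPathwayFusion(num_classes=60)
    clip = torch.randn(2, 3, 32, 56, 56)         # batch of 2 short RGB clips
    print(model(clip).shape)                      # torch.Size([2, 60])

The same pattern extends to RGB-D input by running one such fusion per modality and combining the pooled features before the classifier; the multi-loss joint optimization mentioned in the abstract would then attach a separate loss to each pathway's output in addition to the fused prediction.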

Suggested Citation

  • Xiongjiang Xiao & Ziliang Ren & Huan Li & Wenhong Wei & Zhiyong Yang & Huaide Yang, 2023. "SlowFast Multimodality Compensation Fusion Swin Transformer Networks for RGB-D Action Recognition," Mathematics, MDPI, vol. 11(9), pages 1-19, April.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:9:p:2115-:d:1136618

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/9/2115/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/9/2115/
    Download Restriction: no

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:11:y:2023:i:9:p:2115-:d:1136618. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: MDPI Indexing Manager (email available below). General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.