Printed from https://ideas.repec.org/a/gam/jmathe/v12y2024i19p3061-d1489184.html

A Hybrid Framework for Referring Image Segmentation: Dual-Decoder Model with SAM Complementation

Authors
  • Haoyuan Chen

    (College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China;
    School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523808, China)

  • Sihang Zhou

    (College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China)

  • Kuan Li

    (School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523808, China)

  • Jianping Yin

    (School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523808, China)

  • Jian Huang

    (College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China)

Abstract

In the realm of human–robot interaction, the integration of visual and verbal cues has become increasingly significant. This paper focuses on the challenges and advancements in referring image segmentation (RIS), a task that involves segmenting the image region identified by a textual description. Traditional approaches to RIS have primarily focused on pixel-level classification. These methods, although effective, often overlook the interconnectedness of pixels, which can be crucial for interpreting complex visual scenes. Furthermore, while the PolyFormer model has shown impressive performance in RIS, its large number of parameters and high training data requirements pose significant challenges. These factors restrict its adaptability and optimization on standard consumer hardware, hindering further enhancements in subsequent research. Addressing these issues, our study introduces a novel two-branch decoder framework with SAM (Segment Anything Model) for RIS. This framework incorporates an MLP decoder and a KAN decoder with a multi-scale feature fusion module, enhancing the model’s capacity to discern fine details within images. The framework’s robustness is further bolstered by an ensemble learning strategy that consolidates the outputs of the MLP and KAN decoder branches. More importantly, we collect the edge coordinates and bounding box coordinates of the segmentation target as input prompts for the SAM model. This strategy leverages SAM’s zero-shot learning capabilities to refine and optimize the segmentation outcomes. Our experimental findings, based on the widely recognized RefCOCO, RefCOCO+, and RefCOCOg datasets, confirm the effectiveness of this method. The results not only achieve state-of-the-art (SOTA) performance in segmentation but are also supported by ablation studies that highlight the contribution of each component to the overall improvement in performance.
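
The prompt-collection step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not say how the two decoder branches are fused or how many edge coordinates are collected, so the probability averaging, the 0.5 threshold, the four sampled points, and the function names `ensemble_mask`, `bounding_box`, and `edge_points` are all assumptions made for illustration. Masks are plain nested lists, and the `(x_min, y_min, x_max, y_max)` box ordering is likewise an assumed convention.

```python
def ensemble_mask(prob_a, prob_b, threshold=0.5):
    """Fuse two per-pixel foreground probability maps by averaging (assumed rule)."""
    return [
        [(a + b) / 2 >= threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prob_a, prob_b)
    ]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing all foreground pixels."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def edge_points(mask, num_points=4):
    """Return up to num_points (x, y) foreground pixels lying on the mask boundary."""
    height, width = len(mask), len(mask[0])

    def is_foreground(x, y):
        # Pixels outside the image count as background.
        return 0 <= x < width and 0 <= y < height and mask[y][x]

    # A boundary pixel is a foreground pixel with at least one background 4-neighbour.
    edge = [
        (x, y)
        for y in range(height)
        for x in range(width)
        if mask[y][x]
        and not (
            is_foreground(x - 1, y)
            and is_foreground(x + 1, y)
            and is_foreground(x, y - 1)
            and is_foreground(x, y + 1)
        )
    ]
    if len(edge) <= num_points:
        return edge
    # Spread the sampled indices evenly over the boundary list.
    return [edge[i * len(edge) // num_points] for i in range(num_points)]


# Toy 6x6 example standing in for the two decoder outputs.
prob_mlp = [[0.9 if 1 <= y <= 4 and 2 <= x <= 4 else 0.0 for x in range(6)] for y in range(6)]
prob_kan = [[0.7 if 1 <= y <= 4 and 2 <= x <= 4 else 0.0 for x in range(6)] for y in range(6)]
prob_kan[0][0] = 0.6  # a spurious response from one branch only

coarse_mask = ensemble_mask(prob_mlp, prob_kan)
box_prompt = bounding_box(coarse_mask)
point_prompts = edge_points(coarse_mask, num_points=4)
```

In this toy case the spurious pixel is suppressed because only one branch supports it, and the resulting box and edge points are the kind of geometric prompts that a promptable segmenter such as SAM could take as input for a refinement pass.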

Suggested Citation

  • Haoyuan Chen & Sihang Zhou & Kuan Li & Jianping Yin & Jian Huang, 2024. "A Hybrid Framework for Referring Image Segmentation: Dual-Decoder Model with SAM Complementation," Mathematics, MDPI, vol. 12(19), pages 1-21, September.
  • Handle: RePEc:gam:jmathe:v:12:y:2024:i:19:p:3061-:d:1489184

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/12/19/3061/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/12/19/3061/
    Download Restriction: no