Oriented Bounding Box (OBB)-based Instance Segmentation

zhen6618/OBBInstanceSegmentation

OBSeg: Accurate and Fast Instance Segmentation Framework Using Segmentation Foundation Models with Oriented Bounding Box Prompts

Accurate and fast instance segmentation in remote sensing images is a long-standing challenge. Because horizontal bounding boxes (HBBs) introduce many interference objects, oriented bounding boxes (OBBs) are usually used for instance identification. However, current OBB-based instance segmentation methods follow the "segmentation within bounding box" paradigm and are therefore overly dependent on bounding box detection performance. Box prompt-based segmentation foundation models (BSMs), e.g., the Segment Anything Model, have developed rapidly in recent years and can alleviate this dependence. Existing BSMs, however, rely on HBB prompts, which cannot fully leverage their capabilities: for objects with multiple scales, dense arrangements, and arbitrary orientations, HBB prompts introduce many interference areas. Current methods that use BSMs with HBB prompts, such as RSPrompter, cannot meet high-precision segmentation requirements. In this paper, we propose OBSeg, an accurate and fast instance segmentation framework that uses BSMs with OBB prompts. Specifically, OBSeg first detects OBBs to distinguish instances and provide coarse localization information. It then predicts OBB prompt-related masks for fine segmentation. To enable BSMs to handle OBB prompts, we propose a novel OBB prompt encoder. Because OBBs serve only as prompts, OBSeg alleviates the over-dependence on bounding box detection performance, and thanks to the more accurate OBB prompts it outperforms other instance segmentation methods that use BSMs with HBB prompts. In addition, remote sensing platforms such as drones have an urgent need for lightweight models. To make BSMs with OBB prompts more lightweight, we further introduce a Gaussian smoothing-based knowledge distillation method with multi-type target supervision. Experiments demonstrate that OBSeg significantly outperforms current instance segmentation methods on multiple datasets in terms of instance segmentation accuracy and achieves competitive inference speed.

Task

For instance segmentation in remote sensing images: (a) HBBs introduce many interference objects. (b) The "segmentation within bounding box" paradigm confines segmentation mainly to the detected OBB, making segmentation performance overly dependent on OBB detection performance; when the OBB detection is inaccurate, the mask suffers as well. (c) The proposed OBSeg uses the OBB only as a prompt to guide object segmentation, so the result depends less on OBB detection performance: even when the OBB detection is inaccurate, the mask can still be segmented accurately.
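
To give a rough sense of how much extra area an HBB can cover compared with an OBB, the sketch below computes the axis-aligned box enclosing a rotated rectangle and reports the area ratio. The function names and the angle convention (radians, rotation about the box center) are assumptions made for illustration; they are not taken from this repository.

```python
import math

def obb_corners(x, y, w, h, theta):
    """Corners of an OBB with center (x, y), size (w, h), angle theta (radians, assumed)."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each half-extent offset about the center, then translate.
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in half]

def enclosing_hbb(corners):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) containing the corners."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)

def hbb_to_obb_area_ratio(w, h, theta):
    """Area of the enclosing HBB divided by the area of the OBB itself."""
    x0, y0, x1, y1 = enclosing_hbb(obb_corners(0.0, 0.0, w, h, theta))
    return ((x1 - x0) * (y1 - y0)) / (w * h)

# An elongated 100 x 20 object rotated by 45 degrees.
print(round(hbb_to_obb_area_ratio(100, 20, math.pi / 4), 2))  # 3.6
```

In this toy case the HBB covers 3.6 times the area of the OBB, and the surplus is where neighboring objects in dense scenes can leak into the prompt.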

Method

  1. OBSeg

Architecture of the proposed OBSeg. It is mainly composed of four parts: an OBB detection module, an image encoder, an OBB prompt encoder, and a mask decoder. OBSeg first detects OBBs to distinguish instances, identify classes, and provide coarse localization information. Then, the mask decoder utilizes the image embeddings generated by the image encoder and the OBB prompt embeddings generated by the OBB prompt encoder to generate segmentation masks. In addition, Gaussian smoothing-based knowledge distillation with multi-type target supervision is performed on the OBB prompt encoder and the mask decoder to make OBSeg more lightweight.
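
The data flow described above can be sketched as a small driver function. This is only an illustration of how the four parts connect: `run_obseg` and its callable arguments are hypothetical names, not the repository's API. Computing the image embedding once per image and reusing it for every OBB is likewise an assumption.

```python
from typing import Any, Callable, Dict, List, Tuple

OBB = Tuple[float, float, float, float, float]  # (x, y, w, h, theta)

def run_obseg(
    image: Any,
    detect_obbs: Callable[[Any], List[Tuple[OBB, str]]],  # hypothetical OBB detection module
    encode_image: Callable[[Any], Any],                   # hypothetical image encoder
    encode_prompt: Callable[[OBB], Any],                  # hypothetical OBB prompt encoder
    decode_mask: Callable[[Any, Any], Any],               # hypothetical mask decoder
) -> List[Dict[str, Any]]:
    """Detect OBBs, then predict one mask per OBB prompt."""
    detections = detect_obbs(image)        # instances, classes, coarse localization
    image_embedding = encode_image(image)  # assumed to be shared across all prompts
    results = []
    for obb, label in detections:
        prompt_embedding = encode_prompt(obb)
        mask = decode_mask(image_embedding, prompt_embedding)
        results.append({"obb": obb, "label": label, "mask": mask})
    return results
```

Passing the modules in as callables keeps the sketch independent of any particular detector or foundation model.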

  2. OBB Prompt Encoder

Architecture of the proposed OBB prompt encoder. The input is an OBB $(x, y, w, h, \theta)$, where $(x, y)$, $w$, $h$, and $\theta$ denote the center point, width, height, and orientation, respectively.
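
As a rough illustration of how an OBB $(x, y, w, h, \theta)$ could be turned into prompt features, the sketch below splits the box into a top-left point, a bottom-right point, and an orientation (the three quantities this README later associates with embeddings), then encodes each with fixed sinusoidal features. Everything here is an assumption for illustration: the decomposition in the unrotated frame, the sinusoidal encoding, and all function names are stand-ins, not the repository's encoder, which may be learned.

```python
import math

def obb_to_points_and_angle(x, y, w, h, theta):
    """Top-left and bottom-right corners of the unrotated w x h box, plus theta (assumed split)."""
    return (x - w / 2, y - h / 2), (x + w / 2, y + h / 2), theta

def sinusoidal_features(value, num_freqs=4):
    """Encode a scalar in [0, 1] as [sin(2^k * pi * v), cos(2^k * pi * v)] for k < num_freqs."""
    feats = []
    for k in range(num_freqs):
        angle = (2 ** k) * math.pi * value
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats

def encode_obb(obb, image_size, num_freqs=4):
    """Return illustrative (top-left, bottom-right, orientation) feature lists for one OBB."""
    x, y, w, h, theta = obb
    width, height = image_size
    (x0, y0), (x1, y1), theta = obb_to_points_and_angle(x, y, w, h, theta)
    top_left = sinusoidal_features(x0 / width, num_freqs) + sinusoidal_features(y0 / height, num_freqs)
    bottom_right = sinusoidal_features(x1 / width, num_freqs) + sinusoidal_features(y1 / height, num_freqs)
    # sin(2*theta), cos(2*theta) is pi-periodic, matching the symmetry of a rectangle.
    orientation = [math.sin(2 * theta), math.cos(2 * theta)]
    return top_left, bottom_right, orientation
```

Using the doubled angle means an OBB rotated by $\theta$ and by $\theta + \pi$, which describe the same rectangle, receive the same orientation features.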

  3. Knowledge Distillation on the OBB Prompt Encoder and Mask Decoder

The process of knowledge distillation for the OBB prompt encoder and mask decoder. ``TE``, ``BE`` and ``OE`` represent encoded feature embeddings with respect to the top-left point, bottom-right point and orientation of an OBB, respectively. ``GS`` stands for Gaussian smoothing.
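
The README does not spell out where Gaussian smoothing enters the distillation loss, so the following is a minimal sketch under a stated assumption: the teacher's ``TE``, ``BE`` and ``OE`` embeddings are smoothed with a 1-D Gaussian kernel, and the student is trained to match the smoothed targets with a mean squared error per embedding type. The function names, the kernel parameters, and the choice of MSE are all hypothetical.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights over offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_smooth(values, sigma=1.0, radius=2):
    """Smooth a list of floats with a Gaussian kernel, replicating the edge values."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += weight * values[j]
        out.append(acc)
    return out

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def distillation_loss(student, teacher, sigma=1.0, radius=2):
    """Sum of MSE terms against Gaussian-smoothed teacher embeddings (assumed form)."""
    return sum(
        mse(student[name], gaussian_smooth(teacher[name], sigma, radius))
        for name in ("TE", "BE", "OE")
    )
```

In a real training loop these would be tensor operations with gradients, and the mask decoder output would contribute further supervision terms.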

Experiments

Installation

pip install lightning
pip install torch
pip install opencv-python pycocotools matplotlib onnxruntime onnx
pip install -U openmim
mim install mmcv-full
mim install mmdet\<3.0.0
pip install mmrotate
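
After installing, a quick check that the packages can be found may save time before training. The sketch below is a hypothetical helper, not part of this repository. Note that some import names differ from the package names: `opencv-python` is imported as `cv2`, `mmcv-full` as `mmcv`, and PyTorch as `torch`.

```python
import importlib.util

# Import names for the packages listed in the installation steps.
REQUIRED = [
    "lightning", "torch", "cv2", "pycocotools", "matplotlib",
    "onnxruntime", "onnx", "mmcv", "mmdet", "mmrotate",
]

def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```

This only confirms that the modules are discoverable; it does not verify version compatibility between `mmcv`, `mmdet` and `mmrotate`.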

Prepare Your Dataset

Training

# Train OBB detection module (e.g., Oriented R-CNN with ResNet-18 as the backbone)
python OBB_Detection_Module/tools/train.py

# Train OBB prompt-based segmentation module ("OSM" for short; used to train the teacher model)
python OBB_Prompt_based_Segmentation_Module/OSM/train.py

# Train OBB prompt-based segmentation module with knowledge distillation ("OSM_KD" for short; used to train the student model)
python OBB_Prompt_based_Segmentation_Module/OSM_KD/train.py

Inference

# Test OBB detection module (e.g., Oriented R-CNN with ResNet-18 as the backbone)
python OBB_Detection_Module/tools/test.py

# Test OBB prompt-based segmentation module ("OSM" for short; used to test the teacher model)
python OBB_Prompt_based_Segmentation_Module/OSM/inference.py

# Test OBB prompt-based segmentation module with knowledge distillation ("OSM_KD" for short; used to test the student model)
python OBB_Prompt_based_Segmentation_Module/OSM_KD/inference.py

Citation

Acknowledgement

lightning-sam

mmrotate

segment-anything
