OBSeg: Accurate and Fast Instance Segmentation Framework Using Segmentation Foundation Models with Oriented Bounding Box Prompts
Accurate and fast instance segmentation in remote sensing images is a long-standing challenge. Since horizontal bounding boxes (HBBs) introduce many interference objects, oriented bounding boxes (OBBs) are usually used for instance identification. However, because they follow the ``segmentation within bounding box'' paradigm, current instance segmentation methods using OBBs are overly dependent on bounding box detection performance. Recently, box prompt-based segmentation foundation models (BSMs), e.g., Segment Anything Model, have developed rapidly and can alleviate this dependence. However, existing BSMs are based on HBB prompts, which cannot fully leverage the capabilities of BSMs. For objects with multiple scales, dense arrangements and arbitrary orientations, HBB prompts introduce many interference areas. Current methods using BSMs with HBB prompts, such as RSPrompter, cannot meet high-precision segmentation requirements. In this paper, we propose OBSeg, an accurate and fast instance segmentation framework using BSMs with OBB prompts. Specifically, OBSeg first detects OBBs to distinguish instances and provide coarse localization information. Then, it predicts OBB prompt-related masks for fine segmentation. In addition, to enable BSMs to handle OBB prompts, we propose a novel OBB prompt encoder. Since OBBs only serve as prompts, OBSeg alleviates the over-dependence on bounding box detection performance. Thanks to more accurate OBB prompts, OBSeg outperforms other instance segmentation methods using BSMs with HBB prompts. Moreover, remote sensing platforms such as drones have a pressing need for lightweight models. To make BSMs with OBB prompts more lightweight, a Gaussian smoothing-based knowledge distillation method with multi-type target supervision is further introduced. 
Experiments demonstrate that OBSeg significantly outperforms current instance segmentation methods on multiple datasets in terms of instance segmentation accuracy and has competitive inference speed.
For instance segmentation in remote sensing images, (a): HBB introduces many interference objects. (b): The ``segmentation within bounding box'' paradigm confines segmentation mainly to the detected OBB, making the segmentation performance overly dependent on the OBB detection performance. Once the OBB detection is inaccurate, the mask segmentation is also affected. (c): The proposed OBSeg only uses the OBB as a prompt to guide object segmentation, so the segmentation result is less dependent on OBB detection performance. Even when the OBB detection is inaccurate, the mask can still be segmented accurately.
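To make the interference-area argument in (a) concrete, the following is a minimal sketch that measures how much of an enclosing HBB lies outside the object's OBB. The `(cx, cy, w, h, theta)` parameterization and all function names are assumptions for illustration and are not taken from the repository.

```python
import math


def obb_corners(cx, cy, w, h, theta):
    """Return the four corners of an oriented box (theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each corner offset around the box center, then translate.
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]


def enclosing_hbb(corners):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the corners."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)


def hbb_background_fraction(w, h, theta):
    """Fraction of the enclosing HBB's area that the OBB does not cover."""
    x0, y0, x1, y1 = enclosing_hbb(obb_corners(0.0, 0.0, w, h, theta))
    hbb_area = (x1 - x0) * (y1 - y0)
    return 1.0 - (w * h) / hbb_area


if __name__ == "__main__":
    # An elongated object (e.g. a ship-like 100 x 20 box) at several angles.
    for deg in (0, 15, 45):
        frac = hbb_background_fraction(100.0, 20.0, math.radians(deg))
        print(f"angle={deg:>2} deg  background fraction={frac:.3f}")
```

For a 100 x 20 box rotated by 45 degrees, the enclosing HBB has area 7200 while the object covers 2000, so about 72% of an HBB prompt would be background or neighbouring objects.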
- OBSeg
Architecture of the proposed OBSeg. It is mainly composed of four parts: an OBB detection module, an image encoder, an OBB prompt encoder, and a mask decoder. OBSeg first detects OBBs to distinguish instances, identify classes, and provide coarse localization information. Then, the mask decoder utilizes the image embeddings generated by the image encoder and the OBB prompt embeddings generated by the OBB prompt encoder to generate segmentation masks. In addition, Gaussian smoothing-based knowledge distillation with multi-type target supervision is performed on the OBB prompt encoder and the mask decoder to make OBSeg more lightweight.
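The data flow described above can be sketched as follows. This is only an illustration of the order of operations: the four callables, the tuple layout of a detection, and the choice to compute the image embedding once per image (as in SAM-style models) are assumptions, not the repository's actual interfaces.

```python
def obseg_forward(image, detect_obbs, encode_image, encode_prompt, decode_mask):
    """Hypothetical end-to-end pass: detect OBBs, then segment one mask per OBB.

    All four callables are placeholders for the OBB detection module, image
    encoder, OBB prompt encoder, and mask decoder named in the text.
    """
    # Stage 1: coarse localization. Each detection is assumed to be
    # (obb, class_id, score).
    detections = detect_obbs(image)

    # Stage 2: fine segmentation. The image embedding is assumed to be
    # shared by every prompt of the same image.
    image_embedding = encode_image(image)
    results = []
    for obb, class_id, score in detections:
        prompt_embedding = encode_prompt(obb)
        mask = decode_mask(image_embedding, prompt_embedding)
        results.append(
            {"obb": obb, "class_id": class_id, "score": score, "mask": mask}
        )
    return results


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs; the real modules are neural networks.
    def toy_detector(image):
        return [((50.0, 40.0, 30.0, 10.0, 0.3), 1, 0.9)]

    def toy_image_encoder(image):
        return "image-embedding"

    def toy_prompt_encoder(obb):
        return ("prompt-embedding", obb)

    def toy_mask_decoder(image_embedding, prompt_embedding):
        return f"mask for {prompt_embedding[1]}"

    for item in obseg_forward(
        "image", toy_detector, toy_image_encoder, toy_prompt_encoder, toy_mask_decoder
    ):
        print(item)
```

Because the OBB only enters through `encode_prompt`, the decoder is free to produce a mask that extends beyond an imprecise box, which is the property the framework relies on.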
- OBB Prompt Encoder
Architecture of the proposed OBB prompt encoder. The input is an OBB (
- Knowledge Distillation on the OBB Prompt Encoder and Mask Decoder
The process of knowledge distillation for the OBB prompt encoder and mask decoder. ``TE``, ``BE`` and ``OE`` represent encoded feature embeddings with respect to the top-left point, bottom-right point and orientation of an OBB, respectively. ``GS`` stands for Gaussian smoothing.
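The text does not spell out where ``GS`` is applied or which loss is used, so the following is only a sketch of one plausible reading: the teacher's hard mask is blurred with a Gaussian kernel to obtain a soft target, and the student is penalized for deviating from it. The kernel size, `sigma`, the border handling, and the plain MSE loss are all assumptions for illustration.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Normalized 1D Gaussian weights over offsets -radius..radius."""
    weights = [
        math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(grid, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in grid:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, weight in enumerate(kernel):
                xi = min(max(x + k - radius, 0), n - 1)  # reuse the edge value
                acc += weight * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def gaussian_smooth(mask, sigma=1.0, radius=2):
    """Separable 2D Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel_1d(sigma, radius)
    horizontal = smooth_rows(mask, kernel)
    return transpose(smooth_rows(transpose(horizontal), kernel))


def mse(a, b):
    """Mean squared error between two equally sized 2D grids."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    teacher_mask = [[0.0] * 5 for _ in range(5)]
    teacher_mask[2][2] = 1.0                      # a single foreground pixel
    soft_target = gaussian_smooth(teacher_mask)   # softened supervision signal
    student_mask = [[0.1] * 5 for _ in range(5)]  # stand-in student prediction
    print("distillation loss (MSE, assumed):", mse(student_mask, soft_target))
```

Smoothing turns sharp mask boundaries into gradual transitions, which is one common motivation for softening targets before handing them to a smaller student model.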
pip install lightning
pip install torch
pip install opencv-python pycocotools matplotlib onnxruntime onnx
pip install -U openmim
mim install mmcv-full
mim install "mmdet<3.0.0"
pip install mmrotate
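After installing, it can help to confirm which of these packages are actually present before training. The snippet below is a small, optional check using only the standard library. The distribution names in `REQUIRED` are assumed to match the install commands above (with PyTorch published on PyPI as `torch`), and the helper name is hypothetical.

```python
from importlib import metadata

# Distribution names as they would appear on PyPI (an assumption based on
# the install commands in this README).
REQUIRED = [
    "torch",
    "lightning",
    "opencv-python",
    "pycocotools",
    "matplotlib",
    "onnx",
    "onnxruntime",
    "mmcv-full",
    "mmdet",
    "mmrotate",
]


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    found = installed_versions(REQUIRED)
    for name, version in found.items():
        print(f"{name:<15} {version or 'MISSING'}")

    # The install step pins mmdet below 3.0.0, so flag newer major versions.
    mmdet_version = found.get("mmdet")
    if mmdet_version and int(mmdet_version.split(".")[0]) >= 3:
        print("warning: mmdet >= 3.0.0 found; this README expects mmdet < 3.0.0")
```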
# Train OBB detection module (e.g., Oriented R-CNN with ResNet-18 as the backbone)
python OBB_Detection_Module/tools/train.py
# Train OBB prompt-based segmentation module (``OSM'' for short, we use it to train the teacher model)
python OBB_Prompt_based_Segmentation_Module/OSM/train.py
# Train OBB prompt-based segmentation module with knowledge distillation (``OSM_KD'' for short, we use it to train the student model)
python OBB_Prompt_based_Segmentation_Module/OSM_KD/train.py
# Test OBB detection module (e.g., Oriented R-CNN with ResNet-18 as the backbone)
python OBB_Detection_Module/tools/test.py
# Test OBB prompt-based segmentation module (``OSM'' for short, we use it to test the teacher model)
python OBB_Prompt_based_Segmentation_Module/OSM/inference.py
# Test OBB prompt-based segmentation module with knowledge distillation (``OSM_KD'' for short, we use it to test the student model)
python OBB_Prompt_based_Segmentation_Module/OSM_KD/inference.py