CDMFusion: RGB-T Image Fusion Based on Conditional Diffusion Models via Few Denoising Steps in Open Environments
Multi-modal fusion can improve perceptual robustness and accuracy by fully exploiting multi-source sensor data. However, current RGB-T fusion methods still falter under adverse illumination and weather conditions. Recent advances in generative methods have shown the ability to enhance and restore visible images under such conditions, yet generative approaches to RGB-T fusion have not been studied in depth. Motivated by this observation, we propose CDMFusion, a three-branch conditional diffusion model that performs fusion while dynamically enhancing multi-modal features and suppressing high-frequency interference. Specifically, we achieve feature-preserving fusion through three branches and introduce a dynamic gating prediction module that adaptively adjusts the enhancement of multi-modal features. In addition, considering the high time cost incurred by existing diffusion models when generating fused images, we propose a skip patrol mechanism that accelerates high-quality generation without additional training. Experiments on multiple datasets demonstrate that our method achieves excellent performance.
The code and datasets will be released upon acceptance.
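To make the training-free acceleration idea concrete, below is a minimal sketch of a DDIM-style sampler that skips ahead when consecutive clean-image estimates barely change. This is only an illustration of step skipping under assumed choices: the `denoiser`, `cond`, `alphas_cumprod`, `skip_tol`, and the change-based skip criterion itself are placeholders and may differ from the paper's actual skip patrol rule.

```python
# Hedged sketch: training-free step skipping in a deterministic DDIM-style
# sampler. The skip criterion (small change between consecutive predicted
# clean images) is an illustrative assumption, not the paper's mechanism.
import torch


@torch.no_grad()
def skip_patrol_sample(denoiser, cond, shape, alphas_cumprod,
                       num_steps=50, skip_tol=1e-3, device="cpu"):
    """Sample a fused image, occasionally skipping a denoising step."""
    T = alphas_cumprod.shape[0]
    timesteps = torch.linspace(T - 1, 0, num_steps, device=device).long()
    x = torch.randn(shape, device=device)
    prev_x0 = None

    i = 0
    while True:
        t = timesteps[i]
        a_t = alphas_cumprod[t]
        # Predict noise from the current latent and the fusion condition
        # (e.g., RGB and thermal features); `denoiser` is a placeholder.
        eps = denoiser(x, t, cond)
        x0 = (x - torch.sqrt(1.0 - a_t) * eps) / torch.sqrt(a_t)
        if i == num_steps - 1:
            return x0  # final step: return the clean estimate

        # "Patrol": if the clean estimate is nearly unchanged, jump two
        # steps ahead instead of one (no retraining required).
        stride = 2 if (prev_x0 is not None
                       and (x0 - prev_x0).abs().mean() < skip_tol) else 1
        prev_x0 = x0

        i = min(i + stride, num_steps - 1)
        a_next = alphas_cumprod[timesteps[i]]
        # Deterministic DDIM update toward the next (less noisy) timestep.
        x = torch.sqrt(a_next) * x0 + torch.sqrt(1.0 - a_next) * eps
```

In this sketch the skip decision adds only one tensor comparison per step, so any speedup comes purely from evaluating the denoiser fewer times; the threshold `skip_tol` trades speed against fidelity and would need tuning per dataset.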