Jen-Yuan Huang1,2, Haofan Wang2, Qixun Wang2, Xu Bai2, Hao Ai2, Peng Xing2, Jen-Tse Huang3
1Peking University · 2InstantX Team · 3The Chinese University of Hong Kong
InstantIR is a novel single-image restoration model designed to resurrect your damaged images, delivering extreme-quality yet realistic details. You can further boost InstantIR's performance with additional text prompts, and even achieve customized editing!
- 11/07/2024 🔥 Our Space Demo is online, thanks HuggingFace🤗! Play with InstantIR and leave your feedback!
- 11/03/2024 🔥 We provide a Gradio launching script for InstantIR, you can now deploy it on your local machine!
- 11/02/2024 🔥 InstantIR is now compatible with diffusers 🧨, you can utilize features from this fascinating package!
- 10/15/2024 🔥 Code and model released!
- Tutorial video
- Launch online demo
- Remove dependency on local diffusers
- Gradio launching script
git clone https://github.com/instantX-research/InstantIR.git
cd InstantIR
conda create -n instantir python=3.9 -y
conda activate instantir
pip install -r requirements.txt
InstantIR is built on SDXL and DINOv2. You can download them either directly from 🤗 huggingface or with the huggingface_hub Python package.
🤗 link | Python command |
---|---|
SDXL | snapshot_download(repo_id="stabilityai/stable-diffusion-xl-base-1.0") |
facebook/dinov2-large | snapshot_download(repo_id="facebook/dinov2-large") |
InstantX/InstantIR | snapshot_download(repo_id="InstantX/InstantIR") |
Note: Make sure to import the function first with from huggingface_hub import snapshot_download if you are using a Python script. snapshot_download fetches a whole repository, whereas hf_hub_download fetches a single file and also requires a filename argument.
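As a minimal sketch, the three repositories listed above could be fetched in one go. It uses snapshot_download from huggingface_hub, which downloads every file in a repository (hf_hub_download fetches one file and needs a filename). The ./checkpoints root and the per-repository folder layout are assumptions made for this example, not something the project prescribes.

```python
from pathlib import Path

# Repository IDs taken from the download table.
REPO_IDS = [
    "stabilityai/stable-diffusion-xl-base-1.0",
    "facebook/dinov2-large",
    "InstantX/InstantIR",
]


def local_dir_for(repo_id: str, root: str = "./checkpoints") -> Path:
    """Map a repo ID to a local folder, e.g. 'InstantX/InstantIR' -> root/InstantIR.

    The folder layout is an assumption of this sketch.
    """
    return Path(root) / repo_id.split("/")[-1]


def download_all(root: str = "./checkpoints") -> dict:
    """Download every repository and return {repo_id: local_path}."""
    # Imported lazily so the path helper works without the package installed.
    from huggingface_hub import snapshot_download

    paths = {}
    for repo_id in REPO_IDS:
        target = local_dir_for(repo_id, root)
        # Downloads all files of the repository into `target`.
        paths[repo_id] = snapshot_download(repo_id=repo_id, local_dir=str(target))
    return paths


if __name__ == "__main__":
    for repo, path in download_all().items():
        print(f"{repo} -> {path}")
```

Downloading the full SDXL repository takes a lot of disk space, so you may prefer to fetch it once and reuse the resulting folder for both inference and training.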
You can run InstantIR inference using infer.sh with the following arguments specified.
infer.sh \
--sdxl_path <path_to_SDXL> \
--vision_encoder_path <path_to_DINOv2> \
--instantir_path <path_to_InstantIR> \
--test_path <path_to_input> \
--out_path <path_to_output>
See infer.py for more config options.
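If you drive inference from Python, for example to process several folders in a loop, the invocation above can be assembled programmatically. The sketch below only builds the argument list from the flags shown above; the helper name build_infer_command and the example paths are hypothetical, and it assumes infer.sh accepts the flags exactly as written in the command above.

```python
import shlex


def build_infer_command(sdxl_path, vision_encoder_path, instantir_path,
                        test_path, out_path, extra_flags=None):
    """Return the infer.sh invocation as a list of string arguments."""
    cmd = [
        "infer.sh",
        "--sdxl_path", str(sdxl_path),
        "--vision_encoder_path", str(vision_encoder_path),
        "--instantir_path", str(instantir_path),
        "--test_path", str(test_path),
        "--out_path", str(out_path),
    ]
    # Optional extra options, e.g. ["--cfg", 4.0]; see infer.py for what exists.
    if extra_flags:
        cmd.extend(str(flag) for flag in extra_flags)
    return cmd


# Example with placeholder paths (replace them with your own).
command = build_infer_command(
    "./checkpoints/sdxl", "./checkpoints/dinov2", "./checkpoints/InstantIR",
    "./inputs", "./outputs",
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) would execute it, provided the script is reachable from your working directory.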
InstantIR is powerful, but with your help it can do better. InstantIR's flexible pipeline makes it tunable to a large extent. Here are some tips we found particularly useful for various cases you may encounter:
- Over-smoothing: reduce --cfg to 3.0~5.0. Higher CFG scales can sometimes cause rigid lines or a lack of details.
- Low fidelity: set --preview_start to 0.1~0.4 to preserve fidelity from inputs. The previewer can yield misleading references when the input latent is too noisy. In such cases, we suggest disabling the previewer at early timesteps.
- Local distortions: set --creative_start to 0.6~0.8. This lets InstantIR render freely in the late diffusion process, where the high-frequency details are generated. A smaller --creative_start leaves more room for creative restoration, but diminishes fidelity.
- Faster inference: a higher --preview_start and a lower --creative_start both reduce computational costs and accelerate InstantIR inference.
Caution
These features are training-free and thus experimental. If you would like to try them, we suggest tuning these parameters case by case.
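Since these parameters are tuned case by case, it can help to keep the suggested ranges in one place and step through them systematically. The sketch below encodes the ranges from the tips above and picks a value inside a range. The symptom keys, the helper name suggest_flag, and the choice of the midpoint as a starting value are assumptions of this example, not recommendations from the authors.

```python
# Ranges copied from the tuning tips: symptom -> (flag, low, high).
TUNING_RANGES = {
    "over_smoothing": ("--cfg", 3.0, 5.0),
    "low_fidelity": ("--preview_start", 0.1, 0.4),
    "local_distortions": ("--creative_start", 0.6, 0.8),
}


def suggest_flag(symptom: str, position: float = 0.5) -> list:
    """Pick a value inside the suggested range for a symptom.

    position=0.0 gives the low end of the range and 1.0 the high end.
    Starting from the midpoint is an assumption of this sketch.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    flag, low, high = TUNING_RANGES[symptom]
    # Linear interpolation inside the range, rounded for readable flags.
    value = round(low + position * (high - low), 2)
    return [flag, str(value)]


# Sweep one parameter across its range to compare results side by side.
for position in (0.0, 0.5, 1.0):
    print(" ".join(suggest_flag("low_fidelity", position)))
```

The returned pair can be appended to your inference command, so that each run changes one parameter at a time and its effect stays easy to judge.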
InstantIR is fully compatible with diffusers and works with the features this package provides. You can load InstantIR directly with the following diffusers snippet:
# !pip install diffusers opencv-python transformers accelerate
import torch
from PIL import Image
from diffusers import DDPMScheduler
from schedulers.lcm_single_step_scheduler import LCMSingleStepScheduler
from module.ip_adapter.utils import load_adapter_to_pipe
from pipelines.sdxl_instantir import InstantIRPipeline
# suppose you have InstantIR weights under ./models
instantir_path = './models'
# load pretrained models
pipe = InstantIRPipeline.from_pretrained('stabilityai/stable-diffusion-xl-base-1.0', torch_dtype=torch.float16)
# load adapter
load_adapter_to_pipe(
    pipe,
    f"{instantir_path}/adapter.pt",
    image_encoder_or_path='facebook/dinov2-large',
)
# load previewer lora
pipe.prepare_previewers(instantir_path)
pipe.scheduler = DDPMScheduler.from_pretrained('stabilityai/stable-diffusion-xl-base-1.0', subfolder="scheduler")
lcm_scheduler = LCMSingleStepScheduler.from_config(pipe.scheduler.config)
# load aggregator weights
pretrained_state_dict = torch.load(f"{instantir_path}/aggregator.pt")
pipe.aggregator.load_state_dict(pretrained_state_dict)
# send to GPU and fp16
pipe.to(device='cuda', dtype=torch.float16)
pipe.aggregator.to(device='cuda', dtype=torch.float16)
Then, you just need to call the pipe and InstantIR will handle your image!
# load a broken image
low_quality_image = Image.open('./assets/sculpture.png').convert("RGB")
# InstantIR restoration
image = pipe(
    image=low_quality_image,
    previewer_scheduler=lcm_scheduler,
).images[0]
We provide a Python script to launch a local Gradio demo of InstantIR, with basic and some advanced features implemented. Start by running the following command in your terminal:
INSTANTIR_PATH=<path_to_InstantIR> python gradio_demo/app.py
Then, visit your local demo via your browser at http://localhost:7860.
InstantIR is trained on DIV2K, Flickr2K, LSDIR and FFHQ. We adopt dataset weighting to balance the distribution. You can configure their weights in config_files/IR_dataset.yaml. Download these training sets and put them under the same directory, which will be used in the following training configurations.
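To give an intuition for what dataset weighting does, here is a small illustrative sketch in which each training sample is drawn from a dataset with probability proportional to its weight. This is one common interpretation and is not taken from the InstantIR training code. The weight values below are hypothetical placeholders; the real ones live in config_files/IR_dataset.yaml.

```python
import random
from collections import Counter

# Hypothetical weights for illustration only.
DATASET_WEIGHTS = {"DIV2K": 1.0, "Flickr2K": 1.0, "LSDIR": 2.0, "FFHQ": 1.0}


def sampling_probabilities(weights: dict) -> dict:
    """Normalize weights so they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_datasets(weights: dict, num_samples: int, seed: int = 0) -> list:
    """Draw dataset names with probability proportional to their weights."""
    rng = random.Random(seed)  # seeded for reproducibility
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=num_samples)


if __name__ == "__main__":
    print(sampling_probabilities(DATASET_WEIGHTS))
    # Empirical counts approach the probabilities as num_samples grows.
    print(Counter(sample_datasets(DATASET_WEIGHTS, 10_000)))
```

With this scheme, a small dataset can be given a larger weight so that it is not drowned out by a much larger one, and vice versa.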
As described in our paper, the training of InstantIR is conducted in two stages. We provide corresponding .sh training scripts for each stage. Make sure you have the following arguments adapted to your own use case:
Argument | Value |
---|---|
--pretrained_model_name_or_path | path to your SDXL folder |
--feature_extractor_path | path to your DINOv2 folder |
--train_data_dir | your training data directory |
--output_dir | path to save model weights |
--logging_dir | path to save logs |
<num_of_gpus> | number of available GPUs |
Other training hyperparameters we used in our experiments are provided in the corresponding .sh scripts. You can tune them according to your own needs.
We sincerely appreciate the community's contribution to InstantIR. Here are some excellent works from the community:
smthemex/ComfyUI_InstantIR_Wrapper
Our work is sponsored by HuggingFace and fal.ai.
If InstantIR is helpful to your work, please cite our paper via:
@article{huang2024instantir,
  title={InstantIR: Blind Image Restoration with Instant Generative Reference},
  author={Huang, Jen-Yuan and Wang, Haofan and Wang, Qixun and Bai, Xu and Ai, Hao and Xing, Peng and Huang, Jen-Tse},
  journal={arXiv preprint arXiv:2410.06551},
  year={2024}
}