InstantIR: Blind Image Restoration with
Instant Generative Reference

Jen-Yuan Huang^1 2, Haofan Wang², Qixun Wang², Xu Bai², Hao Ai², Peng Xing², Jen-Tse Huang³

¹Peking University · ²InstantX Team · ³The Chinese University of Hong Kong

InstantIR is a novel single-image restoration model designed to resurrect your damaged images, delivering extrem-quality yet realistic details. You can further boost InstantIR performance with additional text prompts, even achieve customized editing!

📢 News

11/07/2024 🔥 Our Space Demo is online, thanks HuggingFace🤗! Play with InstantIR and leave your feedbacks!
11/03/2024 🔥 We provide a Gradio launching script for InstantIR, you can now deploy it on your local machine!
11/02/2024 🔥 InstantIR is now compatitble with diffusers🧨, you can utilize features from this fascinating package!
10/15/2024 🔥 Code and model released!

📝 TODOs:

Tutorial video
Launch online demo
Remove dependency on local diffusers
Gradio launching script

✨ Usage

Quick start

1. Clone this repo and setting up environment

git clone https://github.com/instantX-research/InstantIR.git
cd InstantIR
conda create -n instantir python=3.9 -y
conda activate instantir
pip install -r requirements.txt

2. Download pre-trained models

InstantIR is built on SDXL and DINOv2. You can download them either directly from 🤗 huggingface or using Python package.

🤗 link	Python command
SDXL	`hf_hub_download(repo_id="stabilityai/stable-diffusion-xl-base-1.0")`
facebook/dinov2-large	`hf_hub_download(repo_id="facebook/dinov2-large")`
InstantX/InstantIR	`hf_hub_download(repo_id="InstantX/InstantIR")`

Note: Make sure to import the package first with from huggingface_hub import hf_hub_download if you are using Python script.

3. Inference

You can run InstantIR inference using infer.sh with the following arguments specified.

infer.sh \
    --sdxl_path <path_to_SDXL> \
    --vision_encoder_path <path_to_DINOv2> \
    --instantir_path <path_to_InstantIR> \
    --test_path <path_to_input> \
    --out_path <path_to_output>

See infer.py for more config options.

4. Using tips

InstantIR is powerful, but with your help it can do better. InstantIR's flexible pipeline makes it tunable to a large extent. Here are some tips we found particularly useful for various cases you may encounter:

Over-smoothing: reduce --cfg to 3.0～5.0. Higher CFG scales can sometimes rigid lines or lack of details.
Low fidelity: set --preview_start to 0.1~0.4 to preserve fidelity from inputs. The previewer can yield misleading references when input latent is too noisy. In such cases, we suggest to disable the previewer at early timesteps.
Local distortions: set --creative_start to 0.6~0.8. This will let InstantIR render freely in the late diffusion process, where the high-frequency details are generated. Smaller --creative_start spares more spaces for creative restoration, but will diminish fidelity.
Faster inference: higher --preview_start and lower --creative_start can both reduce computational costs and accelerate InstantIR inference.

Caution

These features are training-free and thus experimental. If you would like to try, we suggest to tune these parameters case-by-case.

Use InstantIR with diffusers 🧨

InstantIR is fully compatible with diffusers and is supported by all those powerful features in this package. You can directly load InstantIR via diffusers snippet:

# !pip install diffusers opencv-python transformers accelerate
import torch
from PIL import Image

from diffusers import DDPMScheduler
from schedulers.lcm_single_step_scheduler import LCMSingleStepScheduler

from module.ip_adapter.utils import load_adapter_to_pipe
from pipelines.sdxl_instantir import InstantIRPipeline

# suppose you have InstantIR weights under ./models
instantir_path = f'./models'

# load pretrained models
pipe = InstantIRPipeline.from_pretrained('stabilityai/stable-diffusion-xl-base-1.0', torch_dtype=torch.float16)

# load adapter
load_adapter_to_pipe(
    pipe,
    f"{instantir_path}/adapter.pt",
    image_encoder_or_path = 'facebook/dinov2-large',
)

# load previewer lora
pipe.prepare_previewers(instantir_path)
pipe.scheduler = DDPMScheduler.from_pretrained('stabilityai/stable-diffusion-xl-base-1.0', subfolder="scheduler")
lcm_scheduler = LCMSingleStepScheduler.from_config(pipe.scheduler.config)

# load aggregator weights
pretrained_state_dict = torch.load(f"{instantir_path}/aggregator.pt")
pipe.aggregator.load_state_dict(pretrained_state_dict)

# send to GPU and fp16
pipe.to(device='cuda', dtype=torch.float16)
pipe.aggregator.to(device='cuda', dtype=torch.float16)

Then, you just need to call the pipe and InstantIR will handle your image!

# load a broken image
low_quality_image = Image.open('./assets/sculpture.png').convert("RGB")

# InstantIR restoration
image = pipe(
    image=low_quality_image,
    previewer_scheduler=lcm_scheduler,
).images[0]

Deploy local gradio demo

We provide a python script to launch a local gradio demo of InstantIR, with basic and some advanced features implemented. Start by running the following command in your terminal:

INSTANTIR_PATH=<path_to_InstantIR> python gradio_demo/app.py

Then, visit your local demo via your browser at http://localhost:7860.

⚙️ Training

Prepare data

InstantIR is trained on DIV2K, Flickr2K, LSDIR and FFHQ. We adopt dataset weighting to balance the distribution. You can config their weights in config_files/IR_dataset.yaml. Download these training sets and put them under a same directory, which will be used in the following training configurations.

Two-stage training

As described in our paper, the training of InstantIR is conducted in two stages. We provide corresponding .sh training scripts for each stage. Make sure you have the following arguments adapted to your own use case:

Argument	Value
`--pretrained_model_name_or_path`	path to your SDXL folder
`--feature_extractor_path`	path to your DINOv2 folder
`--train_data_dir`	your training data directory
`--output_dir`	path to save model weights
`--logging_dir`	path to save logs
`<num_of_gpus>`	number of available GPUs

Other training hyperparameters we used in our experiments are provided in the corresponding .sh scripts. You can tune them according to your own needs.

💪 Community Resoucres

We sincerely appreciate the community's contribution to InstantIR. Here are some excellent works from the community:

Duplicate Demo

fffiloni/InstantIR

ComfyUI

smthemex/ComfyUI_InstantIR_Wrapper

👏 Acknowledgment

Our work is sponsored by HuggingFace and fal.ai.

🎓 Citation

If InstantIR is helpful to your work, please cite our paper via:

@article{huang2024instantir,
  title={InstantIR: Blind Image Restoration with Instant Generative Reference},
  author={Huang, Jen-Yuan and Wang, Haofan and Wang, Qixun and Bai, Xu and Ai, Hao and Xing, Peng and Huang, Jen-Tse},
  journal={arXiv preprint arXiv:2410.06551},
  year={2024}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

InstantIR: Blind Image Restoration with
Instant Generative Reference

📢 News

📝 TODOs:

✨ Usage

Quick start

1. Clone this repo and setting up environment

2. Download pre-trained models

3. Inference

4. Using tips

Use InstantIR with diffusers 🧨

Deploy local gradio demo

⚙️ Training

Prepare data

Two-stage training

💪 Community Resoucres

Duplicate Demo

ComfyUI

👏 Acknowledgment

🎓 Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

InstantIR: Blind Image Restoration withInstant Generative Reference

📢 News

📝 TODOs:

✨ Usage

Quick start

1. Clone this repo and setting up environment

2. Download pre-trained models

3. Inference

4. Using tips

Use InstantIR with diffusers 🧨

Deploy local gradio demo

⚙️ Training

Prepare data

Two-stage training

💪 Community Resoucres

Duplicate Demo

ComfyUI

👏 Acknowledgment

🎓 Citation

InstantIR: Blind Image Restoration with
Instant Generative Reference