Lower VRAM usage by only having one model loaded at a time #46

Open · wants to merge 4 commits into `main`

README.md: 53 additions, 1 deletion

# clip-interrogator-with-less-VRAM

*Want to figure out what a good prompt might be to create new images like an existing one? The **CLIP Interrogator** is here to get you answers!*

This version uses less VRAM than the main repo by keeping only one model on the GPU at a time.

When you create an `Interrogator`:
```py
from clip_interrogator import Config, Interrogator

ci = Interrogator(Config())
```

Both the BLIP and CLIP models are loaded, but only BLIP is put on the GPU; CLIP stays in system RAM.
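
The mechanics of the swap are simple in PyTorch. Here's a minimal sketch of the idea, assuming both models are ordinary `nn.Module`s (the helper name is hypothetical, not this repo's actual code):
```py
import torch

def swap_to_gpu(active, inactive):
    # Hypothetical helper: park the inactive model in system RAM first,
    # so its VRAM is free before the active model moves onto the GPU.
    inactive.to("cpu")
    torch.cuda.empty_cache()  # hand the freed VRAM back to the allocator
    active.to("cuda")
    return active
```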

When you actually do inference:
```py
ci.interrogate(image)
# Or:
# ci.interrogate_classic(image)
# ci.interrogate_fast(image)
```

BLIP inference runs first; BLIP is then unloaded, CLIP is loaded, and CLIP inference runs.
If you run it again, the order flips: CLIP runs first (it's already on the GPU), then BLIP is loaded, which avoids pointless loading and unloading.
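
In pseudocode, the alternation looks roughly like this (the attribute and method names are illustrative, not the repo's actual internals):
```py
def interrogate(self, image):
    # Run whichever model is already resident on the GPU first,
    # so each call pays for one swap instead of two.
    if self.on_gpu == "blip":
        caption = self.generate_caption(image)    # BLIP is resident
        self.swap_to("clip")                      # the only swap this call
        features = self.image_to_features(image)
    else:
        features = self.image_to_features(image)  # CLIP is resident
        self.swap_to("blip")
        caption = self.generate_caption(image)
    return self.build_prompt(caption, features)
```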

By using this, interrogation (`classic` or `fast`; the normal mode doesn't quite fit) can run on as little as 4GB of VRAM, while the main repo needs at least 6GB.

> But wouldn't loading a new model every time I want to interrogate an image be terrible for performance?

\- me

Absolutely.

There's little performance overhead for a single interrogation, since this is essentially lazy-loading the CLIP model, but for multiple images the effect becomes noticeable.

That's why I made the `interrogate_batch` functions:
```py
from PIL import Image

# files = some list of image file paths
images = [Image.open(f).convert("RGB") for f in files]
ci.interrogate_batch(images)
```

This runs BLIP inference on each of the images, *then* loads the CLIP model, so the swap happens once per batch instead of once per image.
There are also `interrogate_{classic,fast}_batch` functions.
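
Conceptually, the batch variants just hoist the swap out of the per-image loop; a rough sketch (again with illustrative names, not the repo's exact code):
```py
def interrogate_batch(self, images):
    # Phase 1: caption every image while BLIP is on the GPU.
    captions = [self.generate_caption(img) for img in images]
    # One swap for the whole batch, not one per image.
    self.swap_to("clip")
    # Phase 2: CLIP features and prompt building for every image.
    return [self.build_prompt(cap, self.image_to_features(img))
            for cap, img in zip(captions, images)]
```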

## Run it!

Bash (Linux/Unix):
```sh
$ ./run_cli.py -i input.png -m $MODE
```

Windows:
```cmd
> python run_cli.py -i input.png -m $MODE
```

Where `$MODE` is one of `best`, `classic`, or `fast` (default `best`).
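
For example, to interrogate an image with the lighter `fast` mode:
```sh
$ ./run_cli.py -i input.png -m fast
```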

Run Version 2 on Colab, HuggingFace, and Replicate!

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pharmapsychotic/clip-interrogator/blob/main/clip_interrogator.ipynb) [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/pharma/CLIP-Interrogator) [![Replicate](https://replicate.com/pharmapsychotic/clip-interrogator/badge)](https://replicate.com/pharmapsychotic/clip-interrogator)