perf: speed up the execution #7
To speed up the "querying" part, I can use faiss; the effect of this switch will be more noticeable on larger datasets. But given that model loading accounts for most of the execution time, I want to address it first.
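For illustration, a minimal sketch of what the faiss variant could look like (array shapes, sizes, and names are hypothetical; assumes 512-d ViT-B/32 features):

```python
import faiss
import numpy as np

# Hypothetical inputs: precomputed CLIP image features and one text query.
image_features = np.random.rand(73_000, 512).astype(np.float32)
text_feature = np.random.rand(1, 512).astype(np.float32)

# Inner product on L2-normalized vectors is cosine similarity.
faiss.normalize_L2(image_features)
faiss.normalize_L2(text_feature)

index = faiss.IndexFlatIP(image_features.shape[1])
index.add(image_features)

scores, ids = index.search(text_feature, 10)  # top-10 matches, best first
print(ids[0], scores[0])
```

On much larger datasets, an approximate index such as `IndexIVFFlat` would trade a little recall for much faster queries.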
Or consider storing the vectors in LMDB to speed up the search and reduce the RAM consumption during querying. Needs testing.
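A rough sketch of what that LMDB layout could be, assuming one key per image id and raw float32 bytes as values (file name and sizes are made up):

```python
import lmdb
import numpy as np

vectors = np.random.rand(1_000, 512).astype(np.float32)  # hypothetical features

env = lmdb.open("vectors.lmdb", map_size=2 * 1024**3)  # 2 GiB cap

# Write once, after indexing.
with env.begin(write=True) as txn:
    for image_id, vector in enumerate(vectors):
        txn.put(str(image_id).encode(), vector.tobytes())

# At query time, vectors can be read back lazily instead of held in RAM.
with env.begin() as txn:
    v = np.frombuffer(txn.get(b"42"), dtype=np.float32)
```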
For faster model loading, an option would be to create separate models for CLIP's vision transformer and text transformer, and only load the text transformer when querying. This article detailing a zero-copy approach to model loading in PyTorch might also be worth a read.
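For illustration, one way the split could look: an untested sketch that mirrors the text path of `encode_text` in the openai/CLIP package and saves it without the vision tower (the saved module should be roughly a third of the full checkpoint):

```python
import clip
import torch

class ClipTextEncoder(torch.nn.Module):
    """Just the text-side pieces of CLIP, saved without the vision tower."""
    def __init__(self, m):
        super().__init__()
        self.token_embedding = m.token_embedding
        self.positional_embedding = m.positional_embedding
        self.transformer = m.transformer  # carries CLIP's causal attn_mask
        self.ln_final = m.ln_final
        self.text_projection = m.text_projection
        self.dtype = m.dtype

    def forward(self, text):
        # Mirrors CLIP.encode_text from the openai/CLIP package.
        x = self.token_embedding(text).type(self.dtype)
        x = x + self.positional_embedding.type(self.dtype)
        x = self.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
        x = self.ln_final(x).type(self.dtype)
        # Take the features at the EOT token (the highest token id).
        return x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection

model, _ = clip.load("ViT-B/32", device="cpu")
torch.save(ClipTextEncoder(model), "clip_text.pt")

# Query time: load ~1/3 of the weights instead of the full model.
encoder = torch.load("clip_text.pt", weights_only=False)  # pickled module
with torch.no_grad():
    features = encoder(clip.tokenize(["a dog on the beach"]))
```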
@Seon82, I like the suggestion to split the model. Thank you! About zero-copy: if I understand correctly, it means having to keep Ray running all the time, which isn't much different from the "daemon" solution.
Exactly! Just throwing it in as an alternative implementation of the daemon. I agree that it's probably not the most elegant way of doing things though.
I started using rclip so much that I went down the path of putting a minimal FastAPI web service with a Vue.js client over my rclip database. It loads both the model and the relevant columns from the rclip database into memory. You can see that project here, with a live demo of a quarter-million images running here. [hope you don't mind the name of my git repo - I can rename it if you'd like] It is indeed pretty memory-intensive, with a resident size of about 2 GB.
Perhaps the best approach is to have an optional daemon that does nothing except return CLIP embeddings, listening on some port. If the daemon happens to be running, rclip could use it; if not, rclip would continue working the way it works now. That would give the best of both worlds: standalone and fully functional without a bloated daemon for typical use, but an option to speed things up if/when you ever want to do many searches back-to-back.
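A bare-bones sketch of that fallback logic (port number, line-based JSON protocol, and function names are all made up):

```python
import json
import socket

import numpy as np

PORT = 23451  # hypothetical port for the embedding daemon

def get_text_features(query: str) -> np.ndarray:
    """Use the daemon if it's up; otherwise load CLIP the way rclip does now."""
    try:
        with socket.create_connection(("127.0.0.1", PORT), timeout=0.2) as s:
            s.sendall(query.encode() + b"\n")
            return np.array(json.loads(s.makefile().readline()), dtype=np.float32)
    except OSError:
        # Daemon not running: fall back to the current behavior.
        import clip
        import torch
        model, _ = clip.load("ViT-B/32", device="cpu")
        with torch.no_grad():
            return model.encode_text(clip.tokenize([query])).numpy()[0]

def serve() -> None:
    """The optional daemon: keeps the model warm, answers one query per line."""
    import clip
    import torch
    model, _ = clip.load("ViT-B/32", device="cpu")
    with socket.create_server(("127.0.0.1", PORT)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn, torch.no_grad():
                query = conn.makefile().readline().strip()
                vec = model.encode_text(clip.tokenize([query])).numpy()[0]
                conn.sendall(json.dumps(vec.tolist()).encode() + b"\n")
```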
Hi @ramayer! Great job on the rclip-server! :-) The name is cool, I don't mind it. I'm worried that a daemon with a CLIP model loaded will still consume a few hundred megabytes of RAM, which is still "a lot" for some NASes. But it can be a good option for users with enough spare RAM. Currently, I favor the option of splitting CLIP into text-only and image-only networks and loading only the text-CLIP when querying. It will provide a performance bump without any downsides. And if we implement the daemon later, it will be much more memory-efficient with the text-CLIP only.
When dealing with millions of images, I would instead load feature vectors into memory in batches of, let's say, 100_000. This will let you control the memory consumption at the price of a slightly slower search. Some kind of feature-vector indexing will also make sense at a million+ images; faiss is a fantastic library that can help with this.
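A sketch of that batched scan; a memory-mapped raw file stands in here for whatever batched read the SQLite DB would do (file name and helper are hypothetical):

```python
import numpy as np

# Hypothetical raw file of n x 512 float32 features; memmap keeps only the
# batches actually touched in RAM.
features = np.memmap("features.f32", dtype=np.float32, mode="r").reshape(-1, 512)
BATCH = 100_000

def search(text_feature: np.ndarray, k: int = 10) -> np.ndarray:
    scores = np.empty(len(features), dtype=np.float32)
    for start in range(0, len(features), BATCH):
        batch = np.asarray(features[start:start + BATCH])  # one batch in RAM
        scores[start:start + BATCH] = batch @ text_feature
    top = np.argpartition(scores, -k)[-k:]        # top-k, unordered
    return top[np.argsort(scores[top])[::-1]]     # best match first
```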
Wonder if the guys behind the clip library itself would consider a patch to lazily load the models as needed. |
I'm getting 3.7 seconds average execution time on my laptop (i7-7700HQ) when searching through 73 thousand images, where:
0.85s is "import clip"
1.75s is model loading
0.47s is loading the feature vectors from the SQLite DB
0.26s is the actual search
On my NAS (Intel Celeron J3455), it executes in 7.8s on average, where:
1.73s is "import clip"
3.22s is model loading
1.87s is getting the features
0.8s is the search
Importing clip and loading the model account for ~65% of the execution time, so reducing that is the direction worth exploring first.
One of the options is to make rclip a daemon that keeps the model loaded all the time and query it, but this costs an extra few hundred megabytes of RAM, and, given that querying is a relatively rare operation, I don't like it.
Another option is to convert the CLIP model to TensorFlow Lite. It can be tricky, but it should reduce both the RAM consumption and the execution time, so it may be worth exploring.
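The usual conversion route is PyTorch → ONNX → TensorFlow → TFLite. A rough, untested sketch of that route follows; whether every op in CLIP's text transformer converts cleanly is exactly the tricky part that would need testing:

```python
import clip
import onnx
import tensorflow as tf
import torch
from onnx_tf.backend import prepare

model, _ = clip.load("ViT-B/32", device="cpu")

class TextOnly(torch.nn.Module):
    """Wrapper so the ONNX export traces encode_text."""
    def __init__(self, m):
        super().__init__()
        self.m = m
    def forward(self, tokens):
        return self.m.encode_text(tokens)

# 1. PyTorch -> ONNX (text encoder only).
torch.onnx.export(TextOnly(model), clip.tokenize(["a cat"]), "clip_text.onnx",
                  input_names=["tokens"], output_names=["features"],
                  opset_version=14)

# 2. ONNX -> TensorFlow SavedModel.
prepare(onnx.load("clip_text.onnx")).export_graph("clip_text_tf")

# 3. SavedModel -> TFLite flatbuffer.
converter = tf.lite.TFLiteConverter.from_saved_model("clip_text_tf")
open("clip_text.tflite", "wb").write(converter.convert())
```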