
perf: speed up the execution #7

Open
yurijmikhalevich opened this issue Aug 24, 2021 · 10 comments
Labels
feat New feature or request hacktoberfest priority:medium Medium priority issues

Comments

@yurijmikhalevich
Owner

yurijmikhalevich commented Aug 24, 2021

I'm getting 3.7 seconds average execution time on my laptop (i7-7700HQ) when searching through 73 thousand images, where:

- 0.85s is "import clip"
- 1.75s is model loading
- 0.47s is querying the data from the SQLite DB
- 0.26s is the actual search

On my NAS (Intel Celeron J3455), it executes in 7.8s on average, where:

- 1.73s is "import clip"
- 3.22s is model loading
- 1.87s is getting features
- 0.8s is the search
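
For reference, a breakdown like the ones above can be captured with a small timing helper. This is just an illustrative sketch; the labels mirror the numbers in this comment, and the `clip.load("ViT-B/32")` call shown in the comments is OpenAI's CLIP API:

```python
import time

def timed(label, fn):
    """Run fn, print the elapsed wall-clock time, and return fn's result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Illustrative usage mirroring the breakdown above:
# clip = timed('"import clip"', lambda: __import__("clip"))
# model, preprocess = timed("model loading", lambda: clip.load("ViT-B/32"))
```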

Importing and loading the CLIP model accounts for ~65% of the execution time, so reducing this time is the avenue worth exploring first.

One option is to make rclip a daemon that keeps the model loaded at all times and query it, but this costs a few hundred extra megabytes of RAM, and, given that querying is a relatively rare operation, I don't like it.

Another option is to convert the CLIP model to TensorFlow Lite. It can be tricky but should reduce both RAM consumption and execution time, so it may be worth exploring.

@yurijmikhalevich
Owner Author

yurijmikhalevich commented Aug 26, 2021

To speed up the "querying" part, I can use faiss; the effect of this switch will be more noticeable on larger datasets. But, given that model loading accounts for most of the execution time, I want to address it first.
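
For context, the "querying" step is a nearest-neighbor search over normalized feature vectors. The NumPy brute-force version below is a sketch of what faiss's `IndexFlatIP` would replace (array shapes and the toy data are illustrative, not rclip's actual schema):

```python
import numpy as np

def search(text_features: np.ndarray, image_features: np.ndarray, k: int = 10):
    """Return indices of the top-k images by cosine similarity.

    Both inputs are assumed to be L2-normalized, so the dot product
    equals cosine similarity (the same quantity faiss.IndexFlatIP computes).
    """
    scores = image_features @ text_features  # shape: (n_images,)
    top_k = np.argsort(-scores)[:k]          # indices of best matches first
    return top_k, scores[top_k]

# Toy example: 1000 random normalized "image features" of dimension 512
rng = np.random.default_rng(0)
images = rng.normal(size=(1000, 512)).astype(np.float32)
images /= np.linalg.norm(images, axis=1, keepdims=True)
query = images[42]  # query identical to image 42, so it should rank first
indices, scores = search(query, images, k=5)
```

With faiss, the same search becomes `index = faiss.IndexFlatIP(512)`, `index.add(images)`, `index.search(query[None], k)`, and approximate index types can trade a little recall for much faster queries on large datasets.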

@yurijmikhalevich
Owner Author

Or consider storing the vectors in LMDB to speed up the search and reduce the RAM consumption during querying. Needs testing.

@Seon82

Seon82 commented Aug 27, 2021

For faster model loading, an option would be to create separate models for CLIP's vision transformer and text transformer, and only load the text transformer when querying.
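
One way to sketch that split: CLIP checkpoints are flat state dicts in which the vision tower's parameters share the `visual.` key prefix (that is how OpenAI's CLIP names its vision transformer), so the two sub-models can be saved and loaded separately by filtering keys. Plain strings stand in for torch tensors here to keep the sketch self-contained:

```python
def split_state_dict(state_dict):
    """Split a CLIP-style state dict into vision and text parts.

    In OpenAI's CLIP, vision-tower parameters are prefixed with
    "visual."; the remaining keys belong to the text side (token
    embedding, text transformer, text projection, logit scale).
    """
    vision = {k: v for k, v in state_dict.items() if k.startswith("visual.")}
    text = {k: v for k, v in state_dict.items() if k not in vision}
    return vision, text

# Toy stand-in for a real checkpoint:
ckpt = {
    "visual.conv1.weight": "...",
    "visual.transformer.resblocks.0.attn.in_proj_weight": "...",
    "token_embedding.weight": "...",
    "text_projection": "...",
}
vision_sd, text_sd = split_state_dict(ckpt)
```

At query time, only the text-side dict would need to be deserialized and loaded, which is what makes the split attractive for startup latency.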

This article detailing a zero-copy approach to model loading in PyTorch might also be worth a read.

@yurijmikhalevich
Owner Author

@Seon82, I like the suggestion to split the model. Thank you!

About zero-copy: if I understand correctly, this means keeping Ray running at all times, which isn't much different from the "daemon" solution.

@Seon82

Seon82 commented Aug 27, 2021

Exactly! Just throwing it in as an alternative implementation of the daemon. I agree that it's probably not the most elegant way of doing things though.

@yurijmikhalevich yurijmikhalevich added the feat New feature or request label Sep 4, 2021
@ramayer
Contributor

ramayer commented Sep 27, 2021

> One of the options to take is to make rclip a daemon that has the model loaded all the time and query it, but this results in the extra RAM consumption of a few hundred megabytes, ....

I started using rclip so much that I went down the path of putting a minimal FastAPI web service with a Vue.js client over my rclip database. It loads both the model and the relevant columns from the rclip database into memory. You can see that project here; with a live demo with a quarter million images running here. [hope you don't mind the name of my git repo - I can rename it if you'd like]

It is indeed pretty memory-intensive, with a resident size of about 2 GB.

```
top - 00:33:09 up 19 days,  7:06,  2 users,  load average: 0.03, 0.04, 0.00
[...]
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1563665 opc       20   0 3094784   2.4g 106624 S   0.0  22.8  10:08.26 python3.9
```

Perhaps the best approach is to have an optional daemon that does nothing except return CLIP embeddings, listening on some port. If the daemon happens to be running, rclip could use it; if it isn't, rclip would continue working the way it works now.

That would give the best of both worlds: standalone and fully functional without a bloated daemon for typical use, with an option to speed things up if/when you ever want to do many searches back-to-back.
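
A minimal sketch of that fallback logic, assuming a hypothetical daemon speaking a line-based protocol on a local TCP port (the port number, protocol, and helper names are all made up for illustration, not an actual rclip API):

```python
import socket

DAEMON_PORT = 23581  # hypothetical port for the embedding daemon

def compute_embedding_locally(query: str):
    """Slow path: in real rclip this would import clip, load the model,
    and encode the query in-process."""
    return f"local-embedding({query})"

def get_embedding(query: str):
    """Try the daemon first; fall back to in-process computation."""
    try:
        with socket.create_connection(("127.0.0.1", DAEMON_PORT), timeout=0.1) as conn:
            conn.sendall(query.encode() + b"\n")
            return conn.makefile().readline().strip()
    except OSError:
        # Daemon not running -- keep working the way rclip works now.
        return compute_embedding_locally(query)
```

Because the fallback is transparent to the caller, the daemon stays strictly optional: users on RAM-constrained NASes never have to run it.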

@yurijmikhalevich
Owner Author

yurijmikhalevich commented Sep 27, 2021

Hi @ramayer! Great job on the rclip-server! :-) The name is cool, I don't mind it.

I'm worried that a daemon with a CLIP model loaded will still consume a few hundred megabytes of RAM, which is still "a lot" for some NASes. But it can be a good option for users with enough spare RAM.

Currently, I favor the option of splitting CLIP into text-only and image-only networks and loading only text-CLIP when querying. It will provide a performance bump without any downsides. And if the daemon is implemented later, it will be much more memory-efficient with text-CLIP only.

@yurijmikhalevich
Owner Author

> relevant columns from the rclip database into memory.

When dealing with millions of images, I would instead load feature vectors into memory in batches of, say, 100_000. This lets you control memory consumption at the price of a slightly slower search.
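
A sketch of that batched approach, keeping a running top-k so that only one batch of feature vectors is resident at a time (batch size, shapes, and the `(offset, batch)` generator interface are illustrative assumptions):

```python
import numpy as np

def batched_search(query, feature_batches, k=10):
    """Search batches of L2-normalized feature vectors, keeping a running top-k.

    feature_batches yields (offset, batch) pairs so results can be mapped
    back to global image indices; peak memory is one batch, not the corpus.
    """
    best_scores = np.empty(0, dtype=np.float32)
    best_indices = np.empty(0, dtype=np.int64)
    for offset, batch in feature_batches:
        scores = batch @ query  # cosine similarity for normalized vectors
        best_scores = np.concatenate([best_scores, scores])
        best_indices = np.concatenate(
            [best_indices, np.arange(offset, offset + len(batch))]
        )
        keep = np.argsort(-best_scores)[:k]  # prune back down to top-k
        best_scores, best_indices = best_scores[keep], best_indices[keep]
    return best_indices, best_scores

# Toy usage: 12 vectors split into 3 batches of 4
rng = np.random.default_rng(1)
data = rng.normal(size=(12, 8)).astype(np.float32)
data /= np.linalg.norm(data, axis=1, keepdims=True)
batches = [(i, data[i:i + 4]) for i in range(0, 12, 4)]
idx, sc = batched_search(data[7], batches, k=3)
```

In real use the batches would be streamed from the SQLite DB (or LMDB), so the 100_000-vector batch size directly bounds peak memory.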

Using some kind of feature-vector indexing will also make sense at a million-plus images. faiss is a fantastic library that can help with this.

@ramayer
Contributor

ramayer commented Sep 30, 2021

I wonder if the maintainers of the clip library itself would consider a patch to lazily load the models as needed.

@yurijmikhalevich
Owner Author

@Seon82, @ramayer, thank you for the suggestions! #125, which brings up to a 50% speedup when querying with text queries only, was shipped in 1.10.0. It loads only the text model if you aren't querying with images. Let me know how you like it :)
