perf: speed up the execution #7
To speed up the "querying" part, I can use faiss; the effect of this switch will be more noticeable on larger datasets. But given that model loading accounts for most of the execution time, I want to address it first.
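For illustration, a minimal sketch of what the faiss variant could look like (array shapes, sizes, and names are hypothetical; assumes 512-d ViT-B/32 features):

```python
import faiss
import numpy as np

# Hypothetical inputs: precomputed CLIP image features and one text query.
image_features = np.random.rand(73_000, 512).astype(np.float32)
text_feature = np.random.rand(1, 512).astype(np.float32)

# Inner product on L2-normalized vectors is cosine similarity.
faiss.normalize_L2(image_features)
faiss.normalize_L2(text_feature)

index = faiss.IndexFlatIP(image_features.shape[1])
index.add(image_features)

scores, ids = index.search(text_feature, 10)  # top-10 matches, best first
print(ids[0], scores[0])
```

On much larger datasets, an approximate index such as `IndexIVFFlat` would trade a little recall for much faster queries.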
Or consider storing the vectors in LMDB to speed up the search and reduce the RAM consumption during querying. Needs testing.
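A rough sketch of what that LMDB layout could be, assuming one key per image id and raw float32 bytes as values (file name and sizes are made up):

```python
import lmdb
import numpy as np

vectors = np.random.rand(1_000, 512).astype(np.float32)  # hypothetical features

env = lmdb.open("vectors.lmdb", map_size=2 * 1024**3)  # 2 GiB cap

# Write once, after indexing.
with env.begin(write=True) as txn:
    for image_id, vector in enumerate(vectors):
        txn.put(str(image_id).encode(), vector.tobytes())

# At query time, vectors can be read back lazily instead of held in RAM.
with env.begin() as txn:
    v = np.frombuffer(txn.get(b"42"), dtype=np.float32)
```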
For faster model loading, an option would be to create separate models for CLIP's vision transformer and text transformer, and only load the text transformer when querying. This article detailing a zero-copy approach to model loading in PyTorch might also be worth a read.
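For illustration, one way the split could look: an untested sketch that mirrors the text path of `encode_text` in the openai/CLIP package and saves it without the vision tower (the saved module should be roughly a third of the full checkpoint):

```python
import clip
import torch

class ClipTextEncoder(torch.nn.Module):
    """Just the text-side pieces of CLIP, saved without the vision tower."""
    def __init__(self, m):
        super().__init__()
        self.token_embedding = m.token_embedding
        self.positional_embedding = m.positional_embedding
        self.transformer = m.transformer  # carries CLIP's causal attn_mask
        self.ln_final = m.ln_final
        self.text_projection = m.text_projection
        self.dtype = m.dtype

    def forward(self, text):
        # Mirrors CLIP.encode_text from the openai/CLIP package.
        x = self.token_embedding(text).type(self.dtype)
        x = x + self.positional_embedding.type(self.dtype)
        x = self.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
        x = self.ln_final(x).type(self.dtype)
        # Take the features at the EOT token (the highest token id).
        return x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection

model, _ = clip.load("ViT-B/32", device="cpu")
torch.save(ClipTextEncoder(model), "clip_text.pt")

# Query time: load ~1/3 of the weights instead of the full model.
encoder = torch.load("clip_text.pt", weights_only=False)  # pickled module
with torch.no_grad():
    features = encoder(clip.tokenize(["a dog on the beach"]))
```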
@Seon82, I like the suggestion to split the model. Thank you! About zero-copy: if I understand correctly, it means having to keep Ray running all the time, which isn't much different from the "daemon" solution.
Exactly! Just throwing it in as an alternative implementation of the daemon. I agree that it's probably not the most elegant way of doing things though.
I started using rclip so much that I went down the path of putting a minimal FastAPI web service with a Vue.js client over my rclip database. It loads both the model and the relevant columns from the rclip database into memory. You can see that project here, with a live demo of a quarter-million images running here. [hope you don't mind the name of my git repo - I can rename it if you'd like] It is indeed pretty memory-intensive, with a resident size of about 2 GB.
Perhaps the best approach is to have an optional daemon that does nothing except return CLIP embeddings, listening on some port. If the daemon happens to be running, rclip could use it; if not, rclip would continue working the way it works now. That would give the best of both worlds: standalone and fully functional without a bloated daemon for typical use, but an option to speed things up if/when you ever want to do many searches back-to-back.
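A bare-bones sketch of that fallback logic (port number, line-based JSON protocol, and function names are all made up):

```python
import json
import socket

import numpy as np

PORT = 23451  # hypothetical port for the embedding daemon

def get_text_features(query: str) -> np.ndarray:
    """Use the daemon if it's up; otherwise load CLIP the way rclip does now."""
    try:
        with socket.create_connection(("127.0.0.1", PORT), timeout=0.2) as s:
            s.sendall(query.encode() + b"\n")
            return np.array(json.loads(s.makefile().readline()), dtype=np.float32)
    except OSError:
        # Daemon not running: fall back to the current behavior.
        import clip
        import torch
        model, _ = clip.load("ViT-B/32", device="cpu")
        with torch.no_grad():
            return model.encode_text(clip.tokenize([query])).numpy()[0]

def serve() -> None:
    """The optional daemon: keeps the model warm, answers one query per line."""
    import clip
    import torch
    model, _ = clip.load("ViT-B/32", device="cpu")
    with socket.create_server(("127.0.0.1", PORT)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn, torch.no_grad():
                query = conn.makefile().readline().strip()
                vec = model.encode_text(clip.tokenize([query])).numpy()[0]
                conn.sendall(json.dumps(vec.tolist()).encode() + b"\n")
```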
Hi @ramayer! Great job on the rclip-server! :-) The name is cool, I don't mind it. I'm worried that a daemon with a CLIP model loaded will still consume a few hundred megabytes of RAM, which is still "a lot" for some NASes. But it can be a good option for users with enough spare RAM. Currently, I favor the option of splitting CLIP into text-only and image-only networks and loading only the text-CLIP when querying. It will provide a performance bump without any downsides. And if we implement the daemon later, it will be much more memory-efficient with the text-CLIP only.
When dealing with millions of images, I would instead load feature vectors into memory in batches of, let's say, 100_000. This will let you control the memory consumption at the price of a slightly slower search. Some kind of feature-vector indexing will also make sense at a million+ images; faiss is a fantastic library that can help with this.
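A sketch of that batched scan; a memory-mapped raw file stands in here for whatever batched read the SQLite DB would do (file name and helper are hypothetical):

```python
import numpy as np

# Hypothetical raw file of n x 512 float32 features; memmap keeps only the
# batches actually touched in RAM.
features = np.memmap("features.f32", dtype=np.float32, mode="r").reshape(-1, 512)
BATCH = 100_000

def search(text_feature: np.ndarray, k: int = 10) -> np.ndarray:
    scores = np.empty(len(features), dtype=np.float32)
    for start in range(0, len(features), BATCH):
        batch = np.asarray(features[start:start + BATCH])  # one batch in RAM
        scores[start:start + BATCH] = batch @ text_feature
    top = np.argpartition(scores, -k)[-k:]        # top-k, unordered
    return top[np.argsort(scores[top])[::-1]]     # best match first
```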
Wonder if the guys behind the clip library itself would consider a patch to lazily load the models as needed. |
I'm getting 3.7 seconds average execution time on my laptop (i7-7700HQ) when searching through 73 thousand images, where:
0.85s is "import clip"
1.75s is model loading
0.47s is loading the feature vectors from the SQLite DB
0.26s is the actual search
On my NAS (Intel Celeron J3455), it executes in 7.8s on average, where:
1.73s is "import clip"
3.22s is model loading
1.87s is getting the features
0.8s is the search
Importing clip and loading the model account for ~65% of the execution time, so reducing that is the direction worth exploring first.
One of the options is to make rclip a daemon that keeps the model loaded all the time and query it, but this costs an extra few hundred megabytes of RAM, and, given that querying is a relatively rare operation, I don't like it.
Another option is to convert the CLIP model to TensorFlow Lite. It can be tricky, but it should reduce both the RAM consumption and the execution time, so it may be worth exploring.
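The usual conversion route is PyTorch → ONNX → TensorFlow → TFLite. A rough, untested sketch of that route follows; whether every op in CLIP's text transformer converts cleanly is exactly the tricky part that would need testing:

```python
import clip
import onnx
import tensorflow as tf
import torch
from onnx_tf.backend import prepare

model, _ = clip.load("ViT-B/32", device="cpu")

class TextOnly(torch.nn.Module):
    """Wrapper so the ONNX export traces encode_text."""
    def __init__(self, m):
        super().__init__()
        self.m = m
    def forward(self, tokens):
        return self.m.encode_text(tokens)

# 1. PyTorch -> ONNX (text encoder only).
torch.onnx.export(TextOnly(model), clip.tokenize(["a cat"]), "clip_text.onnx",
                  input_names=["tokens"], output_names=["features"],
                  opset_version=14)

# 2. ONNX -> TensorFlow SavedModel.
prepare(onnx.load("clip_text.onnx")).export_graph("clip_text_tf")

# 3. SavedModel -> TFLite flatbuffer.
converter = tf.lite.TFLiteConverter.from_saved_model("clip_text_tf")
open("clip_text.tflite", "wb").write(converter.convert())
```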