Add batch processing mode #26
Isn't the most elegant option to just have a single function that takes in a single sample or a list of samples, and then computes the embedding for everything? Basically what you propose in the middle paragraph. Not sure what the downsides to that are?
I was thinking the downsides would be all of the involved type introspection and type checking, which maybe isn't so bad if we're very clear. I just get kind of nervous in Python when implementing things where the types of the inputs can change how the function works, particularly when dealing with iterable types. I suppose, though, if we either allow for non-
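The type-introspection dispatch being discussed could look something like the following sketch. All names here (`get_audio_embedding`, `_embed_one`, the 512-dimensional output) are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np


def _embed_one(audio, sr):
    # Placeholder for the real per-clip embedding computation;
    # returns a fixed-size dummy vector for illustration.
    return np.zeros(512, dtype=np.float32)


def get_audio_embedding(audio, sr):
    """Hypothetical sketch: accept either a single ndarray or a
    list of ndarrays, dispatching on the input type."""
    if isinstance(audio, np.ndarray):
        # Single-clip case: wrap in a list, unwrap on return.
        audio_list, sr_list, single = [audio], [sr], True
    elif isinstance(audio, list):
        # Batch case: allow one shared sample rate or one per clip.
        audio_list = audio
        sr_list = sr if isinstance(sr, list) else [sr] * len(audio)
        single = False
    else:
        raise TypeError("audio must be an ndarray or a list of ndarrays")
    embeddings = [_embed_one(a, s) for a, s in zip(audio_list, sr_list)]
    return embeddings[0] if single else embeddings
```

The `isinstance` branching is exactly the kind of input-dependent behavior that makes the API harder to reason about, which is the concern raised above.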
All valid points. My concern on the other end is API creep (paraphrasing feature creep), where the API gets a little crowded. Any chance you can outline the current set of functions we envision for the API (including audio & vision) but excluding batch processing, so we can see where we're at?
Sure, I'll put that in #19. |
I've updated the proposed API changes in #19. Regarding batch processing, if we wanted to not add too many functions, then we could do this:

For …

For …

For …

Thoughts?
Sounds reasonable, I guess? Is there strong motivation to support any non-ndarray iterable, as opposed to forcing it to be a native Python list? My thinking is that a stricter type requirement could help prevent confusion.
The main motivation is to allow users to provide generators. But we can always just limit it to lists for now and see if there's demand for that use case.
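For context on the generator use case: accepting any iterable would let a batch API consume inputs lazily, in fixed-size chunks, without materializing everything in memory. A minimal sketch (the `batched` helper name is hypothetical):

```python
from itertools import islice


def batched(iterable, batch_size):
    # Consume any iterable (including a generator) in chunks of
    # batch_size, yielding each chunk as a list. The final chunk
    # may be shorter than batch_size.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk
```

A list-only API forgoes this laziness but, as noted, keeps the type contract simpler.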
Maybe I'd start with only supporting lists (simpler API, easier unit tests), and we can decide to expand that if there's demand down the line. |
FYI: being addressed in #31 |
Closed by #37. |
Something else to consider is a batch processing mode, i.e. making more efficient use of the GPU by predicting multiple files at once.

Probably the least messy option would be to separate some of the interior code of `get_audio_embedding` for the case of audio into their own functions, and make a `get_audio_embedding_batch` function that calls most of the same functions. We would also have a `process_audio_file_batch` function.

I thought about changing `get_audio_embedding` so that it can either take in a single audio array or a list of audio arrays (and probably a list of corresponding sample rates). While this might consolidate multiple use cases into one function, it'd probably get pretty messy, so it's probably best we don't do this.

Regarding the visual frame embedding extraction, we could ask the same question, though there might be more nuance depending on whether we allow for individual images to be processed or not (I think we should). In the case of videos, though, multiple frames are already being provided at once. So it raises the question (to me at least) of whether `get_vframe_embedding` (as I'm currently calling it) should support both a single frame and multiple frames. This also raises the question of whether we allow for frames of multiple sizes or not.

Thoughts?