
Add batch processing mode #26

Closed
auroracramer opened this issue Jun 18, 2019 · 10 comments
@auroracramer
Collaborator

Something else to consider is a batch processing mode, i.e. making more efficient use of the GPU by predicting on multiple files at once.

Probably the least messy option would be to split some of the interior code of get_audio_embedding into separate helper functions and add a get_audio_embedding_batch function that calls most of the same helpers. We would also have a corresponding process_audio_file_batch function.
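A minimal sketch of that shape, with made-up helper names (`_center_audio`, `_embed_windows` are hypothetical stand-ins for the real preprocessing and model-inference code, not the actual openl3 internals):

```python
import numpy as np

def _center_audio(audio, target_len=8):
    """Hypothetical shared helper: pad or trim a clip to a fixed length."""
    if len(audio) >= target_len:
        return audio[:target_len]
    pad = target_len - len(audio)
    return np.pad(audio, (pad // 2, pad - pad // 2), mode='constant')

def _embed_windows(windows):
    """Hypothetical stand-in for the model forward pass over a batch."""
    return windows.mean(axis=1, keepdims=True)

def get_audio_embedding(audio):
    """Single-clip path: preprocess one clip, then predict."""
    windows = _center_audio(np.asarray(audio))[np.newaxis, :]
    return _embed_windows(windows)

def get_audio_embedding_batch(audio_list):
    """Batch path: reuses the same helpers, but makes one model
    call for all clips, which is where the GPU efficiency comes from."""
    windows = np.stack([_center_audio(np.asarray(a)) for a in audio_list])
    return _embed_windows(windows)
```

The point is that both entry points stay thin wrappers around the same shared code, so there is no duplicated preprocessing logic to keep in sync.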

I thought about changing get_audio_embedding so that it can take either a single audio array or a list of audio arrays (and presumably a list of corresponding sample rates). While this would consolidate multiple use cases into one function, it would probably get pretty messy, so it's probably best we don't do this.

We could ask the same question for visual frame embedding extraction, though there may be more nuance depending on whether we allow individual images to be processed (I think we should). In the case of videos, multiple frames are already provided at once, which raises the question (for me at least) of whether get_vframe_embedding (as I'm currently calling it) should support both a single frame and multiple frames. This also raises the question of whether we allow frames of multiple sizes.

Thoughts?

@justinsalamon
Collaborator

Isn't the most elegant option to just have a single function that takes in a single sample OR a list of samples, and then computes the embedding for everything? Basically what you propose in the middle paragraph? Not sure what the downsides to that are?

@auroracramer
Collaborator Author

I was thinking the downsides would be all of the type introspection and type checking involved, which maybe isn't so bad if we're very clear about it. I just get nervous in Python when implementing things where the types of the inputs change how the function works, particularly when dealing with iterable types. I suppose if we either allow non-np.ndarray iterables or specifically restrict batches to lists, it might be fine. I was thinking there would be fewer surprises with separate functions for single-file and batch processing.

@justinsalamon
Collaborator

All valid points. My concern on the other end is API creep (paraphrasing on feature creep), where the API gets a little crowded. Any chance you can outline the current set of functions we envision for the API (including audio & vision) but excluding batch processing, so we see where we're at?

@auroracramer
Collaborator Author

Sure, I'll put that in #19.

@auroracramer
Collaborator Author

I've updated the proposed API changes in #19. Regarding batch processing, if we want to avoid adding too many functions, we could do the following:

For get_audio_embedding, batch mode would be used if audio is a non-np.ndarray iterable. sr could also be an iterable of the same length as audio if there are variable sample rates. There would have to be a check that if sr is iterable, it matches audio in length (and audio must also be a non-np.ndarray iterable). We could also add a batch_size argument to control how big the batches are (and how much is loaded into memory at once).

For get_vframe_embedding, a similar approach would be taken, where image_arr being a non-np.ndarray iterable would result in batch processing. Similarly, frame_rate could be an iterable of the same length as image_arr (if all items of the batch are videos). Again, a batch_size argument could control how big the batches are (and how much is loaded into memory at once).

For process_audio_file, process_vframe_file, and process_video_file, batch mode would run if filepath is an iterable of non-strings (checked against six.string_types). Though for batch mode we'd have to do some extra work outside of the loop to make the most efficient use of batching without loading all of the files in at once.

Thoughts?

@justinsalamon
Collaborator

Sounds reasonable I guess? Is there strong motivation to support any non-ndarray iterable as opposed to forcing it to be a native python list? My thinking being that a stricter type requirement could help prevent confusion?

@auroracramer
Collaborator Author

The main motivation is to allow users to provide generators. But we can always limit it to lists for now and see if there's demand for that use case.
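For what it's worth, supporting generators would mostly mean chunking the input lazily; a sketch of what that could look like (`iter_batches` is a hypothetical helper, not proposed API):

```python
from itertools import islice

def iter_batches(iterable, batch_size):
    """Hypothetical helper: consume any iterable, including a generator,
    in fixed-size chunks so the full input never sits in memory at once."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

The length-matching check between audio and sr would have to move inside the loop in this case, since a generator has no len().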

@justinsalamon
Collaborator

Maybe I'd start with only supporting lists (simpler API, easier unit tests), and we can decide to expand that if there's demand down the line.

@auroracramer
Collaborator Author

FYI: being addressed in #31

@auroracramer
Collaborator Author

Closed by #37.
