Better expose and document batch prediction from dataloaders #849

bw4sz · 2024-12-11T19:09:40Z

The current API prioritizes simple functions over performance. When we already have lots of images yielded from a dataloader, it's quite annoying to either save them to file so we can use main.predict_file, or to manipulate the dataloader so the images come out preprocessed as expected. We could loop through them individually and call predict_image, but that is really wasteful given modern GPU memory.

for batch in test_loader:
    for image_metadata, image, image_targets in batch:
        # preprocess that image, for example Deepforest likes 0-255 data, channels first
        pred = m.predict_image(channels_first)

Not great.

Instead, a batch prediction mechanism is just sitting there, already in the codebase, but it's not obvious how to use it.

for idx, batch in enumerate(test_loader):
    metadata, images, targets = batch
    # Preprocessing here
    predictions = m.predict_step(images, idx)

Here predictions is a list of results that have been formatted from tensors into dataframes, like the output of the other predict family of functions. predict_step is a PyTorch Lightning method reserved for trainer.predict, so it can't be renamed, but it could be wrapped in some other function if we wanted to.

This pathway isn't really in the docs, and it would take an astute user to recognize it. It should be much faster than predicting one image at a time, since the batches can be quite large if you have a big GPU.
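As a rough sketch of how this pathway could be wrapped so that results stay attached to their images: the helper below walks a loader that yields (metadata, images, targets) batches, calls predict_step on each batch, and pairs every result with its metadata entry. The name predict_from_loader is hypothetical, and the sketch assumes that metadata holds one entry per image and that predict_step returns one result per image, as described above.

```python
from typing import Any, Iterable, List, Tuple


def predict_from_loader(
    model: Any, loader: Iterable[Tuple[Any, Any, Any]]
) -> List[Tuple[Any, Any]]:
    """Call model.predict_step on each batch and pair results with metadata.

    Hypothetical helper, not part of the library. Assumes each batch is a
    (metadata, images, targets) tuple with one metadata entry per image.
    """
    paired: List[Tuple[Any, Any]] = []
    for idx, (metadata, images, _targets) in enumerate(loader):
        # Any preprocessing the model expects would go here.
        predictions = model.predict_step(images, idx)
        if len(predictions) != len(metadata):
            raise ValueError(
                f"batch {idx}: got {len(predictions)} results "
                f"for {len(metadata)} images"
            )
        paired.extend(zip(metadata, predictions))
    return paired
```

With a real model you would probably also want it in evaluation mode with gradients disabled before looping, which trainer.predict normally takes care of.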

Next steps

Document this behavior
Consider a predict_batch function that mirrors the format of predict_file, predict_image, predict_tile to help guide users through this.
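One possible shape for such a function, as a minimal sketch rather than a proposed implementation: it takes images that are already in memory, splits them into chunks, optionally preprocesses each image, and hands every chunk to predict_step. The signature, the batch_size and preprocess parameters, and the choice to pass each chunk as a plain list are all assumptions made for illustration; a real version might need to stack each chunk into a tensor first.

```python
from typing import Any, Callable, List, Optional, Sequence


def predict_batch(
    model: Any,
    images: Sequence[Any],
    batch_size: int = 8,
    preprocess: Optional[Callable[[Any], Any]] = None,
) -> List[Any]:
    """Predict on in-memory images in chunks of batch_size.

    Hypothetical sketch: assumes model.predict_step(batch, batch_idx)
    returns one result per image in the batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[Any] = []
    for idx, start in enumerate(range(0, len(images), batch_size)):
        chunk = list(images[start:start + batch_size])
        if preprocess is not None:
            # e.g. rescale to 0-255 and reorder to channels first
            chunk = [preprocess(image) for image in chunk]
        results.extend(model.predict_step(chunk, idx))
    return results
```

Returning a flat list in input order keeps the output easy to line up with the images that were passed in, in the same spirit as the other predict functions returning one result per input.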


RohitP2005 · 2024-12-11T21:07:58Z

Hey @bw4sz, I would like to solve this issue if it's still unassigned!

bw4sz · 2024-12-11T21:35:57Z

Great!

Ben Weinstein, Ph.D.
Research Scientist
University of Florida


RohitP2005 · 2024-12-12T09:32:06Z

Fine, then I will start working on it.

bw4sz added the labels "good first issue" (Good for newcomers) and "API" (small improvements to the readability and usability of the Python API) on Dec 11, 2024
