Normalize flag in WindowSamplingConfig #2245

Open
bdevoghel opened this issue Sep 20, 2024 · 3 comments · May be fixed by #2246

Comments

@bdevoghel

🚀 Feature

When creating a SemanticSegmentationGeoDataConfig (and possibly in use cases other than segmentation; my experience is limited to this one), scene_to_dataset() creates a GeoDataset based on options from the sampling config. Because WindowSamplingConfig has no normalize flag, the dataset can only be created with normalize=True, which is undesired.

The requested feature is to add a normalize flag to the WindowSamplingConfig class and pass it down where required.

Motivation

When creating a SemanticSegmentationGeoDataConfig, I want my dataset to be sampled without built-in normalization.

Pitch

The WindowSamplingConfig class should have an additional normalize pydantic field.

Alternatives

Creating my own custom class similar to SemanticSegmentationGeoDataConfig, but with the normalize flag forced to False.

@bdevoghel bdevoghel linked a pull request Sep 20, 2024 that will close this issue
4 tasks
@AdeelH
Collaborator

AdeelH commented Sep 23, 2024

Thanks for the issue and the PR! Can you say a bit more about your use case? Do you want to, say, feed in uint16 values type-casted directly to floats to the model? One existing way to do that is to attach a CastTransformer to your RasterSource. E.g.:

from rastervision.core.data import CastTransformerConfig, RasterioSourceConfig

# Cast the raster values to float32 as they are read.
raster_source = RasterioSourceConfig(
    ..., transformers=[CastTransformerConfig(to_dtype='float32')])

@bdevoghel
Author

bdevoghel commented Sep 25, 2024

For my use case I do use transformers on my RasterSource. However, SemanticSegmentationGeoDataConfig (used in SemanticSegmentationLearnerConfig) creates, through scene_to_dataset(), a SemanticSegmentation(Sliding|Random)WindowGeoDataset with the normalize argument left at its default of True, which is undesired behavior in my case.

I have custom transformers for my specific data and I don't want the data to be "normalized" (this happens in AlbumentationsDataset's __getitem__()).

@AdeelH
Collaborator

AdeelH commented Sep 25, 2024

Right, I understand. What I meant was that this normalization only happens if the data is an unsigned integer type, so if the data has already been cast to a float type (via the CastTransformer), it is not affected even if normalize=True.
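To illustrate the point about dtypes, here is a minimal sketch of that kind of check. It is not the actual AlbumentationsDataset code: maybe_normalize is a hypothetical helper, and dividing by the dtype's maximum value is an assumption about how the scaling is done.

```python
import numpy as np


def maybe_normalize(x: np.ndarray, normalize: bool = True) -> np.ndarray:
    """Scale unsigned-integer arrays to [0, 1]; leave other dtypes untouched.

    Hypothetical helper, written only to illustrate the dtype check.
    """
    if normalize and np.issubdtype(x.dtype, np.unsignedinteger):
        # Assumption: scaling divides by the largest value the dtype can hold.
        max_val = np.iinfo(x.dtype).max
        return x.astype(np.float32) / max_val
    return x


raw = np.array([0, 32768, 65535], dtype=np.uint16)
cast = raw.astype(np.float32)  # roughly what a cast to float32 would yield

scaled = maybe_normalize(raw)      # uint16 input: values are rescaled
untouched = maybe_normalize(cast)  # float32 input: returned unchanged
```

Under these assumptions, the uint16 array is rescaled while the float32 array passes through as-is, even though normalize is True in both calls.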

That said, I think exposing normalize via a Config makes sense. But I don't think WindowSamplingConfig is the best place for it, since that is supposed to be concerned only with the sampling of windows and not the transformation of data. What I would suggest is to:

  • Add it to DataConfig. That way it will be available in both GeoDataConfig and ImageDataConfig, both of which are based on AlbumentationsDataset. This will also require modifying the dir_to_dataset() methods in *ImageDataConfigs, but feel free to skip that in this PR.
  • Rename the field to normalize_uint (only in the Configs and not the Datasets) to better express what it is for.
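Roughly, the two suggestions above could look like the sketch below. The Toy* classes are simplified, hypothetical stand-ins built on dataclasses so that the snippet runs on its own; the real configs are pydantic-based and build_dataset() is not an existing Raster Vision method.

```python
from dataclasses import dataclass


@dataclass
class ToyDataset:
    """Hypothetical stand-in for an AlbumentationsDataset-style class."""
    normalize: bool = True


@dataclass
class ToyDataConfig:
    """Hypothetical stand-in for DataConfig (the real one is pydantic-based)."""
    normalize_uint: bool = True

    def build_dataset(self) -> ToyDataset:
        # The config-level name maps onto the dataset's existing argument,
        # so only the configs use the new normalize_uint name.
        return ToyDataset(normalize=self.normalize_uint)


@dataclass
class ToyGeoDataConfig(ToyDataConfig):
    """Inherits normalize_uint from the base config without redefining it."""


default_ds = ToyGeoDataConfig().build_dataset()
raw_ds = ToyGeoDataConfig(normalize_uint=False).build_dataset()
```

Because the field lives on the base config, every subclass picks it up, and the sampling config stays concerned only with windows.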

Please also remember to sign the CLA.
