
feat: add sanity check processing #11

Draft: wants to merge 9 commits into main


hajeressef (Contributor) commented Dec 7, 2023

Adds the following data sanity checks on a 'datasetversion':

  • Check filename redundancy: images that have exactly the same filename are tagged dup_filename.

  • Detect exact and near-duplicate images and tag them dup_image (using https://github.com/idealo/imagededup).

  • Tag every image with its number of channels and number of bytes per pixel, using the tag format nbrchannels_nbr_bytes.

  • Flag annotations whose area is an outlier, based on the z-score of the area. The threshold could become a user-supplied parameter, since a suitable value depends on the data and annotations.
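A minimal sketch of the filename-redundancy check, assuming the filenames of a 'datasetversion' have already been collected into a plain list (the PR does not show how they are fetched). The function name duplicate_filenames is hypothetical; every image whose filename appears in the returned set would then receive the dup_filename tag.

```python
from collections import Counter
from typing import Iterable, Set

DUP_FILENAME_TAG = "dup_filename"  # tag name from the PR description


def duplicate_filenames(filenames: Iterable[str]) -> Set[str]:
    """Return every filename that occurs more than once (hypothetical helper).

    Comparison is exact (case-sensitive, no path normalisation), matching the
    "exactly the same filename" rule. Each image whose filename is in the
    result would then be tagged DUP_FILENAME_TAG.
    """
    counts = Counter(filenames)
    return {name for name, count in counts.items() if count > 1}
```

For example, duplicate_filenames(["a.jpg", "b.jpg", "a.jpg"]) returns {"a.jpg"}, while "a.jpg" and "A.jpg" are treated as different names.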
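For the duplicate-image check, imagededup reports its results as a dictionary. A sketch, assuming a perceptual-hash method with default arguments (the PR does not say which imagededup method is used): `PHash().encode_images(image_dir=...)` followed by `find_duplicates(encoding_map=...)` yields a mapping from each filename to a list of its duplicates. The hypothetical helper below only turns that mapping into the set of images to tag dup_image; it does not call imagededup itself.

```python
from typing import Dict, List, Set

DUP_IMAGE_TAG = "dup_image"  # tag name from the PR description


def images_to_tag_as_duplicates(duplicates: Dict[str, List[str]]) -> Set[str]:
    """Collect every filename involved in at least one duplicate pair.

    `duplicates` is assumed to have the shape returned by imagededup's
    find_duplicates() with scores=False: filename -> list of duplicate filenames.
    Keys and values are both collected, so a one-sided entry still flags both images.
    """
    flagged: Set[str] = set()
    for name, dups in duplicates.items():
        if dups:  # images without duplicates map to an empty list
            flagged.add(name)
            flagged.update(dups)
    return flagged
```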
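A sketch of the channel/byte tag, assuming images are opened with Pillow so that `Image.open(path).mode` describes the pixel layout, and reading the tag format nbrchannels_nbr_bytes as "<channels>_<bytes per pixel>". Both are assumptions. The mode table covers only a few common Pillow modes, and the function and table names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical lookup: Pillow image mode -> (number of channels, bytes per channel).
# Only a few common modes are listed; extend as needed.
PIL_MODE_LAYOUT: Dict[str, Tuple[int, int]] = {
    "L": (1, 1),     # 8-bit greyscale
    "LA": (2, 1),    # 8-bit greyscale + alpha
    "RGB": (3, 1),   # 3 x 8-bit
    "RGBA": (4, 1),  # 4 x 8-bit
    "I;16": (1, 2),  # 16-bit unsigned greyscale
    "I": (1, 4),     # 32-bit signed integer
    "F": (1, 4),     # 32-bit float
}


def channel_byte_tag(mode: str) -> str:
    """Build a '<channels>_<bytes per pixel>' tag for a Pillow image mode."""
    if mode not in PIL_MODE_LAYOUT:
        raise ValueError(f"unsupported image mode: {mode!r}")
    channels, bytes_per_channel = PIL_MODE_LAYOUT[mode]
    return f"{channels}_{channels * bytes_per_channel}"
```

With this reading, channel_byte_tag("RGB") returns "3_3" and channel_byte_tag("I;16") returns "1_2". Raising on unknown modes keeps unexpected formats visible instead of silently mis-tagging them.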
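The z-score test on annotation areas can be sketched with the standard library alone. The function name and the default threshold of 3.0 are assumptions; the PR only says that a z-score is used and that the threshold may become a parameter.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def area_outlier_indices(areas: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return indices of annotations whose area has |z-score| > threshold.

    z = (area - mean) / std, using the population standard deviation.
    The default threshold of 3.0 is an assumption (hypothetical helper).
    """
    if len(areas) < 2:
        return []
    mean = fmean(areas)
    std = pstdev(areas, mu=mean)
    if std == 0:  # all areas identical: nothing can be an outlier
        return []
    return [i for i, a in enumerate(areas) if abs(a - mean) / std > threshold]
```

The population z-score of any value is bounded by sqrt(n - 1) for n annotations, so a threshold of 3.0 cannot flag anything until there are at least 11 annotations. This is one reason to expose the threshold as a parameter.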

PS: The logging.info() calls in utils.py do not produce any output, so the final results are never actually printed. Hence the draft status.
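One common reason for logging.info() printing nothing (an assumption; the PR does not diagnose the cause): Python's root logger defaults to the WARNING level, so INFO messages are dropped. In addition, calling a module-level function such as logging.info() before any configuration installs a default handler, after which a plain logging.basicConfig(level=logging.INFO) call is silently ignored. A sketch of a fix, with a hypothetical logger name and helper:

```python
import logging

logger = logging.getLogger("sanity_check")  # hypothetical module logger name


def configure_logging(level: int = logging.INFO) -> None:
    """Make INFO-level messages visible (hypothetical helper).

    force=True (Python 3.8+) removes any handler already installed on the root
    logger, e.g. the implicit one added by an earlier logging.info() call,
    so the new level and format actually take effect.
    """
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,
    )


if __name__ == "__main__":
    configure_logging()
    logger.info("sanity checks finished")
```

Calling configure_logging() once at the entry point, before any check runs, keeps the configuration in one place instead of inside utils.py.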

hajeressef marked this pull request as draft December 7, 2023 14:09
hajeressef requested a review from PN-picsell December 7, 2023 14:09