Dataset handling is very inefficient #34

Open
mvirgo opened this issue May 8, 2020 · 2 comments

@mvirgo
Owner

mvirgo commented May 8, 2020

The way the dataset is currently loaded for training is very inefficient: the whole dataset is loaded into memory at once. I should consider storing the dataset as something other than a pickle file, and look at whether to use flow_from_directory or similar techniques that load data in batches.
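As a rough illustration of the "similar techniques" idea, here is a minimal sketch of a plain Python generator that reads one batch of files from disk at a time. It is not the repository's code: the function name `batch_generator`, the two-directory layout, and the rule that an image and its label share a filename are all assumptions made for the example, and it assumes every image in a directory has the same size.

```python
import os

import numpy as np
from PIL import Image


def batch_generator(image_dir, label_dir, batch_size=32):
    """Yield (images, labels) batches, reading only one batch from disk at a time.

    Assumption for this sketch: each image in image_dir has a label file
    with the same filename in label_dir, and all files share one size.
    """
    names = sorted(os.listdir(image_dir))
    for start in range(0, len(names), batch_size):
        images, labels = [], []
        for name in names[start:start + batch_size]:
            # Images are scaled to floats in [0, 1]
            with Image.open(os.path.join(image_dir, name)) as img:
                rgb = np.asarray(img.convert("RGB"))
                images.append(rgb.astype(np.float32) / 255.0)
            # Labels stay as single-channel integer masks
            with Image.open(os.path.join(label_dir, name)) as lab:
                labels.append(np.asarray(lab.convert("L")).astype(np.uint8))
        yield np.stack(images), np.stack(labels)
```

Only `batch_size` images are held in memory at any point, so peak memory no longer grows with the size of the dataset. A training loop would typically also shuffle `names` each epoch, which is left out here to keep the sketch short.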

@NickSotir

NickSotir commented May 8, 2020

Are you, by any chance, able to provide the raw dataset (meaning the images and labels without being pickled)?

@mvirgo
Owner Author

mvirgo commented Oct 2, 2020

Sorry I missed your comment @NickSotir. When I first checked, it looked like I had deleted the raw data to save space, but it turns out I do still have the full-size images (1,978 images at 1280x720). I have uploaded them here.

It looks like I only have the 112x112 versions of the labels (same as the pickle file), although I think if you re-size them as needed, you won't lose much information. Otherwise, if you load the pickle files, you should be able to use Pillow's Image.fromarray() function to save the labels out as separate files. Note that the labels from the pickle file will look like essentially nothing on their own, since they are a single channel where each pixel is 0 for not lane or 1 for lane. I made a more "human viewable" version with a process similar to the top answer here. The human viewable version, which has the same channel stacked 3 times (for RGB) and scaled to 0 to 255, can be found here; alternatively, the "binary" version (what the model used) is here.
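For anyone wanting to do this themselves, the conversion described above could look roughly like the sketch below. The helper names `to_viewable` and `save_label` are made up for illustration, and the exact array shape inside the pickle file is an assumption (a 0/1 mask of shape 112x112, possibly with a trailing channel axis of size 1).

```python
import numpy as np
from PIL import Image


def to_viewable(label):
    """Turn a 0/1 single-channel mask into an RGB uint8 array of 0s and 255s."""
    # Drop a size-1 channel axis if present (assumed shape, see the note above)
    mask = np.squeeze(np.asarray(label)).astype(np.uint8)
    scaled = mask * 255
    # Stack the same channel 3 times so image viewers show it as RGB
    return np.stack([scaled, scaled, scaled], axis=-1)


def save_label(label, binary_path, viewable_path):
    """Save both the raw 0/1 mask and the human viewable version."""
    mask = np.squeeze(np.asarray(label)).astype(np.uint8)
    Image.fromarray(mask).save(binary_path)
    Image.fromarray(to_viewable(label)).save(viewable_path)
```

Saving to a lossless format such as PNG keeps the 0/1 values exact; a lossy format like JPEG would smear the mask edges.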
