The meaning of the flag "use_datasets" is confusing. #461
Note that tf_cnn_benchmarks is no longer maintained, and I recommend you look at the official models instead. To answer your question, real data is run if the
It doesn't bring the entire dataset into memory, but it brings large chunks at a time. This is done regardless of
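Conceptually, that chunk-wise loading with prefetching is a bounded producer/consumer buffer: while the training step consumes batch N, a background thread is already reading batch N+1. A minimal Python sketch of the idea (this is only an illustration of the concept, not tf.data's actual implementation; all names here are made up):

```python
import queue
import threading

def producer(data, buf):
    """Simulates the input pipeline: reads batches and fills a bounded buffer."""
    for item in data:
        buf.put(item)      # blocks when the buffer is full
    buf.put(None)          # sentinel: no more batches

# The bounded queue plays the role of the prefetch buffer: its maxsize is
# the prefetch depth, i.e. how many batches may be loaded ahead of training.
batches = [f"batch_{i}" for i in range(5)]
buf = queue.Queue(maxsize=2)
threading.Thread(target=producer, args=(batches, buf), daemon=True).start()

consumed = []
while True:
    item = buf.get()       # the "training step" pulls the next ready batch
    if item is None:
        break
    consumed.append(item)

print(consumed)
```

Because the buffer is bounded, the producer can only run a fixed number of batches ahead, so memory use stays capped while I/O and compute overlap.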
Thank you very much for the quick answer. Can I ask a few more follow-up questions?
Should we find the optimal parameters by trying different combinations of those, or is the default setting enough in most cases? For example, I couldn't see any meaningful difference in throughput when I increased, for instance, "datasets_parallel_interleave_cycle_length".
Thank you again :)
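For context on that last flag: in tf.data, parallel interleave's `cycle_length` controls how many input files are read concurrently, with their records interleaved into a single stream. A simplified, single-threaded model of just the interleaving order (illustrative only, not the TensorFlow implementation):

```python
from itertools import islice

def interleave(sources, cycle_length, block_length=1):
    """Round-robin over up to `cycle_length` open sources at a time,
    taking `block_length` items from each before moving on -- a
    simplified model of tf.data's parallel interleave ordering."""
    iters = [iter(s) for s in sources[:cycle_length]]
    pending = list(sources[cycle_length:])
    out = []
    while iters:
        for it in list(iters):
            chunk = list(islice(it, block_length))
            if chunk:
                out.extend(chunk)
            else:
                # This source is exhausted; open the next pending one.
                iters.remove(it)
                if pending:
                    iters.append(iter(pending.pop(0)))
    return out

# Four "files" of three records each; with cycle_length=2, records from
# f0 and f1 alternate first, then f2 and f3.
files = [[f"f{i}_r{j}" for j in range(3)] for i in range(4)]
print(interleave(files, cycle_length=2))
```

One possible reason you saw no throughput change: if file reading and decoding are not the bottleneck (for example, the data already sits in the OS page cache, or the GPU step dominates), raising the cycle length just adds more concurrent readers to a stage that was already keeping up.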
@rohan100jain, can you answer these questions?
The meaning of the flag "use_datasets" is confusing.
Below are two flags: "use_datasets" and a related one named "datasets_use_prefetch".
```python
flags.DEFINE_boolean('use_datasets', True,
                     'Enable use of datasets for input pipeline')
flags.DEFINE_boolean('datasets_use_prefetch', True,
                     'Enable use of prefetched datasets for input pipeline. '
                     'This option is meaningless if use_datasets=False.')
```
At first, I thought 'use_datasets' chooses between synthetic data and a real dataset.
But apparently that's not the case, based on my own observation.
So here is what I did and the situation I am encountering.
1. I set 'datasets_use_prefetch' to False.
2. I set 'use_datasets' to True.
3. I ran the ResNet-50 model with batch size 128 for 100 training iterations (100 steps, in other words).
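For reference, the steps above correspond to an invocation along these lines (the data path is a placeholder; the two dataset flags are the ones quoted in this issue, and `--model`, `--batch_size`, and `--num_batches` are tf_cnn_benchmarks' standard flags):

```shell
python tf_cnn_benchmarks.py \
  --model=resnet50 \
  --batch_size=128 \
  --num_batches=100 \
  --data_dir=/path/to/imagenet \
  --use_datasets=True \
  --datasets_use_prefetch=False
```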
The first time I ran it, disk I/O went up; specifically, read throughput rose to around 20 MB/s.
Then, after the first run finished, I ran the same model again with the same flag settings, but for 200 training iterations.
For the first 100 iterations, read activity did not go up. However, from iteration 101 onward, it went up again.
From this observation, my guess is that if I set the 'use_datasets' flag to True, the already-read dataset is stored somewhere on disk, and it is all brought from disk into main memory before the training iterations start.
Am I understanding correctly?
If so, what is the difference from the 'datasets_use_prefetch' flag?
These may look like complicated questions, but the essence is that I don't fully understand the flag definitions ;)
Thank you in advance.