Training Data / Validation Data Overlap? #4

Open
rustyju opened this issue Jun 16, 2019 · 2 comments

rustyju commented Jun 16, 2019

I noticed that in the /data folder, the training data in /train includes all of the validation data in /test. There's no separate validation split in the model, so I assume the validation datapoints also have a chance of being trained on.
Doesn't that lead to overfitting and exaggerated model performance?
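A minimal sketch (not this repo's code) of the kind of chronological split that would avoid the overlap: train on the earlier ticks and hold out the later ticks for validation. The prices array and the 80/20 ratio are placeholders.

import numpy as np

# Stand-in for the real tick series; in practice it would be loaded from /data.
prices = np.arange(86000, dtype=float)

split = int(len(prices) * 0.8)   # hypothetical 80/20 chronological split
train_prices = prices[:split]    # earlier ticks, used for training episodes
val_prices = prices[split:]      # later ticks, never shown to the agent

# By construction the two index ranges are disjoint, so no validation
# datapoint can also be trained on.
print(len(train_prices), len(val_prices))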

@xiaoyongzhu

Please correct me if I'm wrong, @rustyju, but I think nb_max_episode_steps in the .fit() method limits the maximum number of steps an episode can take (it was set to 10,000). So although the files themselves overlap, I think the data after the first 10K ticks is never seen by the agent during training.
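For reference, a minimal sketch of where that argument goes; agent and env are placeholders for whatever keras-rl agent and trading environment the project builds, and the numbers are just the ones mentioned in this thread, so this is not the repo's actual training script.

agent.fit(env,
          nb_steps=140000,             # total training steps (hypothetical)
          nb_max_episode_steps=10000,  # keras-rl forces a terminal state here
          visualize=False,
          verbose=1)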

@puke3615

@xiaoyongzhu I found this logic for nb_max_episode_steps in keras-rl's core.py:

if nb_max_episode_steps and episode_step >= nb_max_episode_steps - 1:
    # Force a terminal state.
    done = True

It means an episode is forced to end once episode_step reaches nb_max_episode_steps - 1, i.e. after at most 10K steps.
The training data runs from tick 0 to 70K and the test data from tick 0 to 16K. With the 10K cap, training only ever sees the first 10K ticks of the train file and evaluation only the first 10K ticks of the test file, so the two phases still share the 0~10K data range.
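A small self-contained sketch of that arithmetic, using only the numbers quoted in this thread and assuming every episode starts at tick 0 (both are assumptions, not read from the repo):

nb_max_episode_steps = 10000

train_ticks = 70000   # ticks available in /train (as quoted above)
test_ticks = 16000    # ticks available in /test (as quoted above)

# With the cap, an episode starting at tick 0 never advances past
# the first nb_max_episode_steps ticks of its file.
seen_in_training = min(train_ticks, nb_max_episode_steps)
seen_in_testing = min(test_ticks, nb_max_episode_steps)

# Since the test file is a subset of the train file, the first
# min(...) ticks are used in both phases.
overlap = min(seen_in_training, seen_in_testing)
print("ticks used in both training and testing: 0 to", overlap)  # 0 to 10000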
