Training Data / Validation Data Overlap? #4

Open
rustyju opened this issue Jun 16, 2019 · 2 comments

rustyju commented Jun 16, 2019

I noticed that in the /data folder, the training data in /train includes all of the validation data in /test. There's no separate validation split in the model, so I assume the validation datapoints also have a chance of being trained on.
Doesn't that lead to overfitting and exaggerated model performance?
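A minimal sketch (not this repo's code) of the kind of chronological split that would avoid the overlap: train on the earlier ticks and hold out the later ticks for validation. The prices array and the 80/20 ratio are placeholders.

import numpy as np

# Stand-in for the real tick series; in practice it would be loaded from /data.
prices = np.arange(86000, dtype=float)

split = int(len(prices) * 0.8)   # hypothetical 80/20 chronological split
train_prices = prices[:split]    # earlier ticks, used for training episodes
val_prices = prices[split:]      # later ticks, never shown to the agent

# By construction the two index ranges are disjoint, so no validation
# datapoint can also be trained on.
print(len(train_prices), len(val_prices))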

@xiaoyongzhu

Please correct me if I'm wrong, @rustyju, but I think nb_max_episode_steps in the .fit() method limits the maximum number of steps an episode can take (it was set to 10,000). So although the files themselves overlap, I think the data after the first 10K ticks is never seen by the agent during training.
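For reference, a minimal sketch of where that argument goes; agent and env are placeholders for whatever keras-rl agent and trading environment the project builds, and the numbers are just the ones mentioned in this thread, so this is not the repo's actual training script.

agent.fit(env,
          nb_steps=140000,             # total training steps (hypothetical)
          nb_max_episode_steps=10000,  # keras-rl forces a terminal state here
          visualize=False,
          verbose=1)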

@puke3615

@xiaoyongzhu I found this logic for nb_max_episode_steps in keras-rl's core.py:

if nb_max_episode_steps and episode_step >= nb_max_episode_steps - 1:
    # Force a terminal state.
    done = True

It means an episode is forced to end once episode_step reaches nb_max_episode_steps - 1, i.e. after at most 10K steps.
The training data runs from tick 0 to 70K and the test data from tick 0 to 16K. With the 10K cap, training only ever sees the first 10K ticks of the train file and evaluation only the first 10K ticks of the test file, so the two phases still share the 0~10K data range.
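A small self-contained sketch of that arithmetic, using only the numbers quoted in this thread and assuming every episode starts at tick 0 (both are assumptions, not read from the repo):

nb_max_episode_steps = 10000

train_ticks = 70000   # ticks available in /train (as quoted above)
test_ticks = 16000    # ticks available in /test (as quoted above)

# With the cap, an episode starting at tick 0 never advances past
# the first nb_max_episode_steps ticks of its file.
seen_in_training = min(train_ticks, nb_max_episode_steps)
seen_in_testing = min(test_ticks, nb_max_episode_steps)

# Since the test file is a subset of the train file, the first
# min(...) ticks are used in both phases.
overlap = min(seen_in_training, seen_in_testing)
print("ticks used in both training and testing: 0 to", overlap)  # 0 to 10000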
