Why did you need copyTargetQNetwork #5

Open
fevemania opened this issue Mar 8, 2017 · 2 comments

Comments

@fevemania

fevemania commented Mar 8, 2017

I don't understand the purpose of copyTargetQNetwork. Why do we need QValueT to evaluate QValue_batch? Is it to make the training process more stable?

@saselovejulie

I'm confused about this function too.

if self.timeStep % UPDATE_TIME == 0:
    self.copyTargetQNetwork()

Since this code copies QValue into QValueT every 100 steps, why do we need two of them?

@FrankRouter

This is explained in the DQN Nature paper:

We address these instabilities with a novel variant of Q-learning, which uses two key ideas. First, ... Second, we used an iterative update that adjusts the action-values (Q) towards target values that are only periodically updated, thereby reducing correlations with the target.
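To make the quoted idea concrete, here is a minimal NumPy sketch of "target values that are only periodically updated". It is not the repository's TensorFlow code: the class name `TinyDQN`, the tabular weights `W` and `W_target`, the discount `GAMMA`, and the learning rate are all assumptions chosen for illustration. `W` plays the role of QValue, `W_target` plays the role of QValueT, and `copy_target_q_network` mirrors what copyTargetQNetwork appears to do.

```python
import numpy as np

UPDATE_TIME = 100  # sync interval, matching the 100 steps mentioned above
GAMMA = 0.99       # discount factor (assumed value, for illustration only)


class TinyDQN:
    """Tabular Q-function with a separate, periodically synced target copy."""

    def __init__(self, n_states, n_actions, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Online weights: updated on every training step (role of QValue).
        self.W = rng.normal(scale=0.1, size=(n_states, n_actions))
        # Target weights: frozen between syncs (role of QValueT).
        self.W_target = self.W.copy()
        self.lr = lr
        self.time_step = 0

    def copy_target_q_network(self):
        # Overwrite the frozen copy with the current online weights.
        self.W_target = self.W.copy()

    def train_step(self, s, a, r, s_next, terminal):
        # The target y is computed from the FROZEN copy, so it does not move
        # while the online weights are being adjusted towards it.
        y = r if terminal else r + GAMMA * self.W_target[s_next].max()
        # Move the online estimate Q(s, a) a small step towards the target.
        self.W[s, a] += self.lr * (y - self.W[s, a])
        self.time_step += 1
        if self.time_step % UPDATE_TIME == 0:
            self.copy_target_q_network()
        return y
```

If the target were computed from `W` itself, every update to Q(s, a) would also shift the value it is being regressed towards. Keeping `W_target` fixed for `UPDATE_TIME` steps gives the online network a stationary target in between syncs, which is the stabilizing effect the paper describes.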
