I read from here. Why does the program use only the current state and the next state? Why does using just these two states work? Thank you @songrotek
Think about it the other way: why use two states instead of just one?
The key point is that these two states are consecutive; in other words, the situation in the second state is the result of the preceding steps.
The frame before the action, the action taken, the reward, the frame after the action, and the terminal flag: these 5 elements make up one training sample. http://blog.csdn.net/songrotek/article/details/50580904 describes the components of this algorithm. I'm not entirely clear on it either; we can discuss it together.
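The 5-element training sample described above is the standard DQN transition tuple, usually stored in a replay memory. A minimal sketch (not the repo's exact code; class and method names are my own) of such a memory:

```python
from collections import deque
import random

class ReplayMemory:
    """Stores (state, action, reward, next_state, terminal) transitions."""

    def __init__(self, capacity=50000):
        # deque with maxlen drops the oldest transition automatically
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, terminal):
        self.buffer.append((state, action, reward, next_state, terminal))

    def sample(self, batch_size):
        # random minibatch used for one training step
        return random.sample(self.buffer, batch_size)

# Usage: each environment step produces one 5-element sample.
memory = ReplayMemory()
memory.add("s1", 0, 1.0, "s2", False)
memory.add("s2", 1, -1.0, "s3", True)
batch = memory.sample(2)
```

Sampling random minibatches from this buffer is what breaks the correlation between consecutive frames during training.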
@guotong1988 From what I see in the code, each action produces one new frame. currentState = [frame1, frame2, frame3, frame4], then newState = np.append(self.currentState[:,:,1:], nextObservation, axis = 2), so after the step newState = [frame2, frame3, frame4, frame5].
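The frame-stacking update above can be checked with a small NumPy sketch (shapes are assumed: 80x80 grayscale frames stacked 4-deep along the last axis, as is typical in DQN code; the variable names mirror the snippet above):

```python
import numpy as np

# Build a state of 4 stacked frames; fill each frame with its index
# so we can verify which frames survive the update.
current_state = np.zeros((80, 80, 4))
for i in range(4):
    current_state[:, :, i] = i + 1          # frames 1, 2, 3, 4

next_observation = np.full((80, 80, 1), 5.0)  # newly observed frame 5

# Drop the oldest frame and append the newest one along the channel axis.
new_state = np.append(current_state[:, :, 1:], next_observation, axis=2)

# new_state now holds frames 2, 3, 4, 5.
```

So the "state" the network sees is always the 4 most recent frames, which is how a single state captures short-term motion.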