For many real-world tasks, the environment may have hidden state or partially observable features, so the Markov assumption holds only approximately.
One way around this is frame stacking - already doable in Coach with filters.observation.observation_stacking_filter (see the sketch below). It may be even better to use an LSTM (or a bidirectional LSTM). Agents for this already exist, the widely cited DRQN being one of them.
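For reference, this is roughly how frame stacking can be wired in today - a minimal sketch assuming Coach's InputFilter API, an observation key named 'observation', and an arbitrary stack size of 4:

```python
from rl_coach.filters.filter import InputFilter
from rl_coach.filters.observation.observation_stacking_filter import ObservationStackingFilter

# Stack the last 4 observations so the agent sees a short history as its "state".
input_filter = InputFilter()
input_filter.add_observation_filter('observation', 'stacking', ObservationStackingFilter(4))

# In a preset, the filter would then be attached to the agent, e.g.:
# agent_params.input_filter = input_filter
```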
Coach currently has the LSTMMiddleware layer. However, from my reading of the source code it runs along the observation axis (for inputs such as text). TensorFlow/Keras, of course, has the TimeDistributed wrapper (together with an LSTM using return_sequences=True) to run an LSTM along the temporal axis across transitions.
Could a time-distributed LSTM be added as a middleware? (Or at the very least "hacked" in - it would be of immense benefit to my current research, where I am using a simple behavioural cloning agent.)
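For illustration, here is a plain tf.keras sketch (not Coach's middleware API) of the shape such a time-distributed LSTM could take; the conv embedder, layer sizes, and the action head are arbitrary assumptions for a behavioural-cloning-style setup:

```python
import tensorflow as tf

def build_temporal_lstm_model(stack_size=4, obs_shape=(84, 84, 1), num_actions=6):
    # Input: a window of `stack_size` consecutive observations (the temporal axis).
    frames = tf.keras.Input(shape=(stack_size,) + obs_shape)

    # Per-frame embedder, applied identically to every timestep via TimeDistributed.
    embedder = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation='relu'),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
    ])
    per_frame_features = tf.keras.layers.TimeDistributed(embedder)(frames)

    # LSTM over the temporal axis; return_sequences=False keeps only the final state.
    temporal_features = tf.keras.layers.LSTM(256, return_sequences=False)(per_frame_features)

    # Simple head for a behavioural cloning agent: logits over discrete actions.
    logits = tf.keras.layers.Dense(num_actions)(temporal_features)
    return tf.keras.Model(inputs=frames, outputs=logits)

model = build_temporal_lstm_model()
model.summary()
```

The key point is that the embedder runs per frame while the LSTM consumes the stacked frames as a sequence, which is the temporal-axis behaviour the existing LSTMMiddleware does not seem to provide.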