demonstration file #9

rasoolfa · 2020-02-24T10:36:19Z

Thanks for releasing the code and demonstration files for this work.

I have two questions about the demonstration files (e.g. hammer-v0_demos.pickle).
I might be missing something here, but how does DDPGfD use the demonstrations, given that these files only contain (s, a, r) rather than (s, a, r, s'), where s' is the next state? Also, could you please provide more details about the structure of those files, so that it is easier to compare against and reproduce the results in your paper?

Thanks.

rasoolfa · 2020-02-26T09:20:50Z

I guess one way to use the demonstrations is to collect samples by executing the actions in those files.

bennevans · 2020-03-03T06:18:40Z

Since the observations/states are in a time-ordered list, you can get s' by taking the state at the next index in the list.
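
For readers who want to see the indexing spelled out, here is a minimal sketch of the idea. The function name `trajectory_to_transitions` is hypothetical and not part of the repository. It assumes each trajectory is a dictionary with time-ordered `observations`, `actions`, and `rewards`. The thread does not say whether a final observation is stored after the last action, so the sketch keeps only the steps that have a recorded successor.

```python
import numpy as np


def trajectory_to_transitions(traj):
    """Build (s, a, r, s') arrays from one time-ordered trajectory dict.

    Assumption: traj["observations"], traj["actions"] and traj["rewards"]
    are indexed by time step, so the next state for step t is simply
    observations[t + 1].
    """
    obs = np.asarray(traj["observations"])
    acts = np.asarray(traj["actions"])
    rews = np.asarray(traj["rewards"])

    # Only steps with a recorded successor observation can form a transition.
    n = max(0, min(len(acts), len(rews), len(obs) - 1))

    return {
        "s": obs[:n],
        "a": acts[:n],
        "r": rews[:n],
        "s_next": obs[1:n + 1],  # state at the next index in the list
    }
```

If the observation list has the same length as the action list, the last step is dropped because its successor was not recorded. Whether to handle that step differently (for example, by marking it terminal) is a design choice this sketch leaves open.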

bennevans · 2020-03-03T06:27:17Z

As for the structure of a file: it is a list of dictionaries, each representing one trajectory. Each dictionary has the keys

['actions', 'observations', 'rewards', 'init_state_dict']

which correspond to the actions, states, and rewards at each time step of the trajectory, plus initial state information.
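
To check this structure on a downloaded file, something like the following sketch may help. The helper names `load_demos` and `summarize_demos` are hypothetical, and the sketch assumes the file can be read with the standard `pickle` module and that the three per-step entries support `len()`. Only unpickle files from a source you trust.

```python
import pickle


def load_demos(path):
    """Load a demonstration file, assumed to be a pickled list of dicts."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize_demos(demos):
    """Return, for each trajectory, the length of each per-step entry."""
    per_step_keys = ("actions", "observations", "rewards")
    summary = []
    for traj in demos:
        # Report lengths only; the contents of 'init_state_dict' are not
        # described in this thread, so it is left untouched here.
        summary.append({key: len(traj[key]) for key in per_step_keys})
    return summary
```

A call such as `summarize_demos(load_demos("hammer-v0_demos.pickle"))` would then list how many steps each trajectory holds, which also shows whether observations and actions have the same length.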
