I encountered a strange behavior.
For ClippedPPO, PPO and ActorCritic I was not able to get the signals defined in their init methods:
Loss, Gradients, Likelihood, KL Divergence, etc.
I'm not sure whether it is an issue in my environment implementation, but DQN does log its signals. I also checked the signals dumped by update_log. For the agents mentioned above, self.episode_signals contains duplicate entries for exactly the signals that are not logged: the signals are defined at several levels of the agent class hierarchy and each definition is appended to self.episode_signals, so only the most recently created instance gets updated with values in the train method.
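To make that concrete, here is a minimal, self-contained sketch of the pattern I mean (hypothetical class and signal names, not Coach's actual API):

class Signal:
    """Stand-in for a logged training signal: a name plus collected values."""
    def __init__(self, name):
        self.name = name
        self.values = []

    def add_sample(self, value):
        self.values.append(value)


class BaseAgent:
    def __init__(self):
        self.episode_signals = []
        # the base class registers its own 'Loss' signal
        self.loss = self.register_signal('Loss')

    def register_signal(self, name):
        signal = Signal(name)
        self.episode_signals.append(signal)
        return signal


class PolicyGradientAgent(BaseAgent):
    def __init__(self):
        super().__init__()
        # the subclass registers 'Loss' again; self.loss is rebound to the new instance
        self.loss = self.register_signal('Loss')


agent = PolicyGradientAgent()
agent.loss.add_sample(0.42)  # train() only ever touches the latest instance

print([s.name for s in agent.episode_signals])         # ['Loss', 'Loss']  -> duplicate entries
print([len(s.values) for s in agent.episode_signals])  # [0, 1]            -> first copy stays empty

If the logger happens to pick up the stale copy when dumping, the values written during training never show up.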
It could also be related to the behavior of updating signals before every episode: since gradients are only available after training, they might get reset when a new episode starts after the last training iteration.
However, I do have experiments with ClippedPPO where those signals were logged, but I can't recreate this.
Any suggestions?
ClippedPPOAgent with num_consecutive_playing_steps = EnvironmentEpisodes(15)
CSV dumper is set to dump_signals_to_csv_every_x_episodes = 5
Before training kicks in after 15 episodes, the CSV is dumped because 15 % 5 == 0; the last 5 episodes are written out, including episode 15, which has no training values yet (loss, gradients, etc.).
Training then happens and produces the training values, which are saved into the 15th episode's row of the logger's pandas DataFrame. Since that line has already been dumped, it is never written to the CSV.
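A toy model of the logger makes the timing visible (this is a simplified assumption about how the real AgentLogger behaves, using a dict of rows and a last_line_idx_written_to_csv pointer instead of its pandas DataFrame):

# Toy model of the agent logger: one row per episode, and dump_to_csv() only
# writes rows newer than last_line_idx_written_to_csv (an assumed simplification
# of Coach's AgentLogger).
class ToyLogger:
    def __init__(self):
        self.rows = {}                         # episode index -> {signal name: value}
        self.last_line_idx_written_to_csv = 0
        self.csv = []                          # stand-in for the file on disk

    def log(self, episode, **signals):
        self.rows.setdefault(episode, {}).update(signals)

    def dump_to_csv(self):
        for idx in sorted(self.rows):
            if idx > self.last_line_idx_written_to_csv:
                self.csv.append((idx, dict(self.rows[idx])))
                self.last_line_idx_written_to_csv = idx


logger = ToyLogger()
for episode in range(1, 16):                   # num_consecutive_playing_steps = EnvironmentEpisodes(15)
    logger.log(episode, reward=1.0)
    if episode % 5 == 0:                       # dump_signals_to_csv_every_x_episodes = 5
        logger.dump_to_csv()                   # episode 15 is flushed here, *before* training runs

logger.log(15, loss=0.3)                       # train() now fills training signals into episode 15
logger.dump_to_csv()                           # row 15 was already written, so nothing new is dumped

print(logger.csv[-1])                          # (15, {'reward': 1.0}) -> the loss never reaches the CSV

Stepping last_line_idx_written_to_csv back by one after training (the fix below) makes the next dump pick that row up again.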
I updated clipped_ppo_agent.py as follows, simply adding a decrement of last_line_idx_written_to_csv on the agent's logger at the end of train().
def train(self):
    if self._should_train():
        for network in self.networks.values():
            network.set_is_training(True)

        dataset = self.memory.transitions
        update_internal_state = self.ap.algorithm.update_pre_network_filters_state_on_train
        dataset = self.pre_network_filter.filter(dataset, deep_copy=False,
                                                 update_internal_state=update_internal_state)
        batch = Batch(dataset)

        for training_step in range(self.ap.algorithm.num_consecutive_training_steps):
            self.networks['main'].sync()
            self.fill_advantages(batch)

            # take only the requested number of steps
            if isinstance(self.ap.algorithm.num_consecutive_playing_steps, EnvironmentSteps):
                dataset = dataset[:self.ap.algorithm.num_consecutive_playing_steps.num_steps]

            shuffle(dataset)
            batch = Batch(dataset)

            self.train_network(batch, self.ap.algorithm.optimization_epochs)

        for network in self.networks.values():
            network.set_is_training(False)

        self.post_training_commands()
        self.training_iteration += 1
        # should be done in order to update the data that has been accumulated * while not playing *
        self.update_log()
        # added fix: step the logger's CSV pointer back one line so the episode row
        # that was just filled with training signals gets dumped on the next CSV write
        self.agent_logger.last_line_idx_written_to_csv -= 1
        return None
crzdg added a commit to crzdg/coach that referenced this issue on May 26, 2020.