Utilities for training reinforcement learning policies with the Soft Actor-Critic (SAC) algorithm. The framework is built on TensorFlow Agents (TF-Agents) and includes the following features:
- Following this TF-Agents distributed training example, the framework is cleanly divided into fully independent programs:
- Experience collection workers (each with their own environment)
  - Replay buffer implemented with deepmind/reverb (server sketch below)
- SAC policy trainer
- Can seed the replay buffer with experience collected by a random policy, to encourage early exploration (sketched below)
- Fine-grained control over the number of CPUs allocated to each program (sketched below)
- Checkpointing and TensorBoard logging (sketched below)
- "Supervision" of the training using daemontools/supervise automatically resumes the training from the last checkpoint if some program crashes, which is useful when running on a compute cluster
- SLURM compute cluster support
- Configure the environment hyperparameters and their curriculum with JSON (example below)
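
The sketches below illustrate how the pieces above might fit together; all names, ports, paths, and hyperparameters in them are illustrative assumptions, not the project's actual values.

A standalone replay buffer program can be little more than a Reverb server holding a single table; the collection workers and the trainer then connect to it as clients:

```python
# Minimal sketch of the replay buffer program: a Reverb server with one
# uniformly-sampled FIFO table. Table name, size, and port are assumptions.
import reverb

replay_table = reverb.Table(
    name='uniform_table',  # collectors and trainer must use the same name
    sampler=reverb.selectors.Uniform(),
    remover=reverb.selectors.Fifo(),
    max_size=1_000_000,
    rate_limiter=reverb.rate_limiters.MinSize(1),
)

server = reverb.Server(tables=[replay_table], port=8008)
server.wait()  # serve collectors and the trainer until killed
```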
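Seeding the buffer before training can be done with TF-Agents' `RandomPyPolicy` and a `PyDriver` that writes trajectories into the Reverb table; the environment, table name, and step count here are placeholders:

```python
# Sketch: fill the replay buffer with random-policy experience.
import reverb
from tf_agents.drivers import py_driver
from tf_agents.environments import suite_gym
from tf_agents.policies import random_py_policy
from tf_agents.replay_buffers import reverb_utils

env = suite_gym.load('Pendulum-v1')  # placeholder environment
random_policy = random_py_policy.RandomPyPolicy(
    env.time_step_spec(), env.action_spec())

# Observer that pushes collected trajectories into the Reverb table.
rb_observer = reverb_utils.ReverbAddTrajectoryObserver(
    reverb.Client('localhost:8008'),
    table_name='uniform_table',
    sequence_length=2)

py_driver.PyDriver(
    env, random_policy, [rb_observer], max_steps=10_000).run(env.reset())
```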
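One common way to bound the CPUs a TensorFlow program uses is through TensorFlow's threading configuration (the project may additionally pin CPUs at the OS or SLURM level; the thread counts below are placeholders):

```python
# Sketch: cap TensorFlow's thread pools early in each program's main().
import tensorflow as tf

tf.config.threading.set_intra_op_parallelism_threads(2)  # per-op parallelism
tf.config.threading.set_inter_op_parallelism_threads(2)  # concurrent ops
```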
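Checkpointing and logging might look like the following; the directories are placeholders, and in the real trainer the SAC agent's variables would be among the tracked objects:

```python
# Sketch: periodic checkpoints (for crash recovery) and TensorBoard scalars.
import tensorflow as tf
from tf_agents.utils import common

global_step = tf.Variable(0, dtype=tf.int64)
optimizer = tf.keras.optimizers.Adam(3e-4)  # stands in for the agent's state

checkpointer = common.Checkpointer(
    ckpt_dir='/tmp/sac/train',  # placeholder path
    max_to_keep=3,
    optimizer=optimizer,
    global_step=global_step)
checkpointer.initialize_or_restore()  # restores after a crash, no-op otherwise

writer = tf.summary.create_file_writer('/tmp/sac/tensorboard')
with writer.as_default():
    tf.summary.scalar('train/loss', 0.0, step=global_step)  # example scalar

# ...after some training steps...
checkpointer.save(global_step)  # write a new checkpoint
```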
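daemontools' supervise monitors an executable named `run` inside a service directory and restarts it whenever it exits; combined with the restore-on-start checkpointing above, a crash simply resumes training. A minimal `run` script (the trainer entry point and flag are assumptions) could be:

```sh
#!/bin/sh
# Supervised by daemontools: restarted automatically if the trainer crashes;
# the trainer then resumes from its last checkpoint on startup.
exec python trainer.py --root_dir=/tmp/sac
```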
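An environment configuration file might look like this; the keys are purely hypothetical and depend on the environment being trained:

```json
{
  "environment": {"gravity": 9.8, "max_episode_steps": 1000},
  "curriculum": [
    {"until_step": 100000, "target_speed": 0.5},
    {"until_step": 500000, "target_speed": 1.0}
  ]
}
```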