This repository contains the main results and code for reproducing the experiments performed in the paper.
We also include pretrained models for each tested method on each SafeLife task.
The paper can be found on arXiv.
Reward function specification can be difficult. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoided side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead while leading the agent to complete the specified task and avoid many side effects.
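To make the penalty concrete, here is a minimal, illustrative sketch of an AUP-style reward shaping step in Python. It assumes access to the auxiliary Q-values of the current state (e.g., from a learned auxiliary Q-network) and a designated no-op action; the function and argument names are ours rather than this repository's API, and the exact normalization used in the paper should be treated as an assumption.

```python
import numpy as np

def aup_reward(task_reward, q_aux, action, noop, lam=0.01):
    """AUP-style reward: task reward minus a penalty for shifting the
    agent's ability to optimize the auxiliary (randomly generated) reward.

    task_reward -- reward from the specified task for this transition
    q_aux       -- auxiliary Q-values for every action in the current state
    action      -- index of the action the agent actually took
    noop        -- index of the no-op action, used as the baseline
    lam         -- penalty coefficient; larger values penalize shifts in
                   auxiliary attainable utility more heavily
    """
    # Change in attainable auxiliary utility relative to doing nothing.
    penalty = abs(q_aux[action] - q_aux[noop])
    # Normalize by the no-op value so the penalty is roughly scale-invariant
    # (a common choice; the paper's exact scaling may differ).
    scale = max(abs(q_aux[noop]), 1e-8)
    return task_reward - lam * penalty / scale

# Example: action 2 barely changes the auxiliary value, so the penalty is small.
q_aux = np.array([0.50, 0.90, 0.52])
print(aup_reward(task_reward=1.0, q_aux=q_aux, action=2, noop=0))
```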
SafeLife is a novel environment for testing the safety of reinforcement learning agents. The long-term goal of the SafeLife project is to develop training environments and benchmarks for a range of technical reinforcement learning safety problems.
Install the SafeLife environment by following the instructions in their repository. Alternatively, here are some basic instructions for a local install:
```bash
pip3 install -r requirements.txt
python3 setup.py build_ext --inplace
```
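If the build succeeded, the package should be importable. A minimal sanity check, assuming the package's import name is `safelife` as in the upstream repository:

```python
# Quick sanity check that the SafeLife package is importable after the
# local build; prints where the package was loaded from.
import safelife
print(safelife.__file__)
```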
Note that we use version 1.0 of SafeLife. Some large changes that we have not thoroughly tested were implemented in the current master branch of SafeLife.
The `train` script is an easy way to get agents up and running using the default proximal policy optimization implementation. Just run

```bash
./train --algo aup
```

to start training. Output files, including checkpoints, logs, and intermediate episode videos, are stored in `data/aup/<task>`.
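For your own evaluation scripts, something like the following sketch could locate the most recent checkpoint in that directory. It is a hypothetical helper: the `data/<algo>/<task>` layout comes from this README, but the checkpoint filename pattern and the default task name are assumptions about the output format.

```python
# Hypothetical helper for locating the newest checkpoint written by ./train.
# The data/<algo>/<task> layout is described in this README; the "checkpoint"
# filename substring and the default task name are assumptions.
from pathlib import Path

def latest_checkpoint(algo="aup", task="append_still"):
    run_dir = Path("data") / algo / task
    candidates = sorted(run_dir.glob("*checkpoint*"),
                        key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None

print(latest_checkpoint())
```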
We include saved models for AUP and the PPO baseline for each SafeLife task.
We trained agents on four different SafeLife tasks. Two of the tasks involve placing cells on goal tiles on an initially static board; the board is initialized with either many green cells (`append_still`) or fewer green cells (`append_still-easy`). The third task has the same goal, but the board also contains dynamic yellow cells that spawn additional cells (`append_spawn`). In the final task, the agent must remove red cell patterns from an initially static board (`prune-still`). We show the main results (reward and side effects) below for all considered methods on each task.
GIF files for each task can be found in the GIFs directory.