The Multi-armed Bandit

A simple example of how to build a policy-gradient-based agent that can solve the multi-armed bandit problem.
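As a rough illustration of what such an agent can look like, here is a minimal sketch in plain NumPy. Several things in it are assumptions made for the example and are not taken from the original: the bandit pays +1 or -1 with a fixed, hidden win probability per arm; the policy is a softmax over one learnable preference per arm; and a running-average reward baseline is used to reduce variance. The names `pull_arm`, `train_bandit_agent`, and `win_probs` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def pull_arm(win_probs, arm, rng):
    """Hypothetical bandit: returns +1 with probability win_probs[arm], else -1."""
    return 1.0 if rng.random() < win_probs[arm] else -1.0


def train_bandit_agent(win_probs, episodes=5000, lr=0.1, seed=0):
    """Train a softmax policy with a REINFORCE-style update.

    Returns the final action probabilities (one per arm).
    """
    rng = np.random.default_rng(seed)
    n_arms = len(win_probs)
    prefs = np.zeros(n_arms)  # one learnable preference (logit) per arm
    baseline = 0.0            # running average of observed rewards

    for t in range(1, episodes + 1):
        probs = softmax(prefs)
        arm = rng.choice(n_arms, p=probs)       # sample an action from the policy
        reward = pull_arm(win_probs, arm, rng)

        # Gradient of log pi(arm) w.r.t. the preferences: one_hot(arm) - probs
        grad_log_pi = -probs
        grad_log_pi[arm] += 1.0

        # Policy-gradient step: raise the log-probability of the chosen arm
        # in proportion to how much better than the baseline it did.
        prefs += lr * (reward - baseline) * grad_log_pi

        # Incremental mean of all rewards seen so far.
        baseline += (reward - baseline) / t

    return softmax(prefs)


# Example run with three hypothetical arms; the last one pays out most often.
policy = train_bandit_agent([0.2, 0.4, 0.9])
print("learned action probabilities:", policy)
```

Because a bandit has no state, the "policy" here is just a single probability distribution over arms, which keeps the update rule visible without any neural-network machinery. Swapping the preference vector for the output of a small network would give the same update in a form that extends to problems with state.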