REINFORCE in TensorFlow

Implements the basic REINFORCE algorithm (also known as Monte Carlo policy gradient) for the CartPole environment.

NOTE: The implementation converges to a mean reward above 300 with TensorFlow 1.8.0, but not with the current TF 1.2.1.
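The core of REINFORCE fits in a few lines. Below is a minimal, framework-free sketch of one update step. It assumes a linear softmax policy over CartPole's four state features and two actions, and it is written in plain Python rather than TensorFlow. The function names discounted_returns, action_probs and reinforce_update and the linear policy are illustrative assumptions, not this repository's code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types for this sketch: one episode is a list of (state, action, reward).
State = Sequence[float]
Episode = List[Tuple[State, int, float]]


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def action_probs(theta: List[List[float]], state: State) -> List[float]:
    """Softmax over linear logits: pi(a|s) is proportional to exp(theta[a] . s)."""
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(theta: List[List[float]], episode: Episode,
                     lr: float = 0.01, gamma: float = 0.99) -> List[List[float]]:
    """One REINFORCE step: theta += lr * sum_t G_t * grad log pi(a_t | s_t)."""
    returns = discounted_returns([r for _, _, r in episode], gamma)
    # Accumulate the gradient with theta held fixed, then apply it once.
    grad = [[0.0] * len(row) for row in theta]
    for (state, action, _), g in zip(episode, returns):
        probs = action_probs(theta, state)
        for k, row_grad in enumerate(grad):
            # d log softmax(a) / d theta[k] = (1[k == a] - pi(k|s)) * s
            coef = ((1.0 if k == action else 0.0) - probs[k]) * g
            for i, x in enumerate(state):
                row_grad[i] += coef * x
    return [[w + lr * dw for w, dw in zip(row, row_grad)]
            for row, row_grad in zip(theta, grad)]


# Usage: a hand-made two-step episode in which action 1 earned reward both times.
theta = [[0.0] * 4 for _ in range(2)]  # 2 actions x 4 state features
episode = [((0.0, 0.0, 0.05, 0.0), 1, 1.0), ((0.0, 0.1, 0.04, -0.1), 1, 1.0)]
theta = reinforce_update(theta, episode, lr=0.1)
print(action_probs(theta, (0.0, 0.0, 0.05, 0.0)))  # probability of action 1 rises above 0.5
```

In a full training loop one would sample each action with random.choices(range(2), weights=action_probs(theta, state))[0], roll out a complete episode, and call reinforce_update once per episode. Subtracting a baseline (for example the mean return) from G_t is a common variance-reduction variant that is not shown here.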