- Reinforcement Learning. Acting in a dynamic environment to maximize a delayed reward.
- A/B Testing. Bandit algorithms as an alternative to randomized controlled trials.
- Drug Discovery
- Generative Adversarial Networks (GANs).
- We have difficulty defining a reward function, but have examples of what good looks like.
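
The bandit alternative to A/B testing listed above can be illustrated with a small simulation. This is a minimal sketch, assuming Thompson sampling with a Beta-Bernoulli model as the bandit algorithm (the notes do not name a specific one). The function name `thompson_sampling` and the conversion rates in `true_rates` are hypothetical and chosen only for illustration.

```python
import random


def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Simulate a Beta-Bernoulli Thompson sampling bandit.

    true_rates: hypothetical conversion rate of each variant
                (hidden from the algorithm, used only to simulate users).
    Returns how many times each variant was shown.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible conversion rate per variant from its
        # Beta(successes + 1, failures + 1) posterior (uniform prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Show the variant whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda i: samples[i])

        # Simulate the user's response (1 = converted, 0 = did not).
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


# Assumed rates: variant 0 converts at 5%, variant 1 at 20%.
pulls = thompson_sampling([0.05, 0.20])
print(pulls)
```

Unlike a fixed 50/50 randomized trial, the allocation here shifts toward the better-performing variant as evidence accumulates, so fewer users see the weaker option during the experiment.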