Bernoulli Bandits

Implementations of several exploration strategies for the Bernoulli multi-armed bandit problem.

The bandit has K actions. Action k produces a reward r of 1.0 with probability θ_k and a reward of 0.0 otherwise, where 0 <= θ_k <= 1. The probabilities θ_k are unknown to the agent but fixed over time.
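The setup above can be sketched as a small environment class. This is only an illustration under stated assumptions, not the code in this directory: the class name `BernoulliBandit`, the `pull` method, and the `seed` argument are hypothetical, and the standard-library `random` module is used for sampling.

```python
import random


class BernoulliBandit:
    """Hypothetical K-armed bandit: action k pays 1.0 with probability thetas[k]."""

    def __init__(self, thetas, seed=None):
        thetas = list(thetas)
        if not thetas:
            raise ValueError("at least one action is required")
        if any(not 0.0 <= theta <= 1.0 for theta in thetas):
            raise ValueError("every theta_k must lie in [0, 1]")
        self.thetas = thetas                # fixed over time, hidden from the agent
        self._rng = random.Random(seed)     # private generator for reproducibility

    @property
    def n_actions(self):
        return len(self.thetas)

    def pull(self, action):
        """Return a reward of 1.0 with probability thetas[action], else 0.0."""
        return 1.0 if self._rng.random() < self.thetas[action] else 0.0
```

An agent would interact with it only through `n_actions` and `pull`, for example `BernoulliBandit([0.2, 0.5, 0.8], seed=0).pull(2)`, and would never read `thetas` directly.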

The agent's objective is to minimize the regret over a fixed number T of action selections:

ρ = T θ^* - Σ_{t=1}^{T} r_t

where θ^* = max_k θ_k is the success probability of the best action and r_t is the reward received at step t.
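As a worked example of the regret formula, the sketch below evaluates it for one recorded run. The function name `realized_regret` and the sample numbers are made up for illustration and are not taken from this repository.

```python
def realized_regret(thetas, rewards):
    """Return T * max(thetas) - sum(rewards), where T = len(rewards)."""
    if not thetas:
        raise ValueError("thetas must be non-empty")
    best_theta = max(thetas)                 # theta^* = max_k theta_k
    return len(rewards) * best_theta - sum(rewards)


# Illustrative numbers only: three actions, five action selections.
thetas = [0.2, 0.5, 0.8]
rewards = [1.0, 0.0, 1.0, 1.0, 0.0]
print(realized_regret(thetas, rewards))      # 5 * 0.8 - 3 = 1.0
```

Because the rewards are random, this quantity can be negative for a lucky run; its expectation over runs is non-negative, since no action has a success probability above θ^*.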

Real-world analogy:

Clinical trials: we have K pills and T sick patients. After taking pill k, a patient is cured with probability θ_k. The task is to find the most effective pill while curing as many patients as possible along the way.
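To make the analogy concrete, here is a minimal simulation of such a trial using epsilon-greedy, one common exploration strategy. The README does not say which strategies this directory implements, so the choice of epsilon-greedy, the function name `epsilon_greedy_trial`, and its parameters are all assumptions made for illustration.

```python
import random


def epsilon_greedy_trial(thetas, n_patients, epsilon=0.1, seed=None):
    """Assign one pill per patient with an epsilon-greedy rule.

    Returns (cures, counts): cures observed per pill and times each pill was given.
    """
    rng = random.Random(seed)
    n_pills = len(thetas)
    counts = [0] * n_pills
    cures = [0] * n_pills

    for _ in range(n_patients):
        untried = [k for k in range(n_pills) if counts[k] == 0]
        if untried:
            pill = untried[0]                      # give every pill once first
        elif rng.random() < epsilon:
            pill = rng.randrange(n_pills)          # explore: pick a pill at random
        else:
            # exploit: pick the pill with the best observed cure rate so far
            pill = max(range(n_pills), key=lambda k: cures[k] / counts[k])
        cured = rng.random() < thetas[pill]        # patient cured w.p. theta_k
        counts[pill] += 1
        cures[pill] += int(cured)
    return cures, counts
```

Calling `epsilon_greedy_trial([0.1, 0.9], n_patients=2000, seed=0)` returns the per-pill cure and assignment counts; the observed cure rates `cures[k] / counts[k]` are the agent's estimates of θ_k. A larger `epsilon` spends more patients on exploration, while a smaller one commits earlier to the pill that currently looks best.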
