CATS, CATS pdf for Continuous Actions
CATS is a contextual bandit algorithm with a continuous action space. You can find the related paper here. It uses epsilon-greedy exploration with tree policy classes and smoothing.
Using the features given as input, CATS first chooses a center from the continuous action range using a tree policy, and then uses a bandwidth to determine a radius of randomization around the chosen center (the centers are the discrete actions). The depth of the tree and the bandwidth must be specified beforehand.
The cats reduction calls into sample_pdf, which samples from cats_pdf. cats_pdf in turn calls into cb_explore_pdf, which eventually calls into cats_tree.
The returned pdf (from cats_pdf) consists of the chosen action range (the chosen action center adjusted with the bandwidth to return a range), with the exploit probability being the density of that range. The remaining range(s) have a density calculated as the explore probability uniformly distributed amongst the remaining ranges. sample_pdf then samples from the given pdf values and returns a single chosen action, along with the probability density at the sampled location.
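The pdf construction and sampling described above can be sketched in Python. This is a minimal illustration that follows the wording of the paragraph literally, not VW's implementation: the function names build_pdf and sample_from_pdf, the epsilon parameter, and the clipping at the edges of the action range are all assumptions made for the example.

```python
import random


def build_pdf(center, bandwidth, min_value, max_value, epsilon):
    """Return (left, right, pdf_value) segments that integrate to 1.

    Assumption: the chosen range gets probability mass 1 - epsilon and the
    rest of the action range shares mass epsilon uniformly.
    """
    left = max(min_value, center - bandwidth)
    right = min(max_value, center + bandwidth)
    exploit_width = right - left
    explore_width = (max_value - min_value) - exploit_width

    # Assumption: if nothing lies outside the chosen range, it takes all mass.
    exploit_mass = 1.0 - epsilon if explore_width > 0 else 1.0
    explore_density = epsilon / explore_width if explore_width > 0 else 0.0

    segments = []
    if left > min_value:
        segments.append((min_value, left, explore_density))
    segments.append((left, right, exploit_mass / exploit_width))
    if right < max_value:
        segments.append((right, max_value, explore_density))
    return segments


def sample_from_pdf(segments, rng):
    """Pick a segment by its probability mass, then a uniform point inside it."""
    masses = [(r - l) * d for l, r, d in segments]
    l, r, density = rng.choices(segments, weights=masses, k=1)[0]
    return rng.uniform(l, r), density


segments = build_pdf(center=6.0, bandwidth=1.0, min_value=0.0,
                     max_value=32.0, epsilon=0.2)
action, pdf_value = sample_from_pdf(segments, random.Random(0))
```

The returned pair mirrors what the text says sample_pdf produces: one chosen action and the density at that location.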
NOTE: CATS was released in VW 8.9 with the bandwidth smoothing parameter being a property of num_actions. Since VW 8.10, bandwidth is a property of the continuous range (max_value - min_value).
Example: min_value = 0, max_value = 32, num_actions = 8, bandwidth = 1. This gives us a continuous range of 32 and a unit range of 32 / 8 = 4.
Let's say that vw predicted a continuous action inside the second unit range [4, 8] with a centre of 6.
In VW 8.9: with bandwidth being a property of the number of actions, the smoothing would happen across 3 unit ranges, i.e. the first unit range, the second unit range (the one predicted), and the third unit range. This results in a pdf with higher density inside the ranges [0, 4], [4, 8], and [8, 12], i.e. [0, 12].
In master (VW 8.10 and later): with bandwidth being a property of the continuous range, the smoothing would happen across the predicted centre 6 plus/minus bandwidth, resulting in a pdf with higher density inside the range [5, 7].
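The two bandwidth interpretations from the example can be compared with a short Python sketch. The function names are hypothetical, and the VW 8.9 formula is inferred only from the worked example above (bandwidth counted in unit ranges on each side of the predicted unit range); it is not taken from the VW source.

```python
def smoothing_range_v8_9(center, bandwidth, min_value, max_value, num_actions):
    """Assumed VW 8.9 behaviour: bandwidth is measured in unit ranges."""
    unit = (max_value - min_value) / num_actions
    # Index of the unit range that contains the predicted center.
    index = min(int((center - min_value) // unit), num_actions - 1)
    left = min_value + (index - bandwidth) * unit
    right = min_value + (index + 1 + bandwidth) * unit
    return max(min_value, left), min(max_value, right)


def smoothing_range_master(center, bandwidth, min_value, max_value):
    """VW 8.10+ behaviour as described: center plus/minus bandwidth."""
    return max(min_value, center - bandwidth), min(max_value, center + bandwidth)


old_range = smoothing_range_v8_9(6.0, 1, 0.0, 32.0, 8)
new_range = smoothing_range_master(6.0, 1.0, 0.0, 32.0)
```

With the values from the example, old_range covers [0, 12] and new_range covers [5, 7].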
cats (i.e. cats_pdf after sampling)
vw --cats <num_actions> --bandwidth <bw> --min_value <min value> --max_value <max value> -d <data file>
cats_pdf
vw --cats_pdf <num_actions> --bandwidth <bw> --min_value <min value> --max_value <max value> -d <data file>
The first argument passed (num_actions) specifies the number of discrete actions (centers) and therefore the depth of the tree, which will be log2(num_actions). The bandwidth determines the randomization radius around the chosen center/action. min_value and max_value define the overall range of the action space. All the normal parameters are valid, such as the data file and prediction output. The data file should be in the input format described below.
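As a rough illustration of how these arguments relate, the Python sketch below derives the tree depth and the action centers and assembles the command line shown above. The helper names are hypothetical. Requiring num_actions to be a power of two is an assumption of this sketch, and treating each center as the midpoint of its unit range is inferred from the earlier example (unit range [4, 8] with centre 6).

```python
def tree_depth(num_actions):
    """Depth log2(num_actions); this sketch only accepts powers of two."""
    if num_actions <= 0 or num_actions & (num_actions - 1) != 0:
        raise ValueError("this sketch assumes num_actions is a power of two")
    return num_actions.bit_length() - 1


def action_centers(min_value, max_value, num_actions):
    """Midpoint of every unit range (assumed meaning of 'center')."""
    unit = (max_value - min_value) / num_actions
    return [min_value + (i + 0.5) * unit for i in range(num_actions)]


def build_command(num_actions, bandwidth, min_value, max_value, data_file,
                  pdf=False):
    """Assemble the argument list using only the flags shown above."""
    return [
        "vw",
        "--cats_pdf" if pdf else "--cats", str(num_actions),
        "--bandwidth", str(bandwidth),
        "--min_value", str(min_value),
        "--max_value", str(max_value),
        "-d", data_file,
    ]


depth = tree_depth(8)
centers = action_centers(0.0, 32.0, 8)
command = build_command(8, 1, 0, 32, "train.dat")  # "train.dat" is a placeholder
```

The resulting list can be handed to subprocess.run if vw is installed; the file name is only a placeholder.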
The label type for continuous actions is VW::cb_continuous::continuous_label. It must be supplied when learning. The cost is the cost associated with the action chosen (negative reward) and the pdf_value is the density of the pdf at the chosen value (the sampled location).
struct continuous_label_elm
{
  float action;     // the continuous action
  float cost;       // the cost of this class
  float pdf_value;  // the pdf density of the chosen location, specifies the probability the data collection policy chose this action
};

struct continuous_label
{
  v_array<continuous_label_elm> costs;
};
The prediction type for continuous actions is VW::continuous_actions::probability_density_function_value for cats and VW::continuous_actions::probability_density_function for cats_pdf. They are defined as follows:
struct probability_density_function_value
{
  float action;     // continuous action
  float pdf_value;  // pdf value
};

struct pdf_segment
{
  float left;       // starting point
  float right;      // ending point
  float pdf_value;  // height
};

using probability_density_function = v_array<pdf_segment>;
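To make the cats_pdf prediction type more concrete, the following Python sketch mirrors pdf_segment as a named tuple and shows two checks a consumer might run on a list of segments. The helper names, the half-open boundary convention, and the segment values are all illustrative assumptions, not part of VW.

```python
from typing import List, NamedTuple


class PdfSegment(NamedTuple):
    left: float       # starting point
    right: float      # ending point
    pdf_value: float  # height


def total_mass(pdf: List[PdfSegment]) -> float:
    """Integrate the piecewise-constant pdf; a valid pdf gives about 1."""
    return sum((s.right - s.left) * s.pdf_value for s in pdf)


def density_at(pdf: List[PdfSegment], action: float) -> float:
    """Look up the height of the segment that contains the action."""
    for s in pdf:
        if s.left <= action < s.right:
            return s.pdf_value
    # Assumption: the right edge of the last segment counts as inside.
    if pdf and action == pdf[-1].right:
        return pdf[-1].pdf_value
    return 0.0


# Illustrative values only, not real VW output.
pdf = [PdfSegment(0.0, 5.0, 0.01),
       PdfSegment(5.0, 7.0, 0.35),
       PdfSegment(7.0, 32.0, 0.01)]
mass = total_mass(pdf)
height = density_at(pdf, 6.0)
```

A (action, pdf_value) pair as returned by cats corresponds to one sampled point and the height of the segment it fell in.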
The continuous actions (cats, cats_pdf) format is a single-line format.
Labels are required when learning. Examples passed at test time may omit the entire action:cost:pdf_value section.
ca action:cost:pdf_value |[namespace] <features>
action, cost, and pdf_value are floats, and the action must fall in the [min_value, max_value] range.
Labelled example
ca 185.121:0.657567:6.20426e-05 | <features>
ca 772.592:0.458316:6.20426e-05 | <features>
ca 15140.6:0.31791:6.20426e-05 | <features>
For an unlabelled example the action:cost:pdf_value section can be excluded.
Unlabelled example
| <features>
| <features>
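The labelled and unlabelled line formats above can be produced and checked with a small Python sketch. The functions format_example and parse_label are hypothetical helpers written for this page, not part of any VW binding, and the sketch ignores namespaces and any other text after the bar.

```python
def format_example(features, label=None):
    """Build one input line; label is (action, cost, pdf_value) or None."""
    if label is None:
        return f"| {features}"
    action, cost, pdf_value = label
    return f"ca {action}:{cost}:{pdf_value} | {features}"


def parse_label(line, min_value=None, max_value=None):
    """Return (action, cost, pdf_value), or None for an unlabelled line."""
    head = line.partition("|")[0].strip()
    if not head:
        return None
    tag, _, triple = head.partition(" ")
    if tag != "ca":
        raise ValueError(f"expected a 'ca' label, got {tag!r}")
    # Unpacking raises ValueError if there are not exactly three fields.
    action, cost, pdf_value = (float(x) for x in triple.strip().split(":"))
    # Optional check that the action lies in [min_value, max_value].
    if min_value is not None and action < min_value:
        raise ValueError("action below min_value")
    if max_value is not None and action > max_value:
        raise ValueError("action above max_value")
    return action, cost, pdf_value


label = parse_label("ca 185.121:0.657567:6.20426e-05 | a b")
line = format_example("a b", (1.5, 0.25, 0.5))
```

Passing min_value and max_value lets the parser reject actions outside the configured range before the data reaches vw.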