CATS, CATS pdf for Continuous Actions

CATS is a contextual bandit algorithm with a continuous action space. You can find the related paper here. It uses epsilon-greedy exploration with tree policy classes and smoothing.

Using the features given as input, CATS first chooses a center from the continuous action range using a tree policy, and then uses a bandwidth to determine the radius of randomization around the chosen center. The centers are also referred to as discrete actions. The depth of the tree and the bandwidth must be specified beforehand.

The cats reduction calls into sample_pdf, which samples from the pdf returned by cats_pdf. cats_pdf in turn calls into cb_explore_pdf, which eventually calls into cats_tree.

The returned pdf (from cats_pdf) will consist of the chosen action range (chosen action center adjusted with the bandwidth to return a range) with the exploit probability being the density of that range. The remaining range(s) will have a density calculated as the explore probability uniformly distributed amongst the remaining ranges. sample_pdf will then sample from the given pdf values and return a single chosen action, along with the probability density at the sampled location.
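The paragraph above can be sketched in a few lines of Python. This is only an illustration of the description, not VW's implementation: build_pdf and its epsilon parameter are hypothetical names, and the split of probability mass (1 - epsilon spread uniformly over the chosen range, epsilon spread uniformly over the remaining ranges) is an assumption read off the text.

```python
def build_pdf(center, bandwidth, min_value, max_value, epsilon):
    """Return [(left, right, pdf_value), ...] covering [min_value, max_value]."""
    if not (min_value < max_value and bandwidth > 0):
        raise ValueError("need min_value < max_value and bandwidth > 0")
    if not (min_value <= center <= max_value):
        raise ValueError("center must lie inside the action range")

    # Chosen range: center +/- bandwidth, clipped to the action space.
    left = max(min_value, center - bandwidth)
    right = min(max_value, center + bandwidth)
    chosen_width = right - left
    remaining_width = (max_value - min_value) - chosen_width

    if remaining_width <= 0:
        # The chosen range covers the whole action space.
        return [(left, right, 1.0 / chosen_width)]

    # Assumption: exploit mass on the chosen range, explore mass elsewhere.
    exploit_density = (1.0 - epsilon) / chosen_width
    explore_density = epsilon / remaining_width

    segments = []
    if left > min_value:
        segments.append((min_value, left, explore_density))
    segments.append((left, right, exploit_density))
    if right < max_value:
        segments.append((right, max_value, explore_density))
    return segments


pdf = build_pdf(center=6, bandwidth=1, min_value=0, max_value=32, epsilon=0.2)
# A valid pdf integrates to 1 over the action range.
total_mass = sum((right - left) * value for left, right, value in pdf)
```

Each tuple mirrors the left, right and pdf_value fields of a pdf segment, so the result has the same shape as the cats_pdf prediction.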

NOTE: CATS was released in VW 8.9 with the bandwidth smoothing parameter defined as a property of num_actions. Since VW 8.10, bandwidth is a property of the continuous range (max_value - min_value).

Example: min_value = 0, max_value = 32, num_actions = 8, bandwidth = 1. This gives a continuous range of 32 and a unit range of 32 / 8 = 4. Suppose VW predicted a continuous action inside the second unit range [4, 8], with a center of 6.

In VW 8.9: with bandwidth being a property of the number of actions, the smoothing happens across 3 unit ranges, i.e. the first unit range, the second unit range (the one predicted), and the third unit range. This results in a pdf with higher density inside the ranges [0, 4], [4, 8] and [8, 12], i.e. [0, 12].

In master (VW 8.10 and later): with bandwidth being a property of the continuous range, the smoothing happens across the predicted center 6 plus/minus the bandwidth, resulting in a pdf with higher density inside the range [5, 7].
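The arithmetic of this example can be checked with a short Python sketch. The function names are hypothetical, and the VW 8.9 rule used here (extend the predicted unit range by bandwidth unit ranges on each side) is an assumption generalized from the single example above rather than taken from the VW source.

```python
def unit_range(min_value, max_value, num_actions):
    """Width of one discrete action's slice of the continuous range."""
    return (max_value - min_value) / num_actions


def smoothed_range_v8_9(center, bandwidth, min_value, max_value, num_actions):
    """Bandwidth counted in unit ranges around the predicted unit range."""
    unit = unit_range(min_value, max_value, num_actions)
    index = int((center - min_value) // unit)  # which unit range holds center
    left = max(min_value, min_value + (index - bandwidth) * unit)
    right = min(max_value, min_value + (index + bandwidth + 1) * unit)
    return left, right


def smoothed_range_master(center, bandwidth, min_value, max_value):
    """Bandwidth counted in units of the continuous action itself."""
    return max(min_value, center - bandwidth), min(max_value, center + bandwidth)


old = smoothed_range_v8_9(6, 1, 0, 32, 8)    # the [0, 12] case
new = smoothed_range_master(6, 1, 0, 32)     # the [5, 7] case
print(old, new)
```

The same bandwidth value therefore covers a much wider range under the 8.9 interpretation, which is worth keeping in mind when reusing old command lines.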

Usage

cats (i.e. cats_pdf after sampling)

vw --cats <num_actions> --bandwidth <bw> --min_value <min value> --max_value <max value> -d <data file>

cats_pdf

vw --cats_pdf <num_actions> --bandwidth <bw> --min_value <min value> --max_value <max value> -d <data file>

The first argument (num_actions) specifies the number of discrete actions (centers) and therefore the depth of the tree, which is log2(num_actions). The bandwidth determines the randomization radius around the chosen center/action. min_value and max_value define the overall range of the action space. All the usual parameters, such as the data file and prediction output, remain valid. The data file should be in the input format described below.
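As a small illustration of how these parameters fit together, the following Python sketch assembles the command line shown above and computes the tree depth. cats_args and tree_depth are hypothetical helper names; only the flags themselves come from this page.

```python
import math


def tree_depth(num_actions):
    """Depth of the tree as described above: log2(num_actions)."""
    return math.log2(num_actions)


def cats_args(num_actions, bandwidth, min_value, max_value, data_file, pdf=False):
    """Build an argument list for `vw --cats ...` or `vw --cats_pdf ...`."""
    reduction = "--cats_pdf" if pdf else "--cats"
    return [
        "vw",
        reduction, str(num_actions),
        "--bandwidth", str(bandwidth),
        "--min_value", str(min_value),
        "--max_value", str(max_value),
        "-d", data_file,
    ]


# "train.dat" is a placeholder file name.
args = cats_args(8, 1, 0, 32, "train.dat")
print(" ".join(args), "| tree depth:", tree_depth(8))
```

The list form can be handed to subprocess.run if the vw executable is on the PATH.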

Label Type

The label type for continuous actions is VW::cb_continuous::continuous_label. It must be supplied when learning. The cost is the cost associated with the action chosen (negative reward) and the pdf_value is the density of the pdf of the chosen value (the sampled location).

struct continuous_label_elm
{
  float action;     // the continuous action
  float cost;       // the cost of the chosen action
  float pdf_value;  // pdf density at the chosen action, i.e. the density with which the data collection policy chose it
};

struct continuous_label
{
  v_array<continuous_label_elm> costs;
};
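To make the three label fields concrete, here is a hedged Python sketch of how a logged interaction could be turned into a label. It assumes a hypothetical data collection policy that samples actions uniformly from [min_value, max_value], whose density is 1 / (max_value - min_value) everywhere; the function names are illustrative and not part of VW.

```python
def uniform_logged_label(action, reward, min_value, max_value):
    """Return (action, cost, pdf_value) for a uniformly sampled action."""
    if not (min_value <= action <= max_value):
        raise ValueError("action must fall in [min_value, max_value]")
    cost = -reward                                # cost is the negative reward
    pdf_value = 1.0 / (max_value - min_value)     # density of a uniform policy
    return action, cost, pdf_value


def to_text_label(action, cost, pdf_value):
    """Render the label in the `ca action:cost:pdf_value` text form."""
    return f"ca {action}:{cost}:{pdf_value}"


label = uniform_logged_label(action=6.0, reward=1.0, min_value=0.0, max_value=32.0)
print(to_text_label(*label))
```

If the logging policy is not uniform, pdf_value must instead be the density that policy actually assigned to the sampled action.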

Prediction Type

The prediction type for continuous actions is VW::continuous_actions::probability_density_function_value for cats and VW::continuous_actions::probability_density_function for cats_pdf. They are defined as follows:

struct probability_density_function_value
{
  float action;     // continuous action
  float pdf_value;  // pdf value
};

struct pdf_segment
{
  float left;       // starting point
  float right;      // ending point
  float pdf_value;  // height
};

using probability_density_function = v_array<pdf_segment>;
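The relationship between the two prediction types can be sketched in Python: sampling from a list of segments yields a single action together with the density at that action. This is an illustrative sampler under the assumption that the segments are non-overlapping and integrate to 1; it is not VW's sample_pdf code, and the segment values below are made up for the example.

```python
import random


def sample_pdf(segments, rng=random):
    """Sample (action, pdf_value) from [(left, right, pdf_value), ...]."""
    # Probability mass of each segment is width times height.
    masses = [(right - left) * value for left, right, value in segments]
    # Pick a segment in proportion to its mass, then a point uniformly inside.
    left, right, value = rng.choices(segments, weights=masses, k=1)[0]
    action = rng.uniform(left, right)
    return action, value


# Example segments: most of the mass sits on [5, 7].
segments = [(0.0, 5.0, 0.2 / 30), (5.0, 7.0, 0.4), (7.0, 32.0, 0.2 / 30)]
rng = random.Random(0)  # fixed seed so runs are repeatable
action, pdf_value = sample_pdf(segments, rng)
print(action, pdf_value)
```

The returned pair corresponds to the action and pdf_value fields of probability_density_function_value.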

VW text format

The text format for continuous actions (cats, cats_pdf) is a single-line format.

Labelled

Labels are required when learning. When testing, labels omit the entire action:cost:pdf_value section.

ca action:cost:pdf_value |[namespace] <features>

action, cost, and pdf_value are floats and the action must fall in the [min_value, max_value] range.

Labelled example

ca 185.121:0.657567:6.20426e-05 | <features>
ca 772.592:0.458316:6.20426e-05 | <features>
ca 15140.6:0.31791:6.20426e-05 | <features>

Unlabelled

For an unlabelled example the action:cost:pdf_value section can be excluded.

Unlabelled example

 | <features>
 | <features>
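The labelled and unlabelled forms above can be read with a small parser. The following Python sketch is a hypothetical helper (parse_ca_line is not part of VW) that assumes a single label element per line; it leaves the feature section untouched so that namespaces are preserved.

```python
def parse_ca_line(line, min_value=None, max_value=None):
    """Return (label, features), where label is (action, cost, pdf_value) or None."""
    bar = line.find("|")
    if bar < 0:
        raise ValueError("missing '|' before the features")
    head, features = line[:bar].split(), line[bar:].rstrip("\n")

    if not head or head == ["ca"]:
        return None, features  # unlabelled example
    if head[0] != "ca" or len(head) != 2:
        raise ValueError("expected 'ca action:cost:pdf_value'")

    parts = head[1].split(":")
    if len(parts) != 3:
        raise ValueError("label needs exactly action:cost:pdf_value")
    action, cost, pdf_value = (float(p) for p in parts)

    # Optional range check; the bounds are supplied by the caller.
    if min_value is not None and action < min_value:
        raise ValueError("action below min_value")
    if max_value is not None and action > max_value:
        raise ValueError("action above max_value")
    return (action, cost, pdf_value), features


labelled = parse_ca_line("ca 185.121:0.657567:6.20426e-05 | a b")
unlabelled = parse_ca_line(" | a b")
print(labelled, unlabelled)
```

Passing min_value and max_value makes the parser reject actions outside the range given on the command line.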
