# Contextual Bandits with Large Action Spaces (LAS)
What: Contextual Bandits (CB) with many actions (> 50) to explore over. Combine with epsilon-greedy or SquareCB exploration.
Publication: https://proceedings.mlr.press/v162/zhu22b.html
Large Action Spaces (LAS) is an algorithm that makes exploration efficient when there are many actions in a contextual bandit dataset. The main idea behind it is to eliminate actions that are similar and explore only over the most diverse actions in the dataset.
Once the redundant actions are eliminated, the remaining actions are processed by either epsilon-greedy or SquareCB, and the final result is a probability mass function (pmf) over the chosen actions. The redundant actions receive zero probability.
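For intuition about the shape of the output, here is a minimal sketch (not VW's implementation) of how an epsilon-greedy pmf over the surviving actions could be formed. The kept set {0, 2}, the greedy action, and epsilon = 0.2 are made-up values for illustration:

```python
# Illustrative sketch only: assumes LAS kept actions {0, 2} out of 5 total,
# and that the exploration layer is epsilon-greedy with epsilon = 0.2.
def epsilon_greedy_pmf(num_actions, kept, greedy_action, epsilon):
    """Spread epsilon uniformly over the kept actions; give eliminated actions zero mass."""
    pmf = [0.0] * num_actions
    for a in kept:
        pmf[a] = epsilon / len(kept)
    pmf[greedy_action] += 1.0 - epsilon
    return pmf

pmf = epsilon_greedy_pmf(5, kept=[0, 2], greedy_action=0, epsilon=0.2)
```

Here the eliminated actions 1, 3, and 4 end up with exactly zero probability, while the greedy action gets the bulk of the mass.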
The way actions are eliminated is based on two main methods:
- Randomized truncated singular value decomposition (SVD) of the sparse matrix of action representations, in order to reduce the dimensionality of the action representations (Halko, Martinsson & Tropp: https://arxiv.org/abs/0909.4061)
- Applying a c-approximate barycentric spanner to the left singular vectors produced by the SVD (Awerbuch & Kleinberg, STOC'04: https://www.cs.cornell.edu/~rdk/papers/OLSP.pdf)
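The dimensionality-reduction step can be sketched with a Halko-style randomized SVD in NumPy. This is a generic illustration under simplifying assumptions (dense input, no oversampling or power iterations), not VW's implementation:

```python
import numpy as np

def randomized_truncated_svd(A, d, rng):
    """Halko-style randomized SVD sketch: project A onto a random subspace,
    orthonormalize, then take the exact SVD of the small projected matrix."""
    omega = rng.standard_normal((A.shape[1], d))   # random test matrix
    Q, _ = np.linalg.qr(A @ omega)                 # orthonormal basis for range(A @ omega)
    U_small, S, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ U_small, S, Vt                      # lift left singular vectors back up

rng = np.random.default_rng(0)
# An exactly rank-2 matrix: the d = 2 sketch recovers its singular values almost exactly.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 9))
U, S, _ = randomized_truncated_svd(A, d=2, rng=rng)
```

Because the test matrix is Gaussian, the sketch captures the range of a rank-d matrix almost surely, so on this toy input `S` matches the top two exact singular values of `A`.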
Options:
- `--max_actions <max_actions>` gives an upper bound on how many actions will be explored over. The number of actions that end up with non-zero probability may be smaller than `max_actions` if the available actions are very similar. Otherwise it will be either `max_actions`, or `max_actions + 1` in the case that the predicted action happened to be filtered out by the algorithm.
- exploration algorithm: LAS is a filtering reduction and can be combined with any available exploration algorithm, although the paper was written with SquareCB in mind.
Advanced options:
- `--spanner_c <c>`: parameter for computing the c-approximate spanner (please consult the paper for details)
- `--thread_pool_size <tps>`: defines how many threads will be used during the execution of the algorithm; defaults to `(number_of_available_hw_threads - 1) / 2`
- `--las_hint_explicit_simd`: enables faster SIMD implementations if the architecture supports them. This only works for quadratic interactions on x86 Linux for now.
Let's say we have a very small example consisting of 5 actions:
```
| math:2.0 music:0.5 sports:0.1 science:4.0 computers:0.1 literature:0 social_media:0 dancing:0 news:0
| math:0 music:1.5 sports:0 science:0 computers:0 literature:2.0 social_media:0 dancing:2.0 news:0
| math:0 music:1.5 sports:4.0 science:0 computers:0 literature:2.0 social_media:4.0 dancing:2.0 news:4.0
| math:0 music:1.5 sports:0 science:0 computers:0 literature:2.0 social_media:4.0 dancing:2.0 news:0
| math:0 music:1.5 sports:0 science:0 computers:0 literature:2.0 social_media:4.0 dancing:2.0 news:0
```
Each action is described by some characteristics, and some features are stronger than others for specific actions. If we replace each categorical feature with an index (math -> 0, music -> 1, sports -> 2, etc.) we get VW's action representation in a dense format: a matrix A where each row is an action.
math | music | sports | science | computers | literature | social_media | dancing | news |
---|---|---|---|---|---|---|---|---|
2.0 | 0.5 | 0.1 | 4.0 | 0.1 | 0 | 0 | 0 | 0 |
0 | 1.5 | 0 | 0 | 0 | 2.0 | 0 | 2.0 | 0 |
0 | 1.5 | 4.0 | 0 | 0 | 2.0 | 4.0 | 2.0 | 4.0 |
0 | 1.5 | 0 | 0 | 0 | 2.0 | 4.0 | 2.0 | 0 |
0 | 1.5 | 0 | 0 | 0 | 2.0 | 4.0 | 2.0 | 0 |
Just by looking at the actions represented this way we can see that the first action is quite distinct and that the following 4 actions have stronger similarities (the last two actions are duplicates). We can also see that of the 4 similar actions, action 3 is the one that stands out next.
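We can check the duplicate-row observation numerically by building the matrix in NumPy; the duplicate rows force the smallest singular value of the matrix to zero (note that a plain full SVD produces different numbers from VW's internal randomized truncated SVD, which operates on VW's own hashed feature representation):

```python
import numpy as np

# The dense action matrix from the example above (rows = actions).
A = np.array([
    [2.0, 0.5, 0.1, 4.0, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.5, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0],
    [0.0, 1.5, 4.0, 0.0, 0.0, 2.0, 4.0, 2.0, 4.0],
    [0.0, 1.5, 0.0, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0],
    [0.0, 1.5, 0.0, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0],
])

s = np.linalg.svd(A, compute_uv=False)
# The duplicate last two actions make the smallest singular value zero,
# so at most four of the five actions carry independent information.
```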
If we perform truncated randomized SVD on A, setting the reduced dimension to d = 2 (`--max_actions 2`), we can examine the singular values and the left singular vectors. We expect the left singular vectors to hold the "essence" of the original matrix, but in a reduced-dimension space:
dimension 1 | dimension 2 |
---|---|
-0.638989 | 0.0765502 |
0.0988239 | 0.51184 |
0.562224 | 0.576712 |
-0.364576 | 0.446969 |
-0.364576 | 0.446969 |
and singular values: 19.2866, 5.39217, 1.17257e-07, 4.79044e-08, 0
We can see that we have two large, distinct singular values and that the remaining three are quite small and similar. This indicates that the original matrix has two dominant components, so it makes sense to select two actions: any actions chosen after the first two are going to be similar to them and will not add much information.
The resulting left singular vector matrix will then be fed to the spanner algorithm. The purpose of the spanner algorithm is to select the d actions that are the most diverse and thus maximize the volume of the final matrix (defined by the absolute value of its determinant). In the above case the two (d = 2) actions that do so are the first and the third.
Spanner result:
dimension 1 | dimension 2 |
---|---|
-0.638989 | 0.0765502 |
0.562224 | 0.576712 |
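The volume objective the spanner approximates can be illustrated with an exhaustive max-volume search over row subsets. This brute force is only tractable for tiny examples and is not the c-approximate spanner VW actually uses (which may return a slightly different, approximately optimal subset); the matrix `U` below is made up for illustration:

```python
import itertools
import numpy as np

def max_volume_subset(U, d):
    """Exhaustively pick the d rows of U whose d x d submatrix has the
    largest absolute determinant, i.e. spans the largest volume."""
    return max(itertools.combinations(range(len(U)), d),
               key=lambda rows: abs(np.linalg.det(U[list(rows)])))

# Toy left-singular-vector matrix (made up): rows 1 and 3 are near-copies
# of rows 0 and 2, so the most diverse pair is rows 0 and 2.
U = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
print(max_volume_subset(U, d=2))  # prints (0, 2)
```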
The final pmf that VW returns will be spread over the resulting two actions. If we save the above input to a file and run VW with large action spaces (`--cb_explore_adf --large_action_space --max_actions 2 --noconstant --predictions predfile.txt`):

```
cat predfile.txt
0:0.5,2:0.5,1:0,3:0,4:0
```

we can see that the first and third actions (indexed starting at zero) are given probability mass, and that the rest of the actions are given zero probability.
Note: this example is meant to give an intuitive idea of how the LAS algorithm works and is not meant to be completely precise. The output numbers might change as the algorithm evolves and improves. Also, the singular values and left singular vectors were obtained by adding print statements to the code while writing this wiki page and are not available in general.