Weight vector
VW's weight vector holds 2^b float (4-byte) weights, where b is specified by the -b option, and each example's features are hashed to an index in [0, 2^b). The weight vector is also used to store other vectors needed by more sophisticated learning algorithms, such as the conjugate gradient method (--conjugate_gradient) or adaptive gradient descent (--adaptive, --invariant, and --normalized). In these cases, a small integer multiplier is applied to the size of the weight vector so there is enough room to store all the auxiliary weights side-by-side, in the same 'hash-bucket'.
In other words: when more than one vector is stored in the same global space, every hash-value slot stores multiple "weights". The size (number of floats) of the hash-bucket is called the stride in the vw source.
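To make the bucket layout concrete, here is a minimal Python sketch (not VW's actual code; the masking scheme and the meaning of each auxiliary slot are illustrative assumptions) of how a hashed feature and a stride of 3 pick out the slots that sit side-by-side in one hash-bucket:

```python
# Illustrative sketch only -- not VW's real implementation.
# Assumes -b 18 and a stride of 3 floats per hash-bucket
# (the weight plus two auxiliary values, e.g. for adaptive/normalized SGD).
b = 18
stride = 3
table = [0.0] * ((1 << b) * stride)  # the whole weight vector

def bucket_base(feature_hash: int) -> int:
    """Map a hashed feature to the first float of its hash-bucket."""
    index = feature_hash & ((1 << b) - 1)  # keep the low b bits
    return index * stride

base = bucket_base(hash("price^sqft"))  # hypothetical quadratic feature
weight = table[base]      # slot 0: the weight itself
aux1 = table[base + 1]    # slot 1: an auxiliary value (assumption)
aux2 = table[base + 2]    # slot 2: another auxiliary value (assumption)
```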
VW uses -b 18 by default. 2^18 is 262144, so if you have far fewer than 262144 distinct features in your training set you should be relatively safe from hash collisions. If you auto-generate many new features on the fly, as when you use -q (quadratic), --cubic (cubic), or --nn, you may want to increase the default by requesting a bigger -b value to avoid hash collisions.
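To get a feel for when collisions become a problem, the standard birthday-problem approximation estimates how many of n features will land in an already-occupied slot of a 2^b table. The helper below is just that arithmetic, not anything VW computes:

```python
import math

def expected_collisions(n_features: int, bits: int) -> float:
    """Birthday-problem estimate: how many of n_features hash into an
    already-occupied slot of a table with 2**bits slots."""
    slots = 1 << bits
    occupied = slots * (1.0 - math.exp(-n_features / slots))
    return n_features - occupied

# With the default -b 18 (262144 slots):
print(expected_collisions(10_000, 18))    # ~190: mostly collision-free
print(expected_collisions(500_000, 18))   # ~277,000: raise -b
```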
By default, vw uses -b 18 and normalized/adaptive/invariant SGD, so the overall size allocated for the weight vector is:

2^18 * weights_per_stride * sizeof(float)
= 262144 * 3 * 4 bytes
= 3,145,728 bytes
= a bit over 3 MB
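The same arithmetic works for any -b and stride; here is a small, purely illustrative helper for sizing the allocation before training:

```python
def weight_vector_bytes(bits: int = 18, stride: int = 3) -> int:
    """Bytes allocated for the weight vector: 2**bits hash-buckets,
    each holding `stride` 4-byte floats."""
    return (1 << bits) * stride * 4

print(weight_vector_bytes())          # 3145728 -> a bit over 3 MB (default)
print(weight_vector_bytes(bits=24))   # 201326592 -> ~192 MB with -b 24
```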