Fix the CLM performance mismatch between evaluation and manual inference #723
Fixes #719
Goals ⚽
The current CLM strategy is as follows: at training time, the model predicts the next item at every position of the (shifted) input sequence, and `0`-padded positions are replaced with trainable [MASK] embeddings; at evaluation time, only the last item of each sequence is masked and predicted.
The main issues of the current implementation are:

- During inference, padded positions are represented with `0`-embeddings, while during training these positions are replaced with trainable [MASK] embeddings. ==> We should use the same representation strategy for training, evaluation, and inference. The sketch below illustrates the mismatch.
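To make the mismatch concrete, here is a minimal PyTorch sketch (not the library's actual code; the tensor names, shapes, and padding convention are illustrative assumptions):

```python
import torch

torch.manual_seed(0)

hidden_size = 4
item_embeddings = torch.nn.Embedding(100, hidden_size, padding_idx=0)
# Trainable [MASK] embedding, as used for padded positions during training.
mask_embedding = torch.nn.Parameter(torch.rand(hidden_size))

# One right-padded sequence; item id 0 marks padding.
item_ids = torch.tensor([[12, 7, 31, 0, 0]])
inputs = item_embeddings(item_ids)        # (1, 5, hidden_size)
padded = (item_ids == 0).unsqueeze(-1)    # True at padded positions

# Training-time representation: padded positions -> trainable [MASK].
train_inputs = torch.where(padded, mask_embedding, inputs)

# Old inference-time representation: padded positions stay as 0-embeddings
# (padding_idx=0 yields zero vectors), so the transformer sees different
# tensors for the same sequence at training and inference time.
infer_inputs = inputs

print(torch.allclose(train_inputs, infer_inputs))  # False
```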
Implementation Details 🚧

- Updated the class `CausalLanguageModeling` to use `label_mask` as the padding mask information (to keep information about actual past items); a rough sketch of the idea follows this list.
- I ran the `t4r_paper_repro` script using 5 days of the `ecomrees46` dataset and these are the results:
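As background for the `label_mask` change above, here is a rough sketch of the idea, assuming `label_mask` is a boolean mask that is True at actual (non-padded) positions; the helper name and signature are hypothetical, not the actual `CausalLanguageModeling` code:

```python
import torch

def apply_mask_to_inputs(inputs, label_mask, mask_embedding):
    """Replace only genuinely padded positions with the [MASK] embedding.

    inputs:         (batch, seq_len, hidden) item embeddings
    label_mask:     (batch, seq_len) bool, True where an actual item exists
    mask_embedding: (hidden,) trainable parameter
    """
    keep = label_mask.unsqueeze(-1)
    # Actual past items are kept, so evaluation sees the true history.
    return torch.where(keep, inputs, mask_embedding)
```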
Testing Details 🔍

- Updated `test_mask_only_last_item_for_eval` to get the target mask information from `lm.masked_targets`.
- Updated `test_sequential_tabular_features_ignore_masking`, as the inference mode of CLM now changes the inputs by replacing `0`-padded positions with [MASK] embeddings; a pytest-style sketch of this kind of check follows the list.
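For illustration only, a pytest-style sketch of such a check, reusing the hypothetical `apply_mask_to_inputs` helper from the implementation sketch (this is not the repository's actual test code):

```python
import torch

def apply_mask_to_inputs(inputs, label_mask, mask_embedding):
    # Same hypothetical helper as sketched in the implementation section.
    return torch.where(label_mask.unsqueeze(-1), inputs, mask_embedding)

def test_padded_positions_get_mask_embedding():
    hidden = 8
    mask_embedding = torch.rand(hidden)
    item_ids = torch.tensor([[5, 9, 0, 0]])   # two real items, two pads
    inputs = torch.rand(1, 4, hidden)
    label_mask = item_ids != 0

    out = apply_mask_to_inputs(inputs, label_mask, mask_embedding)

    # Real past items are untouched...
    assert torch.equal(out[0, :2], inputs[0, :2])
    # ...while 0-padded positions now carry the [MASK] embedding.
    assert torch.equal(out[0, 2], mask_embedding)
    assert torch.equal(out[0, 3], mask_embedding)
```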
Future work

- Use a `0`-embedding to represent padded positions. ==> We need to re-run the T4Rec paper experiments without the [MASK] variable and check how the evaluation results are impacted. A sketch of this alternative is given below.
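A minimal sketch of that alternative, assuming it amounts to keeping the zero vectors that `padding_idx` already produces instead of learning a [MASK] variable:

```python
import torch

hidden_size = 8
item_embeddings = torch.nn.Embedding(100, hidden_size, padding_idx=0)

item_ids = torch.tensor([[5, 9, 0, 0]])
inputs = item_embeddings(item_ids)

# With padding_idx=0, padded positions are 0-embeddings by construction
# (and their gradient stays zero), so the same representation would hold
# at training, evaluation, and inference with no [MASK] variable at all.
assert torch.all(inputs[0, 2:] == 0)
```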