Fix the CLM performance mismatch between evaluation and manual inference #723
Fixes #719
Goals ⚽
The current CLM strategy is as follows: at training time, the model predicts the next item at every position of the (shifted) input sequence, and `0`-padded positions are replaced with trainable [MASK] embeddings; at evaluation time, only the last item of each sequence is masked and predicted.
The main issues of the current implementation are:

- During inference, padded positions are represented with `0`-embeddings, while during training these positions are replaced with trainable [MASK] embeddings. ==> We should use the same representation strategy for training, evaluation, and inference. The sketch below illustrates the mismatch.
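To make the mismatch concrete, here is a minimal PyTorch sketch (not the library's actual code; the tensor names, shapes, and padding convention are illustrative assumptions):

```python
import torch

torch.manual_seed(0)

hidden_size = 4
item_embeddings = torch.nn.Embedding(100, hidden_size, padding_idx=0)
# Trainable [MASK] embedding, as used for padded positions during training.
mask_embedding = torch.nn.Parameter(torch.rand(hidden_size))

# One right-padded sequence; item id 0 marks padding.
item_ids = torch.tensor([[12, 7, 31, 0, 0]])
inputs = item_embeddings(item_ids)        # (1, 5, hidden_size)
padded = (item_ids == 0).unsqueeze(-1)    # True at padded positions

# Training-time representation: padded positions -> trainable [MASK].
train_inputs = torch.where(padded, mask_embedding, inputs)

# Old inference-time representation: padded positions stay as 0-embeddings
# (padding_idx=0 yields zero vectors), so the transformer sees different
# tensors for the same sequence at training and inference time.
infer_inputs = inputs

print(torch.allclose(train_inputs, infer_inputs))  # False
```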
Implementation Details 🚧

- Updated the class `CausalLanguageModeling` to use `label_mask` as the padding mask information (to keep information about actual past items); a rough sketch of the idea follows this list.
- I ran the `t4r_paper_repro` script using 5 days of the `ecomrees46` dataset and these are the results:
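As background for the `label_mask` change above, here is a rough sketch of the idea, assuming `label_mask` is a boolean mask that is True at actual (non-padded) positions; the helper name and signature are hypothetical, not the actual `CausalLanguageModeling` code:

```python
import torch

def apply_mask_to_inputs(inputs, label_mask, mask_embedding):
    """Replace only genuinely padded positions with the [MASK] embedding.

    inputs:         (batch, seq_len, hidden) item embeddings
    label_mask:     (batch, seq_len) bool, True where an actual item exists
    mask_embedding: (hidden,) trainable parameter
    """
    keep = label_mask.unsqueeze(-1)
    # Actual past items are kept, so evaluation sees the true history.
    return torch.where(keep, inputs, mask_embedding)
```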
Testing Details 🔍

- Updated `test_mask_only_last_item_for_eval` to get the target mask information from `lm.masked_targets`.
- Updated `test_sequential_tabular_features_ignore_masking`, as the inference mode of CLM now changes the inputs by replacing `0`-padded positions with [MASK] embeddings; a pytest-style sketch of this kind of check follows the list.
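For illustration only, a pytest-style sketch of such a check, reusing the hypothetical `apply_mask_to_inputs` helper from the implementation sketch (this is not the repository's actual test code):

```python
import torch

def apply_mask_to_inputs(inputs, label_mask, mask_embedding):
    # Same hypothetical helper as sketched in the implementation section.
    return torch.where(label_mask.unsqueeze(-1), inputs, mask_embedding)

def test_padded_positions_get_mask_embedding():
    hidden = 8
    mask_embedding = torch.rand(hidden)
    item_ids = torch.tensor([[5, 9, 0, 0]])   # two real items, two pads
    inputs = torch.rand(1, 4, hidden)
    label_mask = item_ids != 0

    out = apply_mask_to_inputs(inputs, label_mask, mask_embedding)

    # Real past items are untouched...
    assert torch.equal(out[0, :2], inputs[0, :2])
    # ...while 0-padded positions now carry the [MASK] embedding.
    assert torch.equal(out[0, 2], mask_embedding)
    assert torch.equal(out[0, 3], mask_embedding)
```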
Future work

- Use a `0`-embedding to represent padded positions. ==> We need to re-run the T4Rec paper experiments without the [MASK] variable and check how the evaluation results are impacted. A sketch of this alternative is given below.
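A minimal sketch of that alternative, assuming it amounts to keeping the zero vectors that `padding_idx` already produces instead of learning a [MASK] variable:

```python
import torch

hidden_size = 8
item_embeddings = torch.nn.Embedding(100, hidden_size, padding_idx=0)

item_ids = torch.tensor([[5, 9, 0, 0]])
inputs = item_embeddings(item_ids)

# With padding_idx=0, padded positions are 0-embeddings by construction
# (and their gradient stays zero), so the same representation would hold
# at training, evaluation, and inference with no [MASK] variable at all.
assert torch.all(inputs[0, 2:] == 0)
```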