
Fix the CLM performance mismatch between evaluation and manual inference #723

Merged · 1 commit · Jun 21, 2023

Conversation

@sararb (Contributor) commented on Jun 21, 2023

Fixes #719

Goals ⚽

The current CLM strategy is as follows:
[image: diagram of the current CLM strategy]
The main issues of the current implementation are:

  • When we evaluate on the last item only, we mask useful past information. This is wrong, as past information is valuable context for predictions.
  • At inference, padded positions are represented with 0-embeddings, while during training these positions are replaced with trainable [MASK] embeddings. We should use the same representation strategy across training, evaluation, and inference.
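The second point can be illustrated with a minimal sketch (the tensor values and variable names here are illustrative, not the transformers4rec API): padded positions are overwritten with a single shared, trainable [MASK] embedding, so inference sees the same representation for padding that training does.

```python
import torch

torch.manual_seed(0)
batch, seq_len, d_model, padding_idx = 2, 4, 8, 0

item_ids = torch.tensor([[5, 3, 7, 0],    # one padded position
                         [2, 6, 0, 0]])   # two padded positions
inputs = torch.randn(batch, seq_len, d_model)              # item embeddings
mask_embedding = torch.nn.Parameter(torch.randn(d_model))  # trainable [MASK]

# Replace every padded position's embedding with the [MASK] embedding,
# leaving real items untouched.
pad_positions = (item_ids == padding_idx).unsqueeze(-1)    # (batch, seq, 1)
masked_inputs = torch.where(pad_positions,
                            mask_embedding.expand_as(inputs),
                            inputs)
```

With 0-embeddings instead, the inference-time representation of padding would differ from the trained one, which is exactly the train/inference mismatch this PR removes.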

Implementation Details 🚧

  • Updated the CausalLanguageModeling class to:

    • Replace padded positions with [MASK] embeddings at inference.
    • Define the label_mask from the padding-mask information when evaluating on the last item, so that information about actual past items is kept.
  • I ran the t4r_paper_repro script on 5 days of the ecomrees46 dataset; results:

**CLM run using main branch (w/o fix)**
Recall@10 of manually masked test data = 0.3915199603272998
eval_/next-item/recall_at_10 = 0.25108

**CLM run with the fix (evaluate on the last item without masking past information)**
eval_/next-item/recall_at_10 = 0.2980385422706604
Recall@10 of manually masked test data = 0.29812381188527975

Testing Details 🔍

  • Changed test_mask_only_last_item_for_eval to get the target-mask information from lm.masked_targets.
  • Changed test_sequential_tabular_features_ignore_masking, as the inference mode of CLM now changes the inputs by replacing 0-padded positions with [MASK] embeddings.

Future work

  • The [MASK] embedding is not used to generate the next-item prediction scores, which raises the question of whether we should remove it and simply use a 0-embedding to represent padded positions. We need to re-run the T4Rec paper experiments without the [MASK] variable and check how the evaluation results are affected.

@sararb sararb added the bug Something isn't working label Jun 21, 2023
@sararb sararb requested review from gabrielspmoreira and rnyak June 21, 2023 10:08
@sararb sararb self-assigned this Jun 21, 2023
@@ -290,7 +294,8 @@ def _compute_masked_targets(
     label_seq_trg_eval[rows_ids, last_item_sessions] = labels[rows_ids, last_item_sessions]
     # Updating labels and mask
     labels = label_seq_trg_eval
-    mask_labels = label_seq_trg_eval != self.padding_idx
+    # We only mask padded positions
+    mask_labels = item_ids != self.padding_idx
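The effect of this one-line change can be shown with a small, self-contained example (tensor values are made up; only the masking logic mirrors the diff): the labels still target only the last item of each session, but mask_labels now keeps every real, non-padded position instead of only the last one.

```python
import torch

padding_idx = 0
item_ids = torch.tensor([[5, 3, 7, 0],
                         [2, 6, 0, 0]])
labels = item_ids.clone()

# Index of the last non-padded item in each session.
last_item_sessions = (item_ids != padding_idx).sum(dim=1) - 1
rows_ids = torch.arange(item_ids.size(0))

# Evaluation labels: only the last item of each session is a target.
label_seq_trg_eval = torch.zeros_like(labels)
label_seq_trg_eval[rows_ids, last_item_sessions] = labels[rows_ids, last_item_sessions]

# Before the fix: the mask covered only the last item, hiding past items.
old_mask = label_seq_trg_eval != padding_idx
# After the fix: the mask covers every real (non-padded) position.
new_mask = item_ids != padding_idx
```

Here old_mask keeps a single position per row, while new_mask keeps all non-padded positions, so the model can still attend to the actual past items when predicting the last one.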
Member:
Good catch!

@gabrielspmoreira (Member) left a comment:
Great job @sararb

Successfully merging this pull request may close these issues.

[BUG] XLNET-CLM eval recall metric value does not match with custom np based recall metric value