
[BUG] XLNET-CLM eval recall metric value does not match with custom np based recall metric value #719

Open
rnyak opened this issue Jun 12, 2023 · 6 comments · Fixed by #723
Labels: bug (Something isn't working), P1

rnyak (Contributor) commented Jun 12, 2023

Bug description

When we train an XLNet model with CLM masking, the trainer.evaluate() step prints out the model's own evaluation metrics (ndcg@k, recall@k, etc.). If we then apply our own custom numpy-based metric function, like the one below, the metric values do not match; they do match if we use MLM masking instead.

import numpy as np


def recall(predicted_items: np.ndarray, real_items: np.ndarray) -> float:
    bs, top_k = predicted_items.shape
    # mask out padded rows (label id 0)
    valid_rows = real_items != 0

    # reshape predictions and labels to compare
    # the top-k predicted item ids with the label id
    real_items = real_items.reshape(bs, 1, -1)
    predicted_items = predicted_items.reshape(bs, 1, top_k)

    num_relevant = real_items.shape[-1]
    predicted_correct_sum = (predicted_items == real_items).sum(-1)
    predicted_correct_sum = predicted_correct_sum[valid_rows]
    recall_per_row = predicted_correct_sum / num_relevant
    return float(np.mean(recall_per_row))
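
For context, a minimal sanity check of this recall function on synthetic arrays (the values and shapes below are illustrative assumptions: one label id per row, id 0 treated as padding, and top-3 predicted item ids per row):

import numpy as np

# hypothetical batch of 4 sessions with top-3 predicted item ids each
predicted_items = np.array([
    [10, 20, 30],   # label 20 is in the top-3 -> hit
    [11, 21, 31],   # label 99 is not predicted -> miss
    [12, 22, 32],   # label 12 is in the top-3 -> hit
    [13, 23, 33],   # label 0 is padding -> dropped by valid_rows
])
real_items = np.array([20, 99, 12, 0])

print(recall(predicted_items, real_items))  # 2 hits over 3 valid rows ~= 0.6667

The expectation is that recall@k computed this way on the top-k predictions returned at evaluation time should closely track the recall@k reported by trainer.evaluate().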

Steps/Code to reproduce bug

coming soon.

Expected behavior

Environment details

  • Transformers4Rec version:
  • Platform:
  • Python version:
  • Huggingface Transformers version:
  • PyTorch version (GPU?):
  • Tensorflow version (GPU?):

Additional context

rnyak (Contributor, Author) commented Jul 11, 2023

If I use the dev branch, I get much higher CLM accuracy metrics (~2.5x higher) compared to MLM with the end-to-end example on the yoochoose dataset. I don't think this is expected.

EvenOldridge added the P1 label and removed the P0 label on Sep 11, 2023
SPP3000 commented Sep 19, 2023

Is this bug already fixed in some T4R version? I am currently experiencing similar discrepancies when evaluating NDCG and MRR metrics on my dataset. My question is: is it worth creating a reproducible example, or are you already working on it?

rnyak (Contributor, Author) commented Sep 19, 2023

@SPP3000 can you please provide more details about "I am currently experiencing similar discrepancies"?

Which model are you using, and how do you evaluate? Are you using our fit_and_evaluate evaluation function?

SPP3000 commented Sep 20, 2023

@rnyak I just opened a new bug report with all details here.

rnyak (Contributor, Author) commented Oct 3, 2023

@SPP3000 are you seeing the same issue with XLNet MLM? Did you test MLM?

dcy0577 commented Nov 21, 2023

Hello, are there any updates regarding this issue? @rnyak @SPP3000
