Match the default ignore index to PyTorch's #1076
Merged
PyTorch's default `ignore_index` value is -100: https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html

I don't know if there's a reason for having a different default. Maybe we just copied this from https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436728a420e85796c57dba7e/model.py#L187
But matching the default value can enable lowering to faster kernels that don't support values other than -100. For example, https://github.com/unslothai/unsloth/blob/a0cc0d163843a403a23e5cd94d20121690bd6830/unsloth/kernels/cross_entropy_loss.py#L65 or https://github.com/NVIDIA/apex/blob/b496d85fb88a801d8e680872a12822de310951fd/apex/contrib/csrc/xentropy/xentropy_kernel.cu#L703
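
To illustrate the point, here is a minimal pure-Python sketch of what `ignore_index` does and why a kernel that hardcodes -100 only applies when the caller uses the default. It is not the code from this PR or from the linked kernels: `cross_entropy` is a reference reimplementation written for this example, and `can_use_fused_kernel` is a hypothetical dispatch check.

```python
import math

# Default of torch.nn.functional.cross_entropy per the PyTorch docs.
PYTORCH_DEFAULT_IGNORE_INDEX = -100


def cross_entropy(logits, targets, ignore_index=PYTORCH_DEFAULT_IGNORE_INDEX):
    """Mean cross-entropy over rows whose target differs from ignore_index.

    Illustrative reference only, not the PyTorch implementation.
    """
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # masked position: contributes nothing to the loss
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[target]
        count += 1
    return total / count if count else float("nan")  # no unmasked targets


def can_use_fused_kernel(ignore_index):
    """Hypothetical check: a kernel that hardcodes -100 only fits the default."""
    return ignore_index == PYTORCH_DEFAULT_IGNORE_INDEX


logits = [[2.0, 0.5, 0.1], [0.3, 1.7, 0.2]]
masked = cross_entropy(logits, [0, -100])      # second row is ignored
only_first = cross_entropy(logits[:1], [0])    # same loss as the masked call
```

With a non-default value such as `ignore_index=-1`, the hypothetical check returns `False`, so a caller would have to fall back to a general implementation or remap its masked targets first.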