Describe the bug
cuDNN frontend rejects batch_size=0 input with CUDNN_STATUS_BAD_PARAM
Expected behavior
cuDNN should return a zero-size tensor of shape [0, num_head, sequence_length, dims_per_head], or something like that; maybe the heads/seq dims are permuted differently, but the important part is that the batch dimension would be 0.
The output would have the same dimensions as Q; you could probably even just return Q.
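For reference, here is a minimal sketch of the expected behavior. The shapes are illustrative; on my understanding, the CPU math backend already handles the empty batch this way, while the cuDNN backend rejects it with CUDNN_STATUS_BAD_PARAM:

```python
import torch
import torch.nn.functional as F

# Zero-batch Q/K/V in the usual (batch, heads, seq, head_dim) layout.
q = torch.randn(0, 8, 16, 64)
k = torch.randn(0, 8, 16, 64)
v = torch.randn(0, 8, 16, 64)

# Expected: an empty output with the same shape as Q, not an error.
out = F.scaled_dot_product_attention(q, k, v)
assert out.shape == (0, 8, 16, 64)
```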
System Environment (please complete the following information):
Accessing cuDNN via torch SDPA
PyTorch 2.5.0.dev20240811+cu121
torch.backends.cudnn.version(): 90100
cudnn_frontend version: not sure how to look this up in PyTorch
cudnn_backend version: 90100
GPU arch: RTX 4090
CUDA runtime version: PyTorch-bundled 12.1 (though 12.2 is installed on the system)
g++ (Ubuntu 12.3.0-17ubuntu1) 12.3.0
Additional context
I'm trying to do attention on a batch of zero because my program uses a static graph: I rely on zero-batching (index_select a zero-size batch of inputs, index_add a zero-size batch of outputs) to toggle functionality on and off without adding branches to the logic.
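To illustrate the pattern, here is a toy sketch of a branchless toggle; the names (side_block, gather_idx) are illustrative, not from my actual program:

```python
import torch

# A side path is enabled or disabled purely by how many indices are
# selected, never by an `if` -- the graph structure stays static.
x = torch.randn(4, 8)                      # batch of 4 inputs
side_block = torch.nn.Linear(8, 8)

# Disabled: select a zero-size batch, so the side path runs on 0 rows.
gather_idx = torch.empty(0, dtype=torch.long)
selected = x.index_select(0, gather_idx)   # shape (0, 8)
side_out = side_block(selected)            # still shape (0, 8)

# Scatter the (empty) result back; index_add with 0 rows is a no-op.
out = x.clone().index_add(0, gather_idx, side_out)
assert torch.equal(out, x)                 # side path contributed nothing
```

Every op in the chain tolerates the empty batch, which is why I expect SDPA to as well.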
Downstream PyTorch issue:
pytorch/pytorch#133780