A linear CUDA grid can have up to 2^31-1 blocks (in the x dimension), each with up to 1024 threads, for a total of a little under 2^41 threads. Currently, our types and grid_info functions assume all dimensions fit within 32-bit integers... and that is not the case.
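For illustration only (this is not code from this repository), here is the kind of kernel where the problem bites: with gridDim.x anywhere near 2^31-1 and blockDim.x == 1024, the usual global-index computation overflows a 32-bit integer unless it is widened first.

```cuda
#include <cstddef>

// Hypothetical kernel, not part of this library: shows why the index
// type matters once the grid exceeds 2^32 threads overall.
__global__ void fill(float* data, std::size_t length)
{
    // 32-bit arithmetic would wrap around past 2^32-1 elements:
    //   auto bad_index = blockIdx.x * blockDim.x + threadIdx.x;

    // Widening before the multiplication keeps the index valid up to
    // the full ~2^41-thread grid:
    std::size_t index =
        static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (index < length) { data[index] = 0.0f; }
}
```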
At the same time, it is costly to default to 64-bit values for dimensions when a kernel author knows the dimensions don't actually exceed 32 bits. (Limiting ourselves to 16 bits is less useful, since NVIDIA GPU cores don't operate faster on 16-bit integers.)
So, we need to figure out how to support dimensions exceeding 32 bits without forcing them on users by default. Currently we simply do not support this.
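One possible direction, sketched only under assumptions (the names `grid_index_t`, `grid_info_sketch` and `global_thread_index` are hypothetical, not the library's current API): template the index computations on an integer type that defaults to 32 bits, so existing kernels pay nothing, while authors of huge-grid kernels opt into 64-bit indices explicitly.

```cuda
#include <cstdint>

namespace grid_info_sketch {

// Hypothetical helper, not the library's current API: the index type
// defaults to 32 bits; kernels with grids that may exceed 2^32 threads
// instantiate it with a 64-bit type instead.
template <typename I = std::uint32_t>
__device__ inline I global_thread_index()
{
    // The cast happens before the multiplication, so choosing
    // I = std::uint64_t avoids 32-bit overflow, while the default keeps
    // the cheaper 32-bit arithmetic for kernels known to stay small.
    return static_cast<I>(blockIdx.x) * blockDim.x + threadIdx.x;
}

} // namespace grid_info_sketch

// Usage in a kernel that may exceed 2^32 threads:
//   auto idx = grid_info_sketch::global_thread_index<std::uint64_t>();
```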