Hi, as also discussed in #15 (see there for more details), the layer-wise 2:4 inference speedups we report were produced directly with NVIDIA's CUTLASS profiler using its prebuilt kernels; no custom code on our side was involved.
Have you tried cuSPARSE? Is it easier to use and more effective than CUTLASS?
Great work!
I am trying to speed up inference. Could you please share the code for the inference speedup using 2:4 sparsity on Ampere GPUs? Thanks!