Hi, the kernels are awesome. Supporting prefill and generation in the same round should predictably give better performance.
However, since most inference/serving frameworks are Python-based, the C++-only architecture limits how widely the project can be applied. Is there any plan to wrap it with pybind11 so that the kernels can be used from PyTorch?