Replies: 1 comment 13 replies
-
The way we handle this is with |
-
Creating a full backend is a lot of work. Some actual backends only need to implement a few optimised ops, like BLAS, AMX, ... or tinyBLAS.
I have run some tests using the RDNA3 iGPU of an AMD CPU (AMD Ryzen 9 7940HS w/ Radeon 780M Graphics).
I want to write a high-speed GEMM for FP8 on CPU, but to get good speed we need a more "classic" matmul kernel, like the 5-level BLIS loop structure.
When the XDNA driver is available on Linux, I'd like to have a look at this NPU.
A full backend is nice for a discrete accelerator, but it is too much work (copies) for an integrated accelerator that uses CPU memory.
There is also an "idea" to build multiple CPU backends and use the "best" one supported by the current CPU.
So my feeling is that it may be good to have the possibility to "register" ops and select at runtime those that are usable and best for the current computation. Maybe even let the user choose some of them...
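To make the "5-level" structure concrete, here is a rough sketch of the five blocking loops around a micro-kernel, in plain Python for readability. It is only an illustration of the loop nesting: the block sizes `NC`, `KC`, `MC`, `NR`, `MR` are made-up values (real ones are tuned to the cache sizes), there is no packing, no SIMD, and no FP8 here, and the function names are my own.

```python
# Illustrative block sizes (assumptions; real values are tuned per CPU cache level).
NC, KC, MC, NR, MR = 8, 4, 4, 2, 2


def micro_kernel(A, B, C, i0, i1, j0, j1, p0, p1):
    """Accumulate A[i0:i1, p0:p1] @ B[p0:p1, j0:j1] into the C[i0:i1, j0:j1] tile."""
    for i in range(i0, i1):
        for j in range(j0, j1):
            acc = 0.0
            for p in range(p0, p1):
                acc += A[i][p] * B[p][j]
            C[i][j] += acc


def blis_gemm(A, B):
    """C = A @ B using five blocking loops around a small micro-kernel."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, NC):                      # loop 5: column panels of B and C
        j_end = min(jc + NC, n)
        for pc in range(0, k, KC):                  # loop 4: slices of the k dimension
            p_end = min(pc + KC, k)                 # (a real kernel packs the B panel here)
            for ic in range(0, m, MC):              # loop 3: row blocks of A and C
                i_end = min(ic + MC, m)             # (a real kernel packs the A block here)
                for jr in range(jc, j_end, NR):     # loop 2: NR-wide column tiles
                    for ir in range(ic, i_end, MR): # loop 1: MR-tall row tiles
                        micro_kernel(A, B, C,
                                     ir, min(ir + MR, i_end),
                                     jr, min(jr + NR, j_end),
                                     pc, p_end)
    return C
```

In a real kernel the packing steps and the micro-kernel are where the architecture-specific work goes; the outer loops stay mostly the same.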
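A minimal sketch of what I mean by "registering" ops, written in Python just to show the shape of the idea. Nothing here is an existing API: `OpImpl`, `register_op`, `select_op`, the priorities, and the implementation names are all hypothetical, and the `supported` checks are stand-ins for real CPU feature detection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class OpImpl:
    name: str                      # hypothetical implementation name
    priority: int                  # higher means preferred when usable
    supported: Callable[[], bool]  # runtime check (stand-in for CPU feature detection)
    run: Callable[..., object]     # the kernel itself


_REGISTRY: Dict[str, List[OpImpl]] = {}


def register_op(op: str, impl: OpImpl) -> None:
    """Add one implementation of `op` to the registry."""
    _REGISTRY.setdefault(op, []).append(impl)


def select_op(op: str, prefer: Optional[str] = None) -> OpImpl:
    """Pick the user's preferred implementation if usable, else the best usable one."""
    usable = [impl for impl in _REGISTRY.get(op, []) if impl.supported()]
    if not usable:
        raise LookupError(f"no usable implementation registered for {op!r}")
    if prefer is not None:
        for impl in usable:
            if impl.name == prefer:
                return impl
    return max(usable, key=lambda impl: impl.priority)


def naive_matmul(A, B):
    """Reference matmul on lists of lists; used by every fake implementation below."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Three made-up implementations: one always available, one faster, one unsupported here.
register_op("matmul", OpImpl("cpu_ref", 0, lambda: True, naive_matmul))
register_op("matmul", OpImpl("tiled", 5, lambda: True, naive_matmul))
register_op("matmul", OpImpl("fake_amx", 10, lambda: False, naive_matmul))
```

With this, `select_op("matmul")` picks `tiled` (the highest-priority implementation whose check passes), `select_op("matmul", prefer="cpu_ref")` honours the user's choice, and asking for `fake_amx` quietly falls back to the best usable one. Whether the fallback should be silent or an error is a design choice I have not thought through.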