[ GPU ] GPU Kernel creation time #2723

Open
EunjuYang opened this issue Aug 29, 2024 · 4 comments · May be fixed by #2810
Comments

@EunjuYang
Contributor

EunjuYang commented Aug 29, 2024

For now, it seems clCreateKernel is called whenever any type of _cl wrapper function is invoked.

For example,

FullyConnectedLayer::forwarding -> dotCl -> dot_cl -> clCreateKernel

Considering how often the _cl functions are called, this could slow down performance. (It already avoids duplicate registration, but that alone may not be enough of a speed-up.)
Since NNTrainer already has a compilation phase, what about moving the kernel registration process into the compilation stage?
During the compilation step, we can identify which computational units are utilized by each layer and generate the corresponding kernels accordingly.
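
To make the concern concrete, here is a minimal sketch of the idea: create each kernel once (e.g. during the compilation stage) and reuse it afterward, instead of calling clCreateKernel on every dot_cl call. getOrCreateKernel and kernel_cache are hypothetical names, not NNTrainer's actual code.

```cpp
// Hypothetical sketch: cache cl_kernel objects so clCreateKernel runs once
// per kernel name instead of on every forwarding call.
#include <CL/cl.h>
#include <string>
#include <unordered_map>

static std::unordered_map<std::string, cl_kernel> kernel_cache;

cl_kernel getOrCreateKernel(cl_program program, const std::string &name) {
  auto it = kernel_cache.find(name);
  if (it != kernel_cache.end())
    return it->second; // reuse the kernel created earlier (e.g. at compile time)

  cl_int err = CL_SUCCESS;
  cl_kernel kernel = clCreateKernel(program, name.c_str(), &err);
  if (err != CL_SUCCESS)
    return nullptr;

  kernel_cache.emplace(name, kernel);
  return kernel;
}
```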

@taos-ci

taos-ci commented Aug 29, 2024

:octocat: cibot: Thank you for posting issue #2723. The person in charge will reply soon.

@jijoongmoon
Collaborator

I'm thinking of changing the current cl context structure.
I think we need a cl kernel context that cl_context owns; when a cl layer is created, the cl kernel context will be set on it.
Then it should be possible to register custom cl kernels in the finalize function of a cl layer.
The Tensor kernels we develop in-house (like the gemm cl kernels) could also be initialized when the kernel context is
created (like the default kernels we provide).
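
A rough sketch of the structure described above; ClKernelContext, ClContext, and LayerCl are illustrative names only, not the actual classes.

```cpp
// Illustrative sketch only: a kernel context owned by the cl context,
// handed to each cl layer on creation, with custom kernels registered
// in the layer's finalize step.
#include <memory>
#include <string>
#include <unordered_map>

struct ClKernelContext {
  // In-house kernels (e.g. gemm) could be registered here when the
  // context is created, like the default kernels.
  std::unordered_map<std::string, std::string> kernels; // name -> kernel source

  void registerKernel(const std::string &name, const std::string &src) {
    kernels.emplace(name, src);
  }
};

struct ClContext {
  std::shared_ptr<ClKernelContext> kernel_ctx =
    std::make_shared<ClKernelContext>();
};

struct LayerCl {
  std::shared_ptr<ClKernelContext> kernel_ctx; // set when the cl layer is created

  void finalize() {
    // A custom kernel for this layer could be registered here.
    kernel_ctx->registerKernel("my_custom_kernel", "/* kernel source */");
  }
};
```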

@s-debadri
Contributor

PR #2732 has been created to address this issue. The plan is as follows:

  • Added a registerClKernel function to cl_context to register custom OpenCL kernels as well as in-house kernels.
  • Used a hash map to track created Kernel objects; shared_ptr is used to store them inside the map.
  • Modified sscal to use the above Kernel creation flow, removing the dependency on layer_context.

In progress: Removing layer_context dependency for existing kernels.
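
For reference, a minimal sketch of the registration flow described above: a hash map keyed by kernel name, storing shared_ptr to the created Kernel objects. The Kernel type and the registerClKernel signature here are assumptions, not the code in PR #2732.

```cpp
// Assumed sketch of a registerClKernel-style entry point on the cl context.
#include <memory>
#include <string>
#include <unordered_map>

struct Kernel {
  std::string name;
  // compiled cl_kernel handle, program, etc. would live here
};

class ClContextSketch {
public:
  // Returns the cached kernel if it already exists, otherwise creates
  // and stores it, so each kernel is built only once.
  std::shared_ptr<Kernel> registerClKernel(const std::string &kernel_name) {
    auto it = kernels_.find(kernel_name);
    if (it != kernels_.end())
      return it->second;

    auto kernel = std::make_shared<Kernel>(Kernel{kernel_name});
    kernels_.emplace(kernel_name, kernel);
    return kernel;
  }

private:
  // Hash map tracking created Kernel objects, stored as shared_ptr.
  std::unordered_map<std::string, std::shared_ptr<Kernel>> kernels_;
};
```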

@EunjuYang
Contributor Author

[Suggestion / Need Discussion]

As I understand it, the GPU's ClContext is currently expected to be created together with a LayerNode.
What about moving its first creation to the time AppContext is created, by adding it as a member variable of AppContext?
By doing so, we can further abstract the user API so that a Layer can be created without directly calling the Cl layer's name.

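A simplified sketch of this suggestion; the AppContext, ClContext, and createLayer shown here are placeholders, not nntrainer's actual API.

```cpp
// Placeholder sketch: AppContext owns a ClContext created up front, and the
// layer-creation API hides whether the CL implementation is used.
#include <memory>
#include <string>

struct ClContext { /* GPU kernel registry, command queue, ... */ };

struct Layer { std::string type; };

class AppContext {
public:
  // ClContext is created once, together with AppContext, instead of lazily
  // alongside each LayerNode.
  AppContext() : cl_context_(std::make_shared<ClContext>()) {}

  // The user asks for "fully_connected"; whether the CL variant is used can
  // be decided internally, without exposing the CL layer's name.
  std::unique_ptr<Layer> createLayer(const std::string &type, bool use_gpu) {
    auto layer = std::make_unique<Layer>();
    layer->type = use_gpu ? type + "_cl" : type;
    return layer;
  }

private:
  std::shared_ptr<ClContext> cl_context_;
};
```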

EunjuYang added a commit to EunjuYang/nntrainer that referenced this issue Nov 29, 2024
- This commit updates transpose_cl.cpp/h to inherit LayerImplCl.
- This commit implements registerClKernels() of the transpose_cl layer.
- This commit updates cl_context.cpp (applying transpose_cl's update).
- This is the last commit to complete nnstreamer#2723.
- This can close nnstreamer#2723.

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <[email protected]>
EunjuYang added a commit to EunjuYang/nntrainer that referenced this issue Dec 2, 2024