
GPU implementation of hamming distance #541

Open · wants to merge 111 commits into main

Conversation

felixpetschko (Collaborator)

Hamming distance implementation with numba.cuda for GPU support.
This is built on top of the changes in #512 (Hamming distance implementation with Numba).
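
For readers unfamiliar with the approach, here is a minimal sketch of a pairwise Hamming distance kernel in numba.cuda. This is illustrative only and not the kernel from this PR; the uint8 encoding, the toy sequences, and the launch configuration are assumptions, and it needs a CUDA-capable GPU to run.

```python
import numpy as np
from numba import cuda


@cuda.jit
def hamming_kernel(seqs, out):
    # One GPU thread per (i, j) pair of sequences.
    i, j = cuda.grid(2)
    n, length = seqs.shape
    if i < n and j < n:
        d = 0
        for k in range(length):
            if seqs[i, k] != seqs[j, k]:
                d += 1
        out[i, j] = d


# Toy CDR3-like sequences, pre-padded to equal length and encoded as uint8.
seqs = np.array(
    [list(b"CASSLGTDTQYF"), list(b"CASSLGADTQYF"), list(b"CASSPGQDTQYF")],
    dtype=np.uint8,
)
out = np.zeros((len(seqs), len(seqs)), dtype=np.int32)

threads = (16, 16)
blocks = tuple((len(seqs) + t - 1) // t for t in threads)
hamming_kernel[blocks, threads](seqs, out)
print(out)  # symmetric matrix of pairwise Hamming distances
```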

felixpetschko and others added 30 commits April 29, 2024 13:28
grst (Collaborator) commented Nov 4, 2024

Hi @felixpetschko, what's the status here? Do you need anything from me or Severin?

I've seen you switched to CuPy; could you elaborate on how that compares to the Numba implementation?

felixpetschko (Collaborator, Author)

Hi @grst! My implementation here is mostly done. On my laptop, for 1 million cells, the ir_dist function with the hamming metric is currently around 10x faster than the new fast numba CPU implementation (45 vs. 480 seconds), and probably >100x faster than the original CPU implementation. I think this is also the maximum speedup I would aim for right now: besides the hamming GPU kernel, ir_dist still has some sequential parts that run on a single CPU core, and the upstream processing for reading and preparing the data already takes longer anyway, so further optimization of the hamming kernel wouldn't be very effective.
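
For context, the timing above refers to calls along these lines. This is a hedged usage sketch only: the example dataset and cutoff are illustrative (not the 1-million-cell benchmark), and parameter names follow the public scirpy API as I understand it.

```python
import time

import scirpy as ir

# Small example dataset shipped with scirpy (not the 1M-cell benchmark).
adata = ir.datasets.wu2020_3k()

start = time.perf_counter()
ir.pp.ir_dist(adata, metric="hamming", cutoff=2)
print(f"ir_dist with hamming took {time.perf_counter() - start:.1f} s")
```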

My plan is to prepare a pull request that is ready for review over the next few days.

The reasons for switching to CuPy were the following:

- Numba CUDA only exposes a limited subset of CUDA's features, and you never really know how the Numba code is mapped to those features internally. CuPy lets you write the CUDA kernels directly in C++ and, as far as I have seen, offers all CUDA features.
- Most online resources about GPU programming cover CUDA kernels written in C++, while numba.cuda is very niche. Other programmers who look at this code in the future will most likely only know C++ CUDA kernels (if they have done GPU programming at all).
- When doing GPU programming I use profiling tools to find out what is actually happening on the hardware and what the compiler did, so the additional abstraction layer that Numba introduces can actually be a nightmare.
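
For illustration, here is a minimal sketch of the CuPy approach, not the actual kernel from this PR: a pairwise Hamming distance kernel written as C++ CUDA code and compiled via cupy.RawKernel, assuming sequences are pre-padded to equal length and encoded as uint8.

```python
import cupy as cp

hamming_src = r"""
extern "C" __global__
void hamming(const unsigned char* seqs, int n, int length, int* out) {
    // One thread per (i, j) pair; out is an n x n row-major matrix.
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && j < n) {
        int d = 0;
        for (int k = 0; k < length; ++k) {
            d += seqs[i * length + k] != seqs[j * length + k];
        }
        out[i * n + j] = d;
    }
}
"""
hamming_kernel = cp.RawKernel(hamming_src, "hamming")

# Toy CDR3-like sequences, equal length, encoded as uint8 on the GPU.
seqs = cp.asarray(
    [list(b"CASSLGTDTQYF"), list(b"CASSLGADTQYF"), list(b"CASSPGQDTQYF")],
    dtype=cp.uint8,
)
n, length = seqs.shape
out = cp.zeros((n, n), dtype=cp.int32)

threads = (16, 16)
blocks = ((n + threads[0] - 1) // threads[0], (n + threads[1] - 1) // threads[1])
hamming_kernel(blocks, threads, (seqs, cp.int32(n), cp.int32(length), out))
print(out)  # symmetric matrix of pairwise Hamming distances
```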

grst (Collaborator) commented Nov 5, 2024

Makes sense, thanks!
I'm still not sure whether this should be merged into scirpy or into rapids-singlecell. @Intron7, I'll also bring up our long-term strategy here at the next core dev meeting.

Intron7 (Member) commented Nov 8, 2024

Hey @felixpetschko, can you send me a larger dataset to test this? I have some ideas and want to see if they work.

grst (Collaborator) commented Nov 16, 2024

> I'm still not sure whether this should be merged into scirpy or into rapids-singlecell. @Intron7, I'll also bring up our long-term strategy here at the next core dev meeting.

@felixpetschko, the outcome of this discussion was that the function stays here, and we'll set up a GPU CI for scirpy. @ilan-gold or @flying-sheep can help with that once this PR is ready.

grst changed the title from "Hamming distance implementation with numba.cuda (GPU)" to "GPU implementation of hamming distance" on Nov 16, 2024