Partitioned hashing support on multiple GPUs for trace and constraint commitments #335
Comments
I think what we need from the Winterfell side to support this is adding
But, if the user would set
One question is what to do if the length of the row is not evenly divisible by
I based my issue on the current implementation in Polygon's miden-gpu, which is currently unreleased for CUDA.
Could we make it the user's responsibility to set
I'm not the original author of the CUDA code, so I can't currently tell if there are advantages to having
To support trace commitments on multiple GPUs, we need partitioned hashing in both the prover and the verifier, so that the output of the GPU code matches the output of the CPU code when computing on multiple devices.
Partition size
The GPU code uses a "partition size" to determine the number of columns to be processed per GPU. Since a trace commitment has a minimum of 64 columns, a partition size of 16 is usually a good choice.
But we could also come up with a formula for calculating it automatically. For example,

`16 - (num_columns % 16) / (num_columns / 16)`

would in most cases work quite well for any number of GPUs.

Per device partitioning
The number of partitions is calculated with `num_partitions = (num_columns + partition_size - 1) / partition_size`, and then the number of devices actually used for computing is `num_devices = min(num_devices, num_partitions)`. This means that when we have an 8-column matrix and a partition size of 8 (e.g. in constraint commitments), only a single GPU will be used, since `num_partitions` would be `1`.
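The partition and device counts above can be sketched in code. This is a minimal illustration; the helper names (`suggest_partition_size`, `num_partitions`) and the small-input guard are assumptions for the sketch, not actual Winterfell or miden-gpu APIs:

```rust
/// Hypothetical helper: pick a partition size with the heuristic above,
/// i.e. 16 - (num_columns % 16) / (num_columns / 16), guarding small inputs.
fn suggest_partition_size(num_columns: usize) -> usize {
    if num_columns < 16 {
        return num_columns;
    }
    16 - (num_columns % 16) / (num_columns / 16)
}

/// Number of partitions: ceiling division of columns by partition size.
fn num_partitions(num_columns: usize, partition_size: usize) -> usize {
    (num_columns + partition_size - 1) / partition_size
}

fn main() {
    // 64-column trace with partition size 16 -> 4 partitions.
    let parts = num_partitions(64, suggest_partition_size(64));
    // We never use more devices than there are partitions.
    let num_devices = std::cmp::min(8, parts);
    println!("partitions = {parts}, devices = {num_devices}");

    // 8-column matrix with partition size 8 (constraint commitment case):
    // a single partition, so only one GPU is used.
    assert_eq!(num_partitions(8, 8), 1);
}
```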
Hashing process
We need a buffer for storing the partitioned linear hash output. The required buffer size is `num_partitions * num_rows * blowup * CAPACITY` - enough space to store all the partitioned outputs.

Each GPU takes `partition_size` columns, and it hashes 8 columns at a time up to the `partition_size`. So if we have e.g. 14 columns, one GPU will hash 8 columns and then 6 columns, and write the result to the correct position in the buffer. If we set our partition size to 16 on a 64-column trace, `num_partitions` would be 4, so we allocate space for `4 * lde_domain_size * CAPACITY`, and when hashing we would write the linear hash of 16 columns 4 times. If we set the partition size to 8, however, we would need double the allocated space, and we would have 8 linear hash outputs written to the buffer.
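The buffer sizing for the two cases above can be checked with a short sketch. Assumptions here: `CAPACITY = 4` is a placeholder hasher capacity, `lde_domain_size = num_rows * blowup` per the formula above, and `buffer_size` is a hypothetical helper, not an existing API:

```rust
// Assumed hasher capacity for illustration; not a value taken from the
// actual GPU code.
const CAPACITY: usize = 4;

/// Buffer size: num_partitions * num_rows * blowup * CAPACITY, where
/// lde_domain_size stands in for num_rows * blowup.
fn buffer_size(num_columns: usize, partition_size: usize, lde_domain_size: usize) -> usize {
    let num_partitions = (num_columns + partition_size - 1) / partition_size;
    num_partitions * lde_domain_size * CAPACITY
}

fn main() {
    let lde_domain_size = 1 << 20; // num_rows * blowup
    // 64-column trace, partition size 16 -> 4 partitions.
    let a = buffer_size(64, 16, lde_domain_size);
    // Same trace, partition size 8 -> 8 partitions: double the space.
    let b = buffer_size(64, 8, lde_domain_size);
    assert_eq!(b, 2 * a);
    println!("partition size 16 needs {a} elements, partition size 8 needs {b}");
}
```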
After the linear hash step, each linear hash partition is taken, producing `CAPACITY` columns, and then a final linear hash is applied, from which we later build a Merkle tree.
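The per-row data flow described above can be sketched as follows. The hash function here is a stand-in, not the real algebraic hash or CUDA kernel; only the shape of the pipeline (per-partition linear hash, then a final linear hash over the concatenated digests to get a Merkle leaf) is meant to be accurate:

```rust
// Stand-in digest of CAPACITY elements (an assumption for illustration).
const CAPACITY: usize = 4;
type Digest = [u64; CAPACITY];

// Placeholder linear (sequential-absorb) hash over a row slice; the real
// implementation would be an algebraic hash such as RPO.
fn linear_hash(row: &[u64]) -> Digest {
    let mut d = [0u64; CAPACITY];
    for (i, v) in row.iter().enumerate() {
        d[i % CAPACITY] = d[i % CAPACITY].wrapping_mul(31).wrapping_add(*v);
    }
    d
}

/// For one row of the LDE trace: hash each partition's columns into a
/// CAPACITY-element digest, then linearly hash the concatenated digests
/// to produce the Merkle-tree leaf for this row.
fn hash_row(row: &[u64], partition_size: usize) -> Digest {
    let partition_digests: Vec<u64> = row
        .chunks(partition_size)             // one chunk per partition/GPU
        .flat_map(linear_hash)              // CAPACITY outputs per partition
        .collect();
    linear_hash(&partition_digests)         // final linear hash -> leaf
}

fn main() {
    let row: Vec<u64> = (0..64).collect();
    let leaf = hash_row(&row, 16); // 4 partitions of 16 columns
    println!("leaf = {leaf:?}");
}
```

Note that `chunks` simply yields a shorter final partition when the row length is not evenly divisible by `partition_size`; agreeing on how that case is handled between the CPU and GPU code is exactly the open question raised in the comments.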