Continue kernel-izing operators #113

Open
lukem12345 opened this issue Oct 12, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

lukem12345 (Member) commented Oct 12, 2024

Prior to PR #109, CombinatorialSpaces contained two copies (one for CPU, another for CUDA) of a kernel for each of three wedge products. In PR #109, these 6 kernel instances were reduced to 2 by merging them behind a single kernel abstraction: namely, @kernel from KernelAbstractions.jl.

For example, this:

function dec_c_wedge_product!(::Type{Tuple{1,1}}, wedge_terms, α, β, val_pack)
    # Unpack the cached edge indices, coefficients, and simplex iterator.
    e, coeffs, simples = val_pack
    @inbounds for i in simples
        # Values of α and β on the three edges associated with output entry i.
        ae0, ae1, ae2 = α[e[1, i]], α[e[2, i]], α[e[3, i]]
        be0, be1, be2 = β[e[1, i]], β[e[2, i]], β[e[3, i]]
        c1, c2, c3 = coeffs[1, i], coeffs[2, i], coeffs[3, i]
        wedge_terms[i] = (c1 * (ae2 * be1 - ae1 * be2) + c2 * (ae2 * be0 - ae0 * be2) + c3 * (ae1 * be0 - ae0 * be1))
    end
    return wedge_terms
end

and this:

function dec_cu_ker_c_wedge_product_11!(res, α, β, wedge_cache)
  e, c = wedge_cache[1], wedge_cache[2]
  # Grid-stride loop: each thread handles indices i, i + stride, i + 2stride, ...
  i = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x
  stride = gridDim().x * blockDim().x
  @inbounds while i <= Int32(length(res))
    e0, e1, e2 = e[Int32(1), i], e[Int32(2), i], e[Int32(3), i]
    c1, c2, c3 = c[Int32(1), i], c[Int32(2), i], c[Int32(3), i]
    ae0, ae1, ae2 = α[e0], α[e1], α[e2]
    be0, be1, be2 = β[e0], β[e1], β[e2]
    res[i] = (c1 * (ae2 * be1 - ae1 * be2) + c2 * (ae2 * be0 - ae0 * be2) + c3 * (ae1 * be0 - ae0 * be1))
    i += stride
  end
  nothing
end

became this single kernel:

@kernel function wedge_kernel_11!(res, @Const(α), @Const(β), @Const(e), @Const(c))
  i = @index(Global)
  e0, e1, e2 = e[Int32(1), i], e[Int32(2), i], e[Int32(3), i]
  c1, c2, c3 = c[Int32(1), i], c[Int32(2), i], c[Int32(3), i]
  ae0, ae1, ae2 = α[e0], α[e1], α[e2]
  be0, be1, be2 = β[e0], β[e1], β[e2]
  @inbounds res[i] = (c1 * (ae2 * be1 - ae1 * be2) + c2 * (ae2 * be0 - ae0 * be2) + c3 * (ae1 * be0 - ae0 * be1))
end
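
For reference, such a @kernel is launched by instantiating it on a backend and supplying an ndrange. The following is a minimal sketch of the standard KernelAbstractions.jl launch pattern, not code from PR #109; it assumes res, α, β, e, and c are already allocated on the target backend:

using KernelAbstractions

# get_backend dispatches on the array type, so the same call site
# works for CPU Arrays and GPU arrays alike. One work-item is
# launched per output entry.
backend = KernelAbstractions.get_backend(res)
wedge_kernel_11!(backend)(res, α, β, e, c, ndrange=length(res))
KernelAbstractions.synchronize(backend)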

The main qualities that motivated the original wedge product PR were:

  1. Increasing maintainability (by de-duplicating code), and
  2. Supporting more backends.

However, subsequent technical discussions made additional qualities apparent:
3. More easily handling threading,
4. Performance improvements related to the above,
5. Abstracting away the logic of looping over indices,
6. Further abstractions that can now bubble up the function-call hierarchy,
7. Performance improvements that kernel fusion can allow, and
8. Downstream code can now more easily swap between backends (see the sketch after this list).
(Among various others.)
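
On that last point, here is a hedged sketch of how swapping backends reduces to swapping a backend object. Only wedge_kernel_11! comes from the code above; the array setup is hypothetical:

using KernelAbstractions
using CUDA

# CPU: plain Arrays run under the CPU backend.
wedge_kernel_11!(CPU())(res, α, β, e, c, ndrange=length(res))

# GPU: the identical kernel runs under CUDA once the arrays are
# CuArrays and the backend object is swapped.
wedge_kernel_11!(CUDABackend())(res_d, α_d, β_d, e_d, c_d, ndrange=length(res_d))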

So, we should continue kernelizing our binary operators (such as the interior product and Lie derivative) and unary operators (which are currently computed as sparse matrix-vector multiplications, but could be made more efficient by defining their own kernels; a sketch follows).
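
To make the unary-operator point concrete, here is a hedged sketch (hypothetical kernel and argument names; CSR storage assumed) of the kind of dedicated kernel that could replace a generic sparse matrix-vector multiply:

@kernel function unary_op_kernel!(res, @Const(x), @Const(rowptr), @Const(colval), @Const(nzval))
  # One work-item per output entry: accumulate row i of the
  # CSR-stored operator against the input cochain x.
  i = @index(Global)
  acc = zero(eltype(res))
  @inbounds for j in rowptr[i]:(rowptr[i+1] - 1)
    acc += nzval[j] * x[colval[j]]
  end
  @inbounds res[i] = acc
end

A kernel like this could go further than a generic SpMV by hard-coding an operator's fixed stencil, or by fusing the operator with neighboring computations.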

As an aside, PR #109 did not break the API around the wedge product, but a follow-up PR should perform a refactoring that is less coupled to the “old way” of caching the wedge product. Further, we should continue to examine whether explicit extensions for different backends are necessary, or whether a more streamlined process is possible.
