Continue kernel-izing operators #113

Open
lukem12345 opened this issue Oct 12, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

lukem12345 (Member) commented Oct 12, 2024

Prior to PR #109, CombinatorialSpaces contained two copies (one for CPU, another for CUDA) of a kernel for each of three wedge products. In PR #109, these 6 kernel instances were reduced to 2 by merging them behind a single kernel abstraction: namely, @kernel from KernelAbstractions.jl.

For example, this:

function dec_c_wedge_product!(::Type{Tuple{1,1}}, wedge_terms, α, β, val_pack)
    # Unpack the cached edge indices, coefficients, and simplex iterator.
    e, coeffs, simples = val_pack
    @inbounds for i in simples
        # Values of α and β on the three edges associated with output entry i.
        ae0, ae1, ae2 = α[e[1, i]], α[e[2, i]], α[e[3, i]]
        be0, be1, be2 = β[e[1, i]], β[e[2, i]], β[e[3, i]]
        c1, c2, c3 = coeffs[1, i], coeffs[2, i], coeffs[3, i]
        wedge_terms[i] = (c1 * (ae2 * be1 - ae1 * be2) + c2 * (ae2 * be0 - ae0 * be2) + c3 * (ae1 * be0 - ae0 * be1))
    end
    return wedge_terms
end

and this:

function dec_cu_ker_c_wedge_product_11!(res, α, β, wedge_cache)
  e, c = wedge_cache[1], wedge_cache[2]
  # Grid-stride loop: each thread handles indices i, i + stride, i + 2stride, ...
  i = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x
  stride = gridDim().x * blockDim().x
  @inbounds while i <= Int32(length(res))
    e0, e1, e2 = e[Int32(1), i], e[Int32(2), i], e[Int32(3), i]
    c1, c2, c3 = c[Int32(1), i], c[Int32(2), i], c[Int32(3), i]
    ae0, ae1, ae2 = α[e0], α[e1], α[e2]
    be0, be1, be2 = β[e0], β[e1], β[e2]
    res[i] = (c1 * (ae2 * be1 - ae1 * be2) + c2 * (ae2 * be0 - ae0 * be2) + c3 * (ae1 * be0 - ae0 * be1))
    i += stride
  end
  nothing
end

became this single kernel:

@kernel function wedge_kernel_11!(res, @Const(α), @Const(β), @Const(e), @Const(c))
  i = @index(Global)
  e0, e1, e2 = e[Int32(1), i], e[Int32(2), i], e[Int32(3), i]
  c1, c2, c3 = c[Int32(1), i], c[Int32(2), i], c[Int32(3), i]
  ae0, ae1, ae2 = α[e0], α[e1], α[e2]
  be0, be1, be2 = β[e0], β[e1], β[e2]
  @inbounds res[i] = (c1 * (ae2 * be1 - ae1 * be2) + c2 * (ae2 * be0 - ae0 * be2) + c3 * (ae1 * be0 - ae0 * be1))
end
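
For reference, such a @kernel is launched by instantiating it on a backend and supplying an ndrange. The following is a minimal sketch of the standard KernelAbstractions.jl launch pattern, not code from PR #109; it assumes res, α, β, e, and c are already allocated on the target backend:

using KernelAbstractions

# get_backend dispatches on the array type, so the same call site
# works for CPU Arrays and GPU arrays alike. One work-item is
# launched per output entry.
backend = KernelAbstractions.get_backend(res)
wedge_kernel_11!(backend)(res, α, β, e, c, ndrange=length(res))
KernelAbstractions.synchronize(backend)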

The main qualities that motivated the original wedge product PR were:

  1. Increasing maintainability (by de-duplicating code), and
  2. Supporting more backends.

However, subsequent technical discussions made additional qualities apparent:
3. More easily handling threading,
4. Performance improvements related to the above,
5. Abstracting away the logic of looping over indices,
6. Further abstractions that can now bubble up the function-call hierarchy,
7. Performance improvements that kernel fusion can allow, and
8. Downstream code can now more easily swap between backends (see the sketch after this list).
(Among various others.)
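
On that last point, here is a hedged sketch of how swapping backends reduces to swapping a backend object. Only wedge_kernel_11! comes from the code above; the array setup is hypothetical:

using KernelAbstractions
using CUDA

# CPU: plain Arrays run under the CPU backend.
wedge_kernel_11!(CPU())(res, α, β, e, c, ndrange=length(res))

# GPU: the identical kernel runs under CUDA once the arrays are
# CuArrays and the backend object is swapped.
wedge_kernel_11!(CUDABackend())(res_d, α_d, β_d, e_d, c_d, ndrange=length(res_d))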

So, we should continue kernelizing our binary operators (such as the interior product and Lie derivative) and unary operators (which are currently computed as sparse matrix-vector multiplications, but could be made more efficient by defining their own kernels; a sketch follows).
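
To make the unary-operator point concrete, here is a hedged sketch (hypothetical kernel and argument names; CSR storage assumed) of the kind of dedicated kernel that could replace a generic sparse matrix-vector multiply:

@kernel function unary_op_kernel!(res, @Const(x), @Const(rowptr), @Const(colval), @Const(nzval))
  # One work-item per output entry: accumulate row i of the
  # CSR-stored operator against the input cochain x.
  i = @index(Global)
  acc = zero(eltype(res))
  @inbounds for j in rowptr[i]:(rowptr[i+1] - 1)
    acc += nzval[j] * x[colval[j]]
  end
  @inbounds res[i] = acc
end

A kernel like this could go further than a generic SpMV by hard-coding an operator's fixed stencil, or by fusing the operator with neighboring computations.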

As an aside, PR #109 did not break the API around the wedge product, but a follow-up PR should perform a refactoring that is less coupled to the “old way” of caching the wedge product. Further, we should continue to examine whether explicit extensions for different backends are necessary, or whether a more streamlined process is possible.
