generated from AlgebraicJulia/AlgebraicTemplate.jl
-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements in chained subparts syntax #137
Labels
Comments
Oh excellent! |
I was a little hasty closing this issue. It looks like the julia> function loop_expanded!(buffer::Vector{Point3D}, sd::HasDeltaSet2D, ::Type {point_type}) where point_type
@inbounds for t in parts(sd, :DualTri)
buffer[t] = sd[sd[sd[t, :D_∂e1], :D_∂v1], :dual_point]
end
end;
julia> function loop_contracted!(buffer::Vector{Point3D}, sd::HasDeltaSet2D, ::Ty pe{point_type}) where point_type
@inbounds for t in parts(sd, :DualTri)
buffer[t] = sd[t, (:D_∂e1, :D_∂v1, :dual_point)]
end
end;
julia> @btime sd[sd[sd[1, :D_∂e1], :D_∂v1], :dual_point];
1.181 μs (9 allocations: 1.78 KiB)
julia> @btime sd[1, (:D_∂e1, :D_∂v1, :dual_point)];
302.128 ns (3 allocations: 688 bytes)
julia> buffer = Vector{Point3D}(undef, nparts(sd, :DualTri));
julia> @btime loop_expanded!(buffer, sd, Point3{Float64});
4.191 ms (0 allocations: 0 bytes)
julia> buffer = Vector{Point3D}(undef, nparts(sd, :DualTri));
julia> @btime loop_contracted!(buffer, sd, Point3{Float64});
207.264 ms (2398978 allocations: 329.57 MiB) |
I've made a more minimal example. I'm going to take a look at this but haven't interacted with CompTime much, so I'm not sure if the issue will eventually be upstreamed there. julia> using Catlab.Graphs, ACSets, BenchmarkTools
julia> function loop_expanded!(buffer::Vector{Int}, grph::Graph)
@inbounds for i in parts(grph, :E)
buffer[i] = grph[i, :src]
end
end;
julia> function loop_contracted!(buffer::Vector{Int}, grph::Graph)
@inbounds for i in parts(grph, :E)
buffer[i] = grph[i, (:src,)]
end
end;
julia> g = path_graph(Graph, 80_000);
julia> buffer = Vector{Int}(undef, nparts(g, :E));
julia> @btime loop_expanded!(buffer, g);
68.679 μs (0 allocations: 0 bytes)
julia> @btime loop_contracted!(buffer, g);
21.828 ms (398462 allocations: 12.18 MiB)
|
Thanks for looking into it. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
As with recent work on e.g. issue #135 , we were looking at what there is any performance being left on the table with the ACSets set/ get API.
Here, I looked at the what's known in the docs as the "chaining" or "composition syntax". I trimmed down two versions of an operation used in CombinatorialSpaces.jl. One loops over some data using the composed syntax (
loop_contracted!
), and the other decomposed (loop_expanded!
). All allocations are optimized away by the compiler when using the decomposed syntax, but the same is not done for the composed case. I think that recent PRs will give us a good starting off point.The text was updated successfully, but these errors were encountered: