
could path tessellation be done using the GPU? #18

Open
wtholliday opened this issue May 16, 2019 · 2 comments

@wtholliday
In my profiles, bezier tessellation is consuming considerable CPU time, while the GPU is mostly idle.

I've tried optimizing bezier tessellation on the front-end with little success.

I was wondering about the feasibility of supporting path tessellation (nvg__flattenPaths) on the GPU. This would involve a change to the front-end, but could be easily made compatible with other back-ends.

Bezier tessellation on the GPU could work as follows. The input is a series of Bezier segments to be flattened, and a compute kernel is responsible for flattening them, with each segment processed by one thread. If a segment is flat (either because a flag on the segment says so, or because a flatness test passes when the flag isn't set), it is output unchanged; otherwise two segments are output according to Bezier subdivision. Atomic operations would be used to append to the output buffer. Note that this wouldn't preserve segment order, which might be an issue; if it is, a prefix sum could be used to generate output indices instead.

Here is some pseudocode:

kernel void subdivideBezier(device const BezierSegment* input [[ buffer(0) ]],
                            device BezierSegment* output      [[ buffer(1) ]],
                            device atomic_uint& outCount      [[ buffer(2) ]],
                            uint gid [[ thread_position_in_grid ]]) {
  BezierSegment out[2];
  uint n = 1;

  BezierSegment s = input[gid];

  if (s.flat) {
    // Flattened in an earlier pass; pass it through unchanged.
    out[0] = s;
  } else if (flatnessTest(s)) {
    // Flat enough now; mark it so later passes skip the test.
    s.flat = true;
    out[0] = s;
  } else {
    // Split at t = 0.5 and emit both halves for the next pass.
    out[0] = leftChild(s);
    out[1] = rightChild(s);
    n = 2;
  }

  // Reserve n slots in the output buffer (does not preserve order).
  uint i = atomic_fetch_add_explicit(&outCount, n, memory_order_relaxed);
  output[i] = out[0];
  if (n == 2) output[i + 1] = out[1];
}
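
For reference, the helpers the kernel calls could look like the following. This is only a sketch, written in plain C for readability and assuming cubic segments (an MSL port would swap Vec2 for float2); the flatness test mirrors the chord-distance test nvg__tesselateBezier uses on the CPU, and the split is plain de Casteljau at t = 0.5:

#include <math.h>
#include <stdbool.h>

typedef struct { float x, y; } Vec2;

typedef struct {
  Vec2 p0, p1, p2, p3;  // cubic control points
  bool flat;            // set once the segment passes the flatness test
} BezierSegment;

static Vec2 mid(Vec2 a, Vec2 b) {
  return (Vec2){ 0.5f * (a.x + b.x), 0.5f * (a.y + b.y) };
}

// Flat when both inner control points lie close to the chord p0-p3.
static bool flatnessTest(BezierSegment s) {
  float dx = s.p3.x - s.p0.x, dy = s.p3.y - s.p0.y;
  float d1 = fabsf((s.p1.x - s.p3.x) * dy - (s.p1.y - s.p3.y) * dx);
  float d2 = fabsf((s.p2.x - s.p3.x) * dy - (s.p2.y - s.p3.y) * dx);
  float tessTol = 0.25f;  // tolerance; nanovg scales this by device pixel ratio
  return (d1 + d2) * (d1 + d2) < tessTol * (dx * dx + dy * dy);
}

// de Casteljau split at t = 0.5: the left half keeps p0, the right keeps p3.
static BezierSegment leftChild(BezierSegment s) {
  Vec2 p01 = mid(s.p0, s.p1), p12 = mid(s.p1, s.p2), p23 = mid(s.p2, s.p3);
  Vec2 p012 = mid(p01, p12), p123 = mid(p12, p23);
  Vec2 p0123 = mid(p012, p123);
  return (BezierSegment){ s.p0, p01, p012, p0123, false };
}

static BezierSegment rightChild(BezierSegment s) {
  Vec2 p01 = mid(s.p0, s.p1), p12 = mid(s.p1, s.p2), p23 = mid(s.p2, s.p3);
  Vec2 p012 = mid(p01, p12), p123 = mid(p12, p23);
  Vec2 p0123 = mid(p012, p123);
  return (BezierSegment){ p0123, p123, p23, s.p3, false };
}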

This kernel would be run in multiple passes over the curves, up to the maximum subdivision level (hardcoded at 10 in the front-end). Buffers would be sized to accept the worst-case number of segments (2^10 per input curve).
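
To make the pass structure concrete, here is a CPU reference in C built on the helpers above. On the GPU, each call to subdividePass would become one dispatch of subdivideBezier, count would be the atomic outCount (read back between passes, or fed to an indirect dispatch), and the two buffers would ping-pong between input and output roles:

enum { MAX_LEVEL = 10 };  // matches the front-end's hardcoded subdivision limit

// One pass: the CPU analogue of a single subdivideBezier dispatch.
static int subdividePass(const BezierSegment* in, int inCount,
                         BezierSegment* out) {
  int count = 0;  // plays the role of the atomic outCount
  for (int gid = 0; gid < inCount; gid++) {  // one "thread" per segment
    BezierSegment s = in[gid];
    if (s.flat) {
      out[count++] = s;
    } else if (flatnessTest(s)) {
      s.flat = true;
      out[count++] = s;
    } else {
      out[count++] = leftChild(s);
      out[count++] = rightChild(s);
    }
  }
  return count;
}

// Runs up to MAX_LEVEL passes over ping-ponged buffers; returns the buffer
// holding the flattened result and updates *count.
static BezierSegment* flattenAll(BezierSegment* bufA, BezierSegment* bufB,
                                 int* count) {
  for (int level = 0; level < MAX_LEVEL; level++) {
    int next = subdividePass(bufA, *count, bufB);
    if (next == *count) return bufB;  // nothing subdivided; finished early
    *count = next;
    BezierSegment* tmp = bufA; bufA = bufB; bufB = tmp;  // ping-pong
  }
  return bufA;
}

A real driver could likewise stop as soon as a pass subdivides nothing, rather than always running all ten levels.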

The back end would also have to handle generating geometry for strokes, line caps, line ends, etc.

The other GPU path rendering approaches I'm aware of involve sophisticated fragment shaders, so this may be easier to implement.

It may also be worth looking at https://github.com/linebender/piet-metal. It currently doesn't seem to handle curves on the GPU (Beziers are subdivided on the CPU), but it takes a distance-field approach to path rendering.

See also http://hhoppe.com/ravg.pdf and https://www.microsoft.com/en-us/research/wp-content/uploads/2005/01/p1000-loop.pdf.

@olliwang
Member

olliwang commented May 16, 2019

MetalNanoVG follows the style of the corresponding NanoVG project. To do what you suggest here, we would need to touch the original NanoVG code, and MetalNanoVG would no longer share the same nanovg.c. How about implementing such an improvement in NanoVG/OpenGL first?

@wtholliday
Author

wtholliday commented May 16, 2019

Unfortunately, the OpenGL versions NanoVG targets don't have compute shaders (compute only arrived in OpenGL 4.3, and macOS tops out at 4.1), so doing it the way I described isn't possible there. If NanoVG switched to a Vulkan backend, it would be possible.

I think the changes to nanovg.c might be minimal. I'd be more concerned about the amount of time required to implement this.
