How does vsg solve performance bottlenecks? #1349
-
Replies: 3 comments 3 replies
-
I'm afraid you've dived into low-level details before we understand the wider context of what you are trying to do. It's far more helpful to start from the top down rather than the bottom up. What do you mean by EDA software?
-
You could try vsg::Instrumentation, specifically the vsg::Profiler version - see the vsginstrumentation example, or any other example that assigns instrumentation. This will output high-level stats on CPU and GPU costs to the console, or to a file if you assign one. At this point I have no idea why you're only getting 10fps, when the screenshot is simple enough that I'd expect tens of thousands of FPS with vsync off. You aren't running the app through a virtual machine or across a network or anything? How many nodes do you have in your scene? How many state and bind-vertex etc. calls? How many vertices and primitives?
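For reference, wiring the profiler into a viewer looks roughly like this. This is a sketch based on the vsginstrumentation example, not a drop-in implementation - the exact Settings fields and the report call may differ between VSG versions, so check the example in vsgExamples for your version:

```cpp
#include <vsg/all.h>
#include <iostream>

// Sketch: attach a vsg::Profiler to an existing viewer, run the frame
// loop, then dump the collected CPU/GPU stats to the console.
void runProfiled(vsg::ref_ptr<vsg::Viewer> viewer)
{
    auto settings = vsg::Profiler::Settings::create();
    auto profiler = vsg::Profiler::create(settings);

    // Hand the instrumentation to the viewer before the main loop so
    // per-frame CPU and GPU costs are collected.
    viewer->assignInstrumentation(profiler);

    while (viewer->advanceToNextFrame())
    {
        viewer->handleEvents();
        viewer->update();
        viewer->recordAndSubmit();
        viewer->present();
    }

    // Write the collected stats out; pass a std::ofstream instead of
    // std::cout to log to a file.
    profiler->log->report(std::cout);
}
```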
-
99 million vertices is quite a lot for a 3050, so I wouldn't expect amazing performance if you can't cull anything. It's almost certainly going to be better to split your draw calls and set up cull nodes so you only pay for the few hundred or few thousand circles that you're showing at once. That doesn't mean one draw call per circle, but there'll be a middle ground that performs much better than what you're seeing now, where the CPU overhead from more draw calls and more frustum intersection tests is well worth the reduction in work for the GPU.

As for why one of your huge draw calls is being attributed all of the time cost and none of the others are, it's unclear. The obvious things to blame (e.g. being able to skip rasterising more primitives, or fewer samples passing the depth and stencil tests) have comparable numbers in the RenderDoc screenshot, so it might just come down to the timing values being misleading. As GPUs are free to do things like reorder draw calls as long as the observable effects are the same, it's sometimes hard to say which time was spent on which draw call.

Finally, I wouldn't put too much faith in the figures that the Windows Task Manager gives for GPU performance. The overall load is typically a mix of several factors, so if one part of the GPU is so overloaded that the rest is work-starved, you can get really low usage figures. In your specific case, you might expect vertex processing or the rasteriser to be entirely busy, but the texture units and render outputs to be waiting with nothing to do most of the time. Sometimes you can avoid this by changing one of the dropdowns above the graphs to view a different metric, but that only covers the things Microsoft have decided to expose a graph for.
You might get more useful information from your graphics driver or a third-party tool like MSI Afterburner - at a minimum, you'll be able to see whether the GPU is running at idle or load clock speeds, and whether you'll need to persuade the driver that this is a high-performance application.
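To make the chunk-and-cull suggestion concrete, here's a minimal self-contained sketch of the idea - split the geometry into chunks, give each a bounding volume, and only draw chunks whose bounds overlap the view. This isn't vsg API (vsg's CullNode does this for you against the view frustum); the `Chunk`/`View` types and the 2D circle-vs-rectangle test are hypothetical simplifications for illustration:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types: geometry pre-split into chunks, each with a
// bounding circle, culled against a 2D view rectangle.
struct Chunk { float cx, cy, radius; std::size_t vertexCount; };
struct View  { float minX, minY, maxX, maxY; };

// True if the chunk's bounding circle overlaps the view rectangle:
// clamp the circle's centre to the rectangle and compare the distance
// to the radius.
bool visible(const Chunk& c, const View& v)
{
    float nx = std::clamp(c.cx, v.minX, v.maxX);
    float ny = std::clamp(c.cy, v.minY, v.maxY);
    float dx = c.cx - nx, dy = c.cy - ny;
    return dx * dx + dy * dy <= c.radius * c.radius;
}

// Count the vertices that actually need drawing for this view; with a
// zoomed-in view this is a tiny fraction of the whole dataset.
std::size_t verticesToDraw(const std::vector<Chunk>& chunks, const View& view)
{
    std::size_t total = 0;
    for (const auto& c : chunks)
        if (visible(c, view)) total += c.vertexCount;
    return total;
}
```

The point of the middle ground: each per-chunk test costs a few CPU cycles, but every culled chunk saves the GPU its entire vertex workload, so the win is enormous as long as chunks are coarse enough that the test count stays small.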