Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Add 3D occlusion queries #76869

Open
wants to merge 1 commit into
base: 3.x
Choose a base branch
from

Conversation

Ansraer
Copy link
Contributor

@Ansraer Ansraer commented May 9, 2023

This PR makes it possible to use occlusion queries to cull away parts of the scene that are not visible. While not being quite as high quality as the manual occluders we already have this has the advantage that it can be enabled by clicking a simple check box in the settings.

The visibility of culled objects is checked with their bounding box every frame, if the bounding box would be visible the renderer checks the visibility of the original asset in the next frame. Especially in scenes with badly optimized 3d assets that have too many vertices, this can result in nice performance gains (see screenshot below).

Note that the performance of occlusion query can vary a LOT on different hardware & drivers, so I included a settings option to disable this feature depending on the GPU vendor. I also hard-limited this feature to meshes that have more than 256 vertices, due to the overhead of OQ.

The current implementation is as general-purpose as I could make it, with OQ enabled for pretty much all meshes. I have two branches that are more complicated and try to be smarter about this (e.g. exclude animated meshes, etc.) but after a LOT of professional testing trial and error, I think that this version is better for most use cases.

Something that I will probably need to change is the way I generate the query objects, I left a comment at the relevant line of code. Would greatly appreciate feedback on that in particular.

Screenshot

Sorry for the lackluster image, This is the best I could do on short notice. If I remember I will replace this with a better screenshot tomorrow when I take some time to work through my old PRs.

Anyways, this shows the GPU load in a scene that has a bunch of unoptimized meshes. The valleys are when I had a wall between me and the hero assets, the peaks are when I could see them.

Note that this image was taken with the depth pre-pass disabled. If I were to enable it the difference would be even bigger, since the pre-pass needs to evaluate all the vertices a second time.

image


This PR was sponsored by Ramatak with 💚

@Ansraer Ansraer requested review from a team as code owners May 9, 2023 02:00
@akien-mga
Copy link
Member

akien-mga commented May 9, 2023

Please amend the commit message to be more explicit (the detailed PR description is great, but commit messages also matter - see CONTRIBUTING.md or browse through git log for some examples of how we write commits).

@lawnjelly
Copy link
Member

Should be fun to try! 😃
Good overview of occlusion queries (especially popping and sync issues):
https://www.youtube.com/watch?v=wTTe78DBb1U

doc/classes/Viewport.xml Outdated Show resolved Hide resolved
doc/classes/Viewport.xml Outdated Show resolved Hide resolved
@Calinou
Copy link
Member

Calinou commented May 9, 2023

Anyways, this shows the GPU load in a scene that has a bunch of unoptimized meshes

Note that the Windows task manager's GPU usage readout is wildly known to be inaccurate. I suggest using external tools such as RTSS instead, or use --print-fps with V-Sync disabled (this is also available as the Print FPS project setting).

@Ansraer
Copy link
Contributor Author

Ansraer commented May 9, 2023

Yeah, I know. I only used the task manager to get a quick screenshot for the PR description. During development I used a number of different tools to measure performance, but when I wrote the PR description it was quite late and I was too tired to set up everything and then take good screenshots.
The taskmanager screenshot was quick, is close enough to the results I got with renderdoc, and should be easy to understand even for people who are not overly familiar with profiling (HP wanted something he could show to people).

Thanks for the docs feedback, I will change that and the commit message once I am back at my PC.

Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks pretty good. I have a few specific concerns that I note in specific comments.

One other concern is the fact that only opaque objects are checked. Transparent objects are never culled. What is the reasoning behind excluding transparent objects? To me it seems like they would benefit just as much.

@@ -1719,6 +1719,13 @@
<member name="rendering/quality/lightmapping/use_bicubic_sampling.mobile" type="bool" setter="" getter="" default="false">
Lower-end override for [member rendering/quality/lightmapping/use_bicubic_sampling] on mobile devices, in order to reduce bandwidth usage.
</member>
<member name="rendering/quality/occlusion_queries/disable_for_vendors" type="String" setter="" getter="" default="&quot;PowerVR,Adreno&quot;">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your testing, for which vendors did this actually improve performance, just ARM Mali?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depended a lot on the platform & hardware. I managed to get improvements (desktop & mobile) on most of them, but Adreno was a burning dumpster fire.
But that might have been just my Adreno device, I have had trouble with it in the past.

Comment on lines +4348 to +4356
if (use_oq) {
render_list.sort_by_depth(false);
} else {
render_list.sort_by_key(false);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the performance impact of this on complex scenes. Doing this will result in doing a lot of state changes that are not necessary.

I am also skeptical that running occlusion queries while building the depth buffer will result in a net performance gain.

Copy link
Contributor Author

@Ansraer Ansraer May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I need to do this since AFAIK I don't have a depth prepass on mobile.

On desktop I managed to get a nice performance boost during the early depth pass since the bounding boxes only need 8 vertices. In badly optimized scenes that can make a suprisingly big difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you may need to have a more fine-grained check here. If the setting is only a benefit when not using a depth prepass, then it should only be enabled when not using a depth prepass.

Sorting is less beneficial while rendering the depth prepass, so I guess it may make sense to sort by depth in the prepass when occlusion queries are enabled.

What is the performance overhead like for rendering an object with occlusion queries enabled on mobile? My understanding is that everyone always uses AABBs of the objects after the full depth buffer is built. That way you are always testing against a complete depth buffer

Comment on lines 4629 to 4638
if (use_oq) {
render_list.sort_by_depth(false);
} else {
render_list.sort_by_key(false);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern as above

uint64_t prev_frame = 0;
GLuint query;
};
//TODO: remove stuff from this map when an instance is removed from the rendering server.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice where you do this cleanup, but indeed, you will need to ensure that you aren't leaking queries

@@ -3640,6 +3640,46 @@ void RasterizerStorageGLES3::mesh_add_surface(RID p_mesh, uint32_t p_format, VS:
surface->attribs[i] = attribs[i];
}

{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest storing the AABB on CPU side, and then create the vertex buffer on demand the first time it is needed. Otherwise this runs for every object created even if occlusion culling is never used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid doing stuff on demand but maybe you are right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that creating buffers on demand sucks. But we can't add a fixed cost to every single project just to benefit users of a particular feature

@@ -8395,6 +8439,23 @@ void RasterizerStorageGLES3::initialize() {

config.use_physical_light_attenuation = GLOBAL_GET("rendering/quality/shading/use_physical_light_attenuation");

config.use_occlusion_queries = bool(GLOBAL_GET("rendering/quality/occlusion_queries/enable"));
// I <3 different gpu drivers! *screams in agony*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, funny, but should be removed before merging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't be there. Sorry. At least I managed to find & remove all the more explicit notes I left while working.

Comment on lines +4319 to +4325
use_oq = use_oq && VSG::viewport->viewport_get_update_mode(VSG::viewport->current_viewport_id) != VisualServer::ViewportUpdateMode::VIEWPORT_UPDATE_ONCE;
use_oq = use_oq && VSG::viewport->viewport_get_update_mode(VSG::viewport->current_viewport_id) != VisualServer::ViewportUpdateMode::VIEWPORT_UPDATE_DISABLED;
use_oq = use_oq && VSG::viewport->viewport_get_allow_occlusion_queries(VSG::viewport->current_viewport_id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling the data from the VSG feels a bit icky to me. I think its not actually a problem as these methods aren't exposed through the CommandQueue interface so this shouldn't stall the rendering thread, but it seems like an messy dependency.

I have a feeling that use_oq should be passed down from the VisualServer into render_scene. That way all this logic can stay with the viewport

Comment on lines +2067 to +2071
glGetQueryObjectuiv(data.query, GL_QUERY_RESULT_AVAILABLE, &query_result_available);
if (query_result_available == GL_FALSE) {
query_result = 1; //let's just assume this is visible
} else {
glGetQueryObjectuiv(data.query, GL_QUERY_RESULT, &query_result);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you check if cur_frame - prev_frame >= 2 here to allow for up to 2 frames in flight.

That being said, the description of GL_QUERY_RESULT_AVAILABLE makes it sound like it doesn't have to wait for the GPU to catch up before returning (as the function would be pointless if it did)

This makes it possible cull away invisible meshes by relying on their AABBs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants