
[Draft] - Test Pioneer running on Raspberry Pi 5 #5700

Draft · wants to merge 2 commits into base: master
Conversation

@fluffyfreak (Contributor) commented Dec 29, 2023

I got a Raspberry Pi 5 just before Xmas and whilst doing something completely unrelated to Pioneer discovered that it now supports OpenGL 3.1!
In theory this might also work on RPi 4.

Technically Pioneer requires OpenGL 3.2; however, a quick downgrade to 3.1 gets it running, with caveats:

  • 20fps on LOW detail, Mars start, 1280x720
  • Some rendering issues, possibly z-buffer related
  • planet/terrain rendering is expensive
  • No crash encountered yet!

Creating this as a DRAFT for visibility and to jog some discussion

Commit description:

  • Downgrade to OpenGL 3.1 and in CMakeLists.txt check for arm CPU

  • test on Raspberry Pi 4

  • investigate graphical issues

  • test gameplay further (Gas Giants, saving & loading, sgm models, etc)

  • is requesting 3.1 safe for other platforms or should this be Pi only?

@impaktor (Member) commented Jan 2, 2024

Isn't there some reason we had 3.2 and not 3.1? Wouldn't there be negative consequences from downgrading?

At 20 FPS, is this something we would want to provide on a Pi? Can the window resolution be made even smaller (like playing Doom on a 386, with a stamp-sized window)?

@fluffyfreak (Contributor, Author)

I honestly do not recall why we required 3.2; I need to check the commit logs.
My preference would be for the Pi to request 3.1 and everything else to ask for 3.2 as now. RPi 5 builds would all have to be done from source at the moment; I'm not proposing that we add a regular build.

This PR is just to make it build and run on the RPi 5 & 4 so that others (Raspberry Pi has a large developer community) can take a look if they're interested.

As for the 20 fps, that appears to be caused by terrain and City rendering exclusively. Get into space and I'm getting a solid 60 fps with VSync on. I haven't tried with it off. In my opinion this means there's possibly some optimisation to be done for Terrain and City rendering.

The resolution could be reduced, that 20 fps was windowed, and there's other tweaks but I think it will be interesting to do some profiling and optimisation.

@fluffyfreak (Contributor, Author)

The update to OpenGL 3.1 --> 3.2 was in #4974 which is probably required for the offscreen rendertarget stuff but I'm not 100% sure. Either way I don't think we should revert to 3.1 for all platforms, but just for Raspberry Pi to allow the RPi4 & 5 to run.

@sturnclaw (Member)

This is excellent news!

...I'll lead with the (perhaps unnecessary) negatives since I'm that kind of person. I don't think we should attempt to officially support the Pi while Pioneer is still using OpenGL. AFAIK, the Vulkan driver for the Pi's GPU hardware has marginally better performance and is significantly more feature-rich than the GL implementation shipping on the Pi.

I'd rather not maintain additional GL-version backends, especially given the overall progress towards modernizing the rendering architecture and design of Pioneer. Additional hardware quirks and considerations have the potential to be very detrimental friction given the current scope of work needed to achieve compatibility with a more "sane" API like Vulkan. I'm aware I'm preaching to the choir on this one. 😉

That being said, I do want to eventually officially support the Pi or a platform like it once the rendering code in Pioneer is more sane and there's less overhead/waste contributing to the low performance.

Depth buffer issues are most likely related to the GL-on-RPI implementation not supporting the pile of reverse-Z floating-point depth buffer tricks we use. IIRC, part of the move to GL 3.2 was better support for MSAA render targets including the float depth buffer.
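For anyone unfamiliar, the core of the trick is roughly this (a schematic sketch, not our exact setup; the real state lives in RendererOGL):

// Schematic reverse-Z setup: a float depth attachment plus a [0,1]
// clip-space depth range gives near-uniform precision over the view distance.
glClipControl(GL_LOWER_LEFT, GL_ZERO_TO_ONE); // requires ARB_clip_control
glDepthFunc(GL_GEQUAL);   // reversed: nearer fragments have *larger* depth values
glClearDepth(0.0);        // clear to the new "far" value
// The projection matrix is also set up so near maps to 1.0 and far maps to 0.0.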

The performance of pushing 100k+ terrain vertices through a tile-based rasterizer is unsurprising to me. There may be a few options to improve that in the future (e.g. better horizon culling taking the "real feature height" of a GeoPatch into account), though I'm not fully read in on the literature to know if a Z-prepass style renderer would achieve better performance there.

Off the top of my head, city rendering has a potential to be CPU-bound rather than GPU bound. There's a huge inefficiency there in preparing the instanced transform data which unfortunately can't be easily resolved at the moment without some invasive changes to the SceneGraph API. It's not a problem on modern desktop platforms, but on a platform with low memory bandwidth it will be more pronounced.
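Schematically, the expensive pattern is rebuilding and re-uploading the full instance transform array every frame, something like this (illustrative only, not the actual SceneGraph code; the type and function names are stand-ins):

#include <vector>
#include <GL/glew.h>

struct Matrix4x4f { float m[16]; }; // stand-in for Pioneer's matrix type

// Re-uploading the whole array each frame costs CPU time and memory
// bandwidth; on a desktop it's noise, on a Pi it shows up in the profile.
void UploadInstanceTransforms(GLuint buffer, const std::vector<Matrix4x4f> &transforms)
{
	glBindBuffer(GL_ARRAY_BUFFER, buffer);
	glBufferData(GL_ARRAY_BUFFER, transforms.size() * sizeof(Matrix4x4f),
		transforms.data(), GL_STREAM_DRAW);
}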

The main reason for upgrading to GL 3.2 (more specifically GLSL version 1.50) was the ability to sample from an MSAA texture (e.g. the primary render target) for use in post-processing effects. This will primarily show up in the HDR->LDR tonemap+resolve step of the planned rendering pipeline, but is not currently used at this time.
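For context, this is the kind of resolve shader GLSL 1.50 enables (a sketch; the uniform names are illustrative):

#version 150
// Custom MSAA resolve: sampler2DMS + texelFetch are the GLSL 1.50 features needed.
uniform sampler2DMS sceneTex;
uniform int numSamples;
out vec4 frag_color;

void main()
{
	ivec2 coord = ivec2(gl_FragCoord.xy);
	vec4 color = vec4(0.0);
	for (int i = 0; i < numSamples; ++i)
		color += texelFetch(sceneTex, coord, i); // fetch each sample explicitly
	frag_color = color / float(numSamples);      // a tonemap would slot in here
}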

That being said, I'd strongly prefer not to downgrade to GL 3.1 on any platform other than in an experimental build configuration for the Pi. Eventually I'd like to fully bump the GL version we use to 3.3, to have support for separate samplers and some GLSL improvements (explicit attribute location, floatBitsToInt et al.), as that is slightly closer to the programming model of Vulkan.

Overall, I'm very excited to see proof that Pioneer can run on the Pi. However, I have some strong cautions about targeting it "too early" and creating a dragging anchor which would prevent our adoption of a more performant rendering architecture.

@fluffyfreak (Contributor, Author) commented Jan 5, 2024

@Web-eWorks I totally agree, this isn't to give the Raspberry Pi a build, it's just to make it easier for that development work to begin.

A Vulkan renderer is the future for all platforms, either a custom one written for Pioneer, or using BGFX/LLGL or some other wrapper/optimising library.

Seeing the legacy OpenGL renderer move to 3.3, or even all the way up to 4.6, which is still almost 7 years old (July 31, 2017!!!), would completely eliminate the Raspberry Pi OpenGL support, and that's a natural progression I'd be fine with.

Like I say, this just allows us to run it, try some optimisations, and figure out some issues whilst the Pi supports only 3.1 and before a renderer update makes that impossible.

Also, resolving these CMake build problems is necessary for compiling on the Pi regardless of the renderer. Having it actually run and RENDER was an actual "Holy shit!" moment 😆

@fluffyfreak (Contributor, Author)

So given everything said above:

  • this isn't to add a Raspberry Pi build
  • necessary for Pi to compile at all
  • just as a prototype for research

What needs to change in this for it to become a PR?

@sturnclaw (Member) left a comment:

what needs to change in this for it to become a PR?

Change the current #ifdefs to an opt-in model set via a CMake option() call of some sort, and provide a CMake build preset setting the appropriate cache variables for the target hardware.
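i.e. roughly this shape (a sketch; the flag values are the ones from the current diff):

# CMakeLists.txt (sketch): opt-in instead of sniffing the CPU architecture
option(PIONEER_TARGET_RASPBERRY_PI "Build for Raspberry Pi (requests OpenGL 3.1)" OFF)

if (PIONEER_TARGET_RASPBERRY_PI)
	add_compile_options(-mcpu=cortex-a76 -ffast-math)
endif()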

This is mostly concerned with ensuring we build "out of the box" for distros crazy enough to be shipping a desktop/laptop AArch64 build of Pioneer (where GL3.2+ is presumed to be available.)

I forgot to mention it due to lack of sleep in my previous comment, but despite all of the negatives I listed I'm definitely in favor of providing a build config for Raspberry Pi as long as it's understood by the end user to be experimental and potentially subject to breakage. 😄

CMakeLists.txt Outdated
Comment on lines 55 to 58
if(CMAKE_SYSTEM_PROCESSOR MATCHES "arm" OR CMAKE_SYSTEM_PROCESSOR MATCHES "aarch")
	# maybe for performance we can add options here?
	set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mcpu=cortex-a76 -ffast-math")
else()
Member:

Instead of wrapping the USE_SSE42 etc. if-defs, prefer adding a new option(PIONEER_TARGET_RASPBERRY_PI) declaration with the compile flags and adding a preset definition to either CMakePresets.json or a new scripts/CMakeExperimentalPresets.json which sets the appropriate options:

"cacheVariables": {
     "USE_SSE42": false,
     "USE_AVX2": false,
     "PIONEER_TARGET_RASPBERRY_PI": true
}

If added to scripts/CMakeExperimentalPresets.json (preferred), you'll have to manually cp scripts/CMakeExperimentalPresets.json CMakeUserPresets.json before running e.g. cmake --preset linux-arm-raspberry-pi.

Contributor (Author):

My only problem with that is that defaulting USE_SSE42 to on is only correct for x86/x64 and wrong for PPC, ARM, RISC-V, etc., so we should guard it somehow.

Member:

If there's a way to default that option to off on non-x86 style platforms I'm all for it. Otherwise, it's assumed the options are managed via a CMake preset or explicit build flag set from a distribution build harness of some sort.

...which now that I think about it blindly omits Windows ARM64 builds, but I don't think I've seen anyone yet who's tried to build Pioneer on that platform.

SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
#if (defined(__arm__) || defined(__aarch64__) || defined(__PPC64__) || defined(__riscv))
Member:

Instead of testing for platform feature flags here, prefer adding an explicit PIONEER_TARGET_RASPBERRY_PI define in buildopts.h and testing for that define. If we're doing legacy backwards-compatibility hacks like this, it should be opt-in at build time, rather than automatic based on CPU architecture.
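Assuming buildopts.h is generated via configure_file, the wiring would look roughly like this (a sketch; the template filename is illustrative):

# CMakeLists.txt (sketch)
option(PIONEER_TARGET_RASPBERRY_PI "Build for Raspberry Pi" OFF)
configure_file(buildopts.h.cmakein ${CMAKE_BINARY_DIR}/buildopts.h)

# buildopts.h.cmakein would then contain the line
#   #cmakedefine01 PIONEER_TARGET_RASPBERRY_PI
# which configure_file expands to #define PIONEER_TARGET_RASPBERRY_PI 0 or 1.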

Contributor (Author):

I will try to work this out but I detest CMake, it's the worst possible build system and I can barely hack around in it at the best of times 🤷

Member:

I'd probably consider Autotools to be worse, but I completely understand where you're coming from.

Contributor (Author):

I couldn't work out the buildopts.h route but I could do add_definitions(-DPIONEER_ON_RPI=1) in CMakeLists.txt to the same effect.
Currently the committed code still has the platform feature flags set but I tested it on device without them.
If someone has practical advice about a better way to do this, I'd appreciate it.

I'll change it to #if defined(PIONEER_ON_RPI) in the next commit

CMakeLists.txt Outdated
option(USE_ARM "Compile for ARM compatible microarchitectures" ${PIONEER_TARGET_ARM})
option(PIONEER_TARGET_RASPBERRY_PI "Compile for Raspberry Pi specifically (degrades to OpenGL 3.1)" OFF)

if (USE_ARM)
@sturnclaw (Member) commented Sep 8, 2024:

Note that we now have autodetection for ARM platforms based on the CMAKE_SYSTEM_PROCESSOR variable - compilation for ARM64 should set CMAKE_SYSTEM_PROCESSOR to one of aarch64 or ARM64 (likely aarch64 since it's a linux target OS).

This should be done by whatever external tooling or option sets up the correct compiler toolchain for Raspberry Pi ...unless "normal" Raspberry Pi uses standard GCC builds in which case this should be set by the PIONEER_TARGET_RASPBERRY_PI option before we include(cmake/TargetArchitecture.cmake). I'm used to working with the RPi Pico which has a separate GCC toolchain from the system compiler.

@fluffyfreak (Contributor, Author)

Ok so this is stalled because I just cannot fathom CMake.
I reckon I can fix the rendering issues, and optimise stuff once it's on the Raspberry Pi, but I cannot work out wtf I'm supposed to do with CMake 🤷‍♂️ honestly it's like a vertical difficulty barrier to me.

Is anyone willing or able to help with this build system which I find so awful?

@impaktor (Member)

@fluffyfreak I volunteer @pcercuei (if ready and willing)

@sturnclaw (Member)

I can help as well yeah, though my availability today is limited. Are you compiling for RPi on-device or using a cross-compiler from a Windows host? One is far simpler than the other.

If compilation is on-device on the raspberry pi itself, then I can amend the CMakeLists.txt and push that to this branch for you to test with - sadly, I don't own a Raspberry Pi of my own (other than several Picos, but those don't count).

@fluffyfreak (Contributor, Author)

That would be a huge help. I'm not sure where I'm supposed to define or put things, or how it all links up so that PIONEER_TARGET_RASPBERRY_PI ends up defined in C++ code where I can use it as a compile-time value.

I mean I've gotten it to "work", but I'm stretching the definition of the word "work" to its limit 🤣

Since we now have ARM detection, somewhere, it needs to do:
If ARM define PIONEER_TARGET_RASPBERRY_PI

Then I can compile the alternative code in C++

I did fix a few other build options for Raspberry Pi, and fixup/change some presets but I think otherwise I'll have to reset this to master HEAD and reimplement the CMake stuff from scratch because I'm not sure what I'm doing with it 🤷‍♂️

@fluffyfreak (Contributor, Author)

Oh and I'm doing it on-device, as I figure that is how most users would be doing it too

@sturnclaw (Member)

Here's what you'd replace the existing changes to the CMake file with:

# Default to enabling PIONEER_TARGET_RASPBERRY_PI if we're compiling on ARM64
option(PIONEER_TARGET_RASPBERRY_PI "blah blah" ${PIONEER_TARGET_ARM64})

if (PIONEER_TARGET_RASPBERRY_PI)
	add_compile_options("-mcpu=cortex-a76" "-ffast-math")
endif(PIONEER_TARGET_RASPBERRY_PI)

All #if defined in the PR at current should be changed to #if PIONEER_TARGET_RASPBERRY_PI, and include buildopts.h where that symbol is defined to 1 if the PIONEER_TARGET_RASPBERRY_PI CMake option is set.
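On the C++ side that ends up looking like this (sketch):

#include "buildopts.h" // defines PIONEER_TARGET_RASPBERRY_PI to 0 or 1

SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
#if PIONEER_TARGET_RASPBERRY_PI
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 1); // the Pi driver only offers GL 3.1
#else
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 2);
#endif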

That's uh... it. @pcercuei could suggest fancier ways of detecting the system is specifically a Raspberry Pi SoC, but defaulting to this mode on ARM64 is "good enough" for now. Note that this does not pass -mcpu=cortex-a76 unless we've indicated we're compiling for RPi, as that value will be wrong for other ARM64 chipsets/processors Pioneer might be compiled on in the future.

@sturnclaw (Member)

I know I mentioned pushing to the branch, but it turned out to be simple enough I don't think I need to - you should be able to copy-paste the above snippet and make the mentioned changes to C++ and it "should" work.

@pcercuei (Contributor)

What are you guys trying to achieve exactly?

@pcercuei (Contributor)

Here's what you'd replace the existing changes to the CMake file with:

# Default to enabling PIONEER_TARGET_RASPBERRY_PI if we're compiling on ARM64
option(PIONEER_TARGET_RASPBERRY_PI "blah blah" ${PIONEER_TARGET_ARM64})

if (PIONEER_TARGET_RASPBERRY_PI)
	add_compile_options("-mcpu=cortex-a76" "-ffast-math")
endif(PIONEER_TARGET_RASPBERRY_PI)

All #if defined in the PR at current should be changed to #if PIONEER_TARGET_RASPBERRY_PI, and include buildopts.h where that symbol is defined to 1 if the PIONEER_TARGET_RASPBERRY_PI CMake option is set.

That's uh... it. @pcercuei could suggest fancier ways of detecting the system is specifically a Raspberry Pi SoC, but defaulting to this mode on ARM64 is "good enough" for now. Note that this does not pass -mcpu=cortex-a76 unless we've indicated we're compiling for RPi, as that value will be wrong for other ARM64 chipsets/processors Pioneer might be compiled on in the future.

My suggestion is - just don't.

Unless you're using a generic distribution on your Pi (e.g. regular Debian), which nobody does, the compiler will already use the correct flags for the system.

And even then, adding board-specific stuff (not even arch-specific) to a cross-platform project is pretty bad design IMHO.

@fluffyfreak (Contributor, Author)

@pcercuei thanks that's good advice.
I might have to use some flags like that to get it to build so it'd be good to find a way to make sure it's only used on Raspberry Pi. Unless I've misunderstood (highly likely).

I'll try it without for now.

This PR will get messy, but then I'll squash it flat (somehow) later on to remove merges etc.

@impaktor (Member)

This PR will get messy, but then I'll squash it flat (somehow) later on to remove merges etc.

I'm not sure why there are merges in the PR commit history to begin with, unless there's a need to resolve a conflict. Even then, I always opt for a git rebase upstream/master

Use OpenGL 3.1 on Pi
Add PIONEER_TARGET_RASPBERRY_PI define in CmakeLists
@fluffyfreak (Contributor, Author)

I'm not sure why there are merges in the PR commit history to begin with, unless there's a need to resolve a conflict. Even then, I always opt for a git rebase upstream/master

Mostly because I rarely get to dip into Git anymore and have forgotten half of the little that I used to know 🤷‍♂️

Ok this builds on Raspberry Pi 5 again now.

There's rendering issues, framerate with MSAA off is ~20fps at the Mars start point, but with VSync it's 60fps in space.
Gas giant rings render through the planet 🤦‍♂️ and there's other graphical issues but I'm hoping this could be a starting point for getting other Raspberry Pi users interested in the project.

@fluffyfreak (Contributor, Author)

I should clarify that performance is very variable and I think our terrain kills it.
At 1280x720 windowed, I can get 30fps at the Mars start with:

Anisotropic Filtering: off
MSAA: off
Scattering: legacy
Planet detail distance: Very low
City detail level: Low
Density of Star field: 25

@sturnclaw (Member)

I wonder if the RPi 5 even supports a floating point depth buffer. Everything else we do (including ARB_clip_control) seems to be working fine, or you'd see entirely inside-out models.

Do you have a screenshot of an example rendering issue on the Pi?

@fluffyfreak (Contributor, Author)

Here's the gas giant rings issue
[screenshot]

I think this is either z-buffer related OR it's a precision issue and we're seeing gaps between the terrain patches
[screenshot]

more of the above
[screenshot]

@fluffyfreak (Contributor, Author)

Hmm, there is an error output actually, right after we're setting up some OpenGL state

Warning: OpenGL error(s) during frame:
In function RendererOGL
On line 288
GL_INVALID_OPERATION

Which is this block that includes some depth setup stuff

//XXX bunch of fixed function states here!
glCullFace(GL_BACK);
glFrontFace(GL_CCW);
glEnable(GL_CULL_FACE);
glEnable(GL_DEPTH_TEST);
// use floating-point reverse-Z depth buffer to remove the need for depth buffer hacks
glDepthFunc(GL_GEQUAL);
// clear to 0.0 for use with reverse-Z
glClearDepth(0.0);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glHint(GL_LINE_SMOOTH_HINT, GL_NICEST);
glEnable(GL_TEXTURE_CUBE_MAP_SEAMLESS);
glEnable(GL_PROGRAM_POINT_SIZE);

glHint(GL_TEXTURE_COMPRESSION_HINT, GL_NICEST);
glHint(GL_FRAGMENT_SHADER_DERIVATIVE_HINT, GL_NICEST);

CHECKERRORS();

@fluffyfreak (Contributor, Author)

It appears that the call to glClipControl is what gives a GL_INVALID_OPERATION.
Sadly commenting out that code and using the GL_NV_depth_buffer_float codepath instead gets rid of the GL_INVALID_OPERATION 🎉 but it doesn't fix any of the rendering issues 🤷

Also, I locally added a check, if (!glewIsSupported("GL_ARB_depth_buffer_float")), and it passes, claiming that it does support that extension, as you'd hope from an OpenGL 3+ implementation.

Another GL_INVALID_OPERATION comes from glEnable(GL_TEXTURE_CUBE_MAP_SEAMLESS); but I'm guessing that's unrelated to the depth issues

@fluffyfreak (Contributor, Author) commented Oct 15, 2024

Oh yeah, terrain is definitely suffering z-buffer issues. Started a new game as "New Hope":
[screenshot]

@sturnclaw (Member)

It's not just terrain either... Look at the building models. Where the primitive order is unfavorable, they're rendering in reverse order.

I suspect some poking around in RenderDoc may be needed here - at minimum I suspect reverse-Z isn't working correctly, possibly even issues with floating point precision (i.e. mediump/lowp) selected by the driver.

@fluffyfreak (Contributor, Author)

Good eye 👀 even playing it I didn't spot that!
Also I didn't know that RenderDoc was available on ARM. Maybe I'll get time to look at this later but I've got 1.24 installed on the RPi5 now

@fluffyfreak (Contributor, Author) commented Oct 16, 2024

[16:16:49.756079] Info: Type: Error, Source: API, ID: 1, Severity: High, Message: GL_INVALID_OPERATION in glClipControl
[16:16:49.756131] Info: Type: Error, Source: API, ID: 1, Severity: High, Message: GL_INVALID_ENUM in glEnable(GL_TEXTURE_CUBE_MAP_SEAMLESS)

No luck with RenderDoc so far: it can launch Pioneer but fails to capture anything. However, I did remember the EnableGLDebug ini setting, which produced the above disappointing results. Perhaps glClipControl is simply not supported. I'm trying to take a look at the Broadcom v3d / Mesa source code, but that might take a while to figure out.

Floating-point depth buffer support is apparently done for v3d, the RPi GPU driver, but I can't find any support for glClipControl:

GL_ARB_depth_buffer_float (Float depth buffers) DONE (v3d)

Not that it matters since the driver reports it as available to Glew, and even when I force it to use GL_NV_depth_buffer_float it still has rendering issues 🤷

@sturnclaw (Member) commented Oct 16, 2024

[16:16:49.756131] Info: Type: Error, Source: API, ID: 1, Severity: High, Message: GL_INVALID_ENUM in glEnable(GL_TEXTURE_CUBE_MAP_SEAMLESS)

Seamless cube map textures are a GL3.2 thing - we need to require the GL_ARB_seamless_cube_map extension on RPi.
According to the Mesa features matrix that extension is not supported in the v3d driver.

[16:16:49.756079] Info: Type: Error, Source: API, ID: 1, Severity: High, Message: GL_INVALID_OPERATION in glClipControl

This is a little weird. The driver shouldn't be reporting support for GL_ARB_clip_control at all given the feature matrix. It's certainly not implementing the API call at all, or it would return GL_INVALID_ENUM instead of invalid-op.

Neither gpuinfo nor mesa's feature matrix report any support for GL_NV_depth_buffer_float, so I don't think we'll have any joy there either.

This is getting a bit complicated. Since we don't have glClipControl available, the depth buffer range is still near = -1, far = 1, but our rendering wants near = 1, far = 0. This is mostly fine even without GL_ZERO_TO_ONE, except for the fact that we don't have nearly enough bits of floating point precision.

The "fix" for this (and you can replicate the rendering issue on your PC just by commenting-out the glClipControl call) is to set the m_minZNear value to something absurdly large like 0.1f (mostly works with our cockpits) or 0.2f (will cause rendering artifacts when looking backwards at the seat). That gives enough bits of precision that terrain and building models look normal enough when close.

Terrain is still very messed up once in orbit, but it's not hugely apparent... at least until you set terrain detail to Low, at which point the seams are quite visible. Jupiter's rings are still borked, since at that distance both they and Jupiter map to a single depth value.

Given the VideoCore VII is primarily an OpenGL ES chip meant for mobile devices, this is probably the best we're getting unless you know someone on the Mesa team who could look at implementing ARB_clip_control (or make a bug report) - the chip supports Vulkan, so it probably can handle zero-to-one clip coordinates.

@fluffyfreak (Contributor, Author)

Yeah that "works" given the precision limitations. There is one other option which you may find distateful given that you removed it in the past, but I could reimplement the logz hack for these Arm / OpenGL ES boards.

I did also look into the option of implementing glClipControl for Mesa v3d; after digging into the code around this commit, I found that the defines in mesa/src/broadcom/cle/v3d_packet.xml seem to implement NONE, MIN_ONE_TO_ONE, or ZERO_TO_ONE.

So implementing it might be possible 👍 though I've never submitted/committed anything to Mesa before

@fluffyfreak (Contributor, Author)

Also with a working Z-depth test the performance was much improved, from ~20fps to ~30fps in the same save game file I used. That was at medium detail for planet and city as well.
There's definitely scope for larger performance improvements. These little SBC boards are seriously capable machines now!

@fluffyfreak (Contributor, Author) commented Oct 19, 2024

In fact, in that v3dx_emit.c, I think the deciding parameter would be clip_halfz, which has this comment:

/**
* When true clip space in the z axis goes from [0..1] (D3D). When false
* [-1, 1] (GL).
*
* NOTE: D3D will always use depth clamping.
*/
unsigned clip_halfz:1;

So it needs the glClipControl to set that, and then the check would have to change to something like:

config.z_clipping_mode = v3d->rasterizer->base.depth_clip_near ||
		v3d->rasterizer->base.depth_clip_far ?
	v3d->rasterizer->base.clip_halfz ?
		V3D_Z_CLIP_MODE_ZERO_TO_ONE :
		V3D_Z_CLIP_MODE_MIN_ONE_TO_ONE :
	V3D_Z_CLIP_MODE_NONE;

Which would have to be better written to get it submitted but you can see the logic...
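e.g. something like this (untested sketch, same logic):

config.z_clipping_mode = V3D_Z_CLIP_MODE_NONE;
if (v3d->rasterizer->base.depth_clip_near || v3d->rasterizer->base.depth_clip_far) {
	/* glClipControl(..., GL_ZERO_TO_ONE) would set clip_halfz */
	config.z_clipping_mode = v3d->rasterizer->base.clip_halfz ?
		V3D_Z_CLIP_MODE_ZERO_TO_ONE : V3D_Z_CLIP_MODE_MIN_ONE_TO_ONE;
}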

Ok I'm not going to have much more time tonight to look into this but it looks like most of the elements are there to support glClipControl.

@sturnclaw (Member)

There is one other option, which you may find distasteful given that you removed it in the past: I could reimplement the logz hack for these ARM / OpenGL ES boards.

The problem with reimplementing logz is...

Also with a working Z-depth test the performance was much improved, from ~20fps to ~30fps in the same save game file I used.

You lose any kind of early-Z depth test, including the implicit one done when binning triangles in hardware. While part of the performance gain you're seeing here is likely coming from the tile-out bandwidth of writing a pixel back to main memory, doing depth-write in a fragment shader could in the worst case potentially halve (or worse) the framerate as the hardware renderer is no longer able to guarantee that fully occluded fragments (according to the vertex data) will still be occluded after the fragment shader runs.

Whether the VideoCore VII has a hidden-surface-removal pipeline or not, you'll likely see a loss of performance from reimplementing log-Z. Given performance on this hardware is already tenuous, that's probably not a good idea.

Ok I'm not going to have much more time tonight to look into this but it looks like most of the elements are there to support glClipControl.

If the hardware supports Vulkan / DirectX depth semantics, it should be able to implement at least the zero-to-one clip-space part of glClipControl - certainly enough for our needs. This would ideally be the way to go.

@sturnclaw (Member)

#5970 may produce a slight FPS increase on the Raspberry Pi, as it allows a few extra milliseconds of work to be overlapped with the queued GPU commands from the previous frame.
