
Support Multipass shaders (buffer A ...) #30

Open · wants to merge 59 commits into base: main
Conversation

@Vipitis (Collaborator) commented May 9, 2024

part of #4

Approximately 17.5% of public Shadertoys are multipass. Multipass allows up to four buffers (A through D) to be rendered as textures. These can also be used to store data, enabling quite a few more kinds of experiences.

Some of the challenges include timing as well as cross inputs.
Buffer passes can seemingly take exactly the same inputs as the main "Image" render pass, including other buffers (and themselves?).

This PR is starting to bloat a little and contains some refactoring of the whole channel-input concept... still in flux.

Instead, I will try to implement BufferTexture as a ShadertoyChannel subclass so it can hold, for example, the sampler settings.
Additionally, there will likely be a RenderPass base class with subclasses for Image, Buffer (A-D), and later cube and sound.
So the main Shadertoy class contains several render passes, and all of these get their inputs (channels) attached.
I even started to sketch it out, but I will have to sleep on this for a few more days... my concepts change every day, but I need to just try and work on the ideas for a bit.
[image: concept sketch of the render pass and channel design]

The render order should be Buffer A through D and then Image. That way you can keep temporal data by using a buffer itself as an input.
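The fixed ordering described above can be sketched as a small loop; all names here are illustrative, not the actual wgpu-shadertoy API:

```python
# Sketch of the render order: buffer passes A through D first, then Image.
# A buffer pass may sample its own previous output, which is what makes
# the temporal-data trick possible.

def render_frame(passes: dict) -> list:
    """Run one frame's passes in order; return the order actually rendered."""
    order = []
    for name in ("buffer_a", "buffer_b", "buffer_c", "buffer_d", "image"):
        render_pass = passes.get(name)
        if render_pass is not None:
            render_pass()  # stand-in for dispatching the real render pass
            order.append(name)
    return order

frame_order = render_frame({
    "buffer_a": lambda: None,
    "image": lambda: None,
})
print(frame_order)  # ['buffer_a', 'image']
```

Missing buffers are simply skipped, so a shader that only defines Buffer A and Image still renders in the right relative order.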

TODOs:

  • Refactor common code into a ShadertoyChannel base class
  • Additional test cases for inferred input types and empty channels (caching conflict with pytest)
  • Dynamic headers
  • Test coverage for the examples in the README (different PR)
  • Working buffer
  • Working multiple buffers
  • Tests for buffers
  • Multipass examples
  • (Maybe) a debug mode where you can render the buffers to the canvas? (you can use RenderDoc with "capture child processes")

@Vipitis (Collaborator, Author) commented May 24, 2024

Little update:
It seems to sort of work, but a bunch of stuff is still broken:

  • Missing sampler filters fail to recreate some behavior; example: https://www.shadertoy.com/view/MfyXzV
  • Unsupported channels break the layout; example: https://www.shadertoy.com/view/ssjyWc (layout fixed by detecting channels in the common code; shader still broken, fixed in float32)
  • Resizing the window breaks the nbytes of the previous frame, so maybe there needs to be a hook into the on_resize function to fill the new space (similar to the behaviour when going fullscreen on the website)

New breaking examples found that might be unrelated to this PR, but I will note them down for later reference:

@Vipitis (Collaborator, Author) commented Jun 7, 2024

I think I finally fixed the compatibility issue. There are some small visual issues that look like precision problems to me (not sure yet). And the performance seems horrible... Please let me know if you find any shaders that are broken (and not due to missing features or wgpu bugs).

Will work on tests, examples and documentation to hopefully get this ready for next week.

E: found this one seemingly broken: wgpu-shadertoy https://www.shadertoy.com/view/tsKXR3
Detailed example where the precision of the alpha channel is different: https://www.shadertoy.com/view/wsjSDt

@Vipitis Vipitis marked this pull request as ready for review June 21, 2024 19:13
@Vipitis (Collaborator, Author) commented Jun 21, 2024

I think this is finally ready for review, and I welcome some feedback.

  • This PR refactors the whole inputs/channels concept to be easily extendable with the missing channel types.
  • Multipass shaders are quite complex, and I learned quite a lot to get this running; I think my implementation is the minimum required for wgpu (with the redone sampler and pipeline).
  • Resizing is really janky, since you download the texture and pad it with numpy only to upload it again, but I wanted to mirror the website (which itself breaks quite a few shaders).
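The resize behaviour described in the last bullet can be mimicked on the CPU side with numpy. A minimal sketch, assuming the previous frame is an RGBA ndarray (the real code additionally moves the data through the GPU queue):

```python
import numpy as np

def pad_previous_frame(old: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Fit the previous frame into a resized buffer: crop where the new
    size is smaller, zero-pad where it is larger (mirroring the website's
    behaviour when going fullscreen)."""
    h = min(old.shape[0], new_h)
    w = min(old.shape[1], new_w)
    out = np.zeros((new_h, new_w, old.shape[2]), dtype=old.dtype)
    out[:h, :w] = old[:h, :w]
    return out

frame = np.ones((4, 4, 4), dtype=np.uint8)   # tiny RGBA frame
resized = pad_previous_frame(frame, 6, 3)    # taller but narrower window
print(resized.shape)  # (6, 3, 4)
```

The round trip (download, pad, upload) is what makes resizing expensive; the padding itself is cheap.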

@Vipitis Vipitis changed the title [WIP] Support Multipass shaders (buffer A ...) Support Multipass shaders (buffer A ...) Jun 24, 2024
@hmaarrfk (Contributor) commented:
Cool stuff!

@Vipitis (Collaborator, Author) commented Aug 21, 2024

No worries, this is sort of a large rewrite. Perhaps others can help too, time permitting.

Let me know if there are any specific parts of the diff I should focus my attention on first.

The part I am most unsure about is _update_textures, since it feels really inefficient to make a new texture for every single frame. That includes a new binding and sampler too. I tried to use a TextureView instead, which worked much better, but I couldn't get it to work when the Buffer passes also sample buffer inputs. I feel like I am missing something.
The added overhead makes the example from the API tests run at around 45 fps, for example.

@Korijn (Contributor) commented Sep 2, 2024

> No worries, this is sort of a large rewrite. Perhaps others can help too, time permitting.
>
> Let me know if there are any specific parts of the diff I should focus my attention on first.
>
> The part I am most unsure about is _update_textures, since it feels really inefficient to make a new texture for every single frame. That includes a new binding and sampler too. I tried to use a TextureView instead, which worked much better, but I couldn't get it to work when the Buffer passes also sample buffer inputs. I feel like I am missing something. The added overhead makes the example from the API tests run at around 45 fps, for example.

Maybe @almarklein can weigh in on that issue?

@almarklein (Member) commented:
Is _update_textures called every frame? It seems to rebuild everything from scratch, from the descriptors all the way to the pipeline. I don't have a clear understanding of what happens and the path that leads up to _update_textures, but ideally you want to reuse the textures. If that's not possible, you can probably at least reuse the layouts.
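The reuse advice above can be sketched as simple caching keyed on the objects a bind group depends on. The wgpu calls are stubbed out here and all names are hypothetical; the point is only that the expensive creation call runs once per unique key, not once per frame:

```python
# Hypothetical sketch: cache bind groups (the same idea works for layouts)
# keyed on the texture view and sampler they reference, so per-frame code
# only hits the cache instead of recreating wgpu objects.

_bind_group_cache = {}

def get_bind_group(texture_view_id, sampler_id, create_fn):
    key = (texture_view_id, sampler_id)
    if key not in _bind_group_cache:
        _bind_group_cache[key] = create_fn()  # expensive call happens once
    return _bind_group_cache[key]

calls = []
bg1 = get_bind_group("view-1", "sampler-1", lambda: calls.append(1) or "bind-group-1")
bg2 = get_bind_group("view-1", "sampler-1", lambda: calls.append(1) or "never-created")
print(bg1 is bg2, len(calls))  # the second lookup is a cache hit
```

If the texture itself must be recreated (e.g. on resize), invalidating the affected cache entries is enough; everything else stays reusable.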

@Vipitis (Collaborator, Author) commented Sep 4, 2024

> Is _update_textures called every frame?

Yeah, it is called for each render pass, every frame, and then also iterates through all channels... I think this whole method doesn't actually need to exist.
I will try to make some changes that simply use a texture view for all the channels, and then have the buffer render to a temporary render-target texture before overwriting the old texture. This was likely the cause of the usage conflicts I had on the previous attempt, as it's common to have the previous frame as one of the inputs.

It already works well, but I will try to run some more examples before I push the commits. It will likely break resizing, for which there are no tests in CI (but resizing in this PR is horrible anyway).

@Vipitis (Collaborator, Author) commented Sep 23, 2024

Resizing now works again as it should. It's not the cleanest solution, but it works.
I do feel like the performance is really bad again, but I need some proper ways to test that, as I am also using wgpu-py@main currently...

The two CI failures are sort of unrelated: one is deprecated actions and the other is Python 3.8 being unhappy with the excessive typing. Python 3.8 will be EOL in a few days, so I no longer care.
I will look at the PRs in wgpu-py that updated all the CI stuff and open something separate tomorrow.

Comment on lines 245 to 249:
- device=self._device, format=wgpu.TextureFormat.bgra8unorm
+ device=self._device, format=self._format
Member commented:

You can also set it to None to have it select the preferred format; less code, unless you need self._format. Note that the format is also accessible as texture.format on the texture obtained via get_current_texture().

Collaborator (Author) replied:

Hm, I had a look, and the only other solution I can think of is to make it a property of the ImageRenderPass, as it's needed to create the render pipeline. The awkward part is that at init time for the Image class, the canvas (_present_context) might not be accessible via the parent Shadertoy instance.
It could be returned by this method instead of passed via an attribute. It could be useful to have available when translating the snapshot back into RGB.

@Vipitis (Collaborator, Author) commented Oct 31, 2024

My thesis is nearly submitted, so I can spend time again on getting this finished.

For the performance issue I want to try the following: ping-pong the render target to reduce the number of copies (which are likely the slowdown).

Also, it might be the case that buffers don't follow the A-B-C-D order but instead the order in which they were added, which I believe might be recorded in the JSON we get from the API. I will try to find some examples that prove this behavior.
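If the pass order is indeed recorded in the API JSON, honoring it could look roughly like this. The JSON layout shown is an assumption for illustration, not the verified Shadertoy API schema:

```python
# Hedged sketch: render buffer passes in the order they appear in the
# (assumed) API response, rather than a fixed A-B-C-D order, with the
# Image pass always last.

shader_json = {
    "renderpass": [
        {"type": "buffer", "name": "Buffer B"},
        {"type": "buffer", "name": "Buffer A"},
        {"type": "image", "name": "Image"},
    ]
}

buffers = [p["name"] for p in shader_json["renderpass"] if p["type"] == "buffer"]
order = buffers + ["Image"]
print(order)  # ['Buffer B', 'Buffer A', 'Image']
```

A shader where Buffer B was authored before Buffer A would then keep that authoring order, which is the behavior suspected above.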

E: maybe this can be merged as an MVP, and performance improvements can be made in follow-up PRs instead.
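The ping-pong idea mentioned above can be illustrated without any GPU code: each buffer pass gets two targets, swapped each frame, so the previous frame stays readable while the new one is written and no texture-to-texture copy is needed. A pure-Python stand-in (the real version would swap wgpu textures):

```python
# Two "textures" per buffer pass; read from one, write to the other,
# then swap. The payload lists stand in for rendered pixel data.

textures = [[], []]
read, write = 0, 1

def render(frame: int):
    global read, write
    prev = list(textures[read])       # sample the previous frame's target
    textures[write] = prev + [frame]  # "render" into the other target
    read, write = write, read         # swap roles for the next frame

for f in range(3):
    render(f)
print(textures[read])  # [0, 1, 2]
```

Since the read and write targets are never the same texture within a frame, the usage conflicts from sampling the previous frame disappear by construction.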

@Vipitis (Collaborator, Author) commented Nov 21, 2024

The CI failures are addressed in #36, which isn't included in this branch.

@Vipitis (Collaborator, Author) commented Dec 10, 2024

Found some more time to figure out the performance issues. I read somewhere that the queue is smart and might figure out that some copies can be avoided; that doesn't seem to make much difference. To avoid the guesswork, I added some rudimentary profiling, which still needs some work. But looking at the data already shows that something is up.

[image: frame-time plot]

The slower render times on the right are fullscreen (instead of a small window). But the constant spikes are a problem. I will find the time to run more experiments in the following days and try different systems (to rule out weirdness of Intel GPUs). It seems like something (I suspect the copies) is causing the GPU to hang for ~60 ms.
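A rudimentary frame-time profiler along these lines can be built with time.perf_counter. This is a sketch of the idea, not the instrumentation actually added in the PR:

```python
import time

def profile_frames(draw, n_frames: int):
    """Call draw() n_frames times; return per-frame durations in ms."""
    durations = []
    for _ in range(n_frames):
        t0 = time.perf_counter()
        draw()  # stand-in for rendering one frame
        durations.append((time.perf_counter() - t0) * 1000.0)
    return durations

times_ms = profile_frames(lambda: sum(range(1000)), 10)
mean_ms = sum(times_ms) / len(times_ms)
spikes = [t for t in times_ms if t > 10 * mean_ms]  # flag outlier frames
print(f"mean {mean_ms:.3f} ms, {len(spikes)} spike(s)")
```

Note this measures CPU-side submission time; GPU-side stalls only show up here indirectly, when a blocking call (like a texture download) waits on the queue.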

@Vipitis (Collaborator, Author) commented Dec 20, 2024

While making the vertex and fragment code templates simpler, I still need to do a y-flip for the image pass. I couldn't find any setup for the GUI canvas to accept a flipped image. This works mostly fine; however, I have been informed that dFdy will behave differently due to this flip. Here is an example showing the problem.
[image: screenshot] It should be the same red version across the whole canvas, but only the part reading from the buffer texture is; the image pass (which is flipped in the vertex stage) doesn't behave.

Perhaps there is a solution to this down the line in rendercanvas? I would love to migrate to that once this PR gets merged.
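The dFdy problem can be demonstrated numerically: flipping an image vertically both flips and negates its vertical derivative. A finite-difference stand-in for dFdy makes this visible:

```python
import numpy as np

def dfdy(img: np.ndarray) -> np.ndarray:
    """Finite-difference stand-in for dFdy: difference along the y axis."""
    return img[1:, :] - img[:-1, :]

img = np.arange(12, dtype=float).reshape(4, 3)
d = dfdy(img)
d_flipped = dfdy(np.flipud(img))

# Flipping the image vertically flips AND negates the derivative, which is
# why shaders using dFdy misbehave after a y-flip in the vertex stage.
print(np.allclose(d_flipped, -np.flipud(d)))  # True
```

So any shader whose result depends on the sign of dFdy will differ between the flipped image pass and an unflipped buffer pass, exactly as the screenshot shows.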

@almarklein (Member) commented:
> Perhaps there is a solution to this down the line in rendercanvas?

No, the definition of the viewport is simply part of the wgpu spec (https://wgpu-py.readthedocs.io/en/stable/guide.html#coordinate-system). IIRC this is reversed from OpenGL.

> However, I have been informed that dFdy will behave differently due to this flip.

Aha, I guess that one will be multiplied by -1... 🤔 Can we tell users to use a custom dFdy that we provide?

@Vipitis (Collaborator, Author) commented Dec 20, 2024

> Can we tell users to use a custom dFdy that we provide?

Rewrite rules are one option, but they can get really complicated, so there might be better ideas. Doing a whole render pass or compute pass just to flip the output image also seems wasteful. Plus there might be other functionality that is currently broken which I don't know about yet.

I will track this as a standalone issue, as it already existed before this PR too.
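One possible shape for such a rewrite rule, shown as a hypothetical source-to-source pass over the shader text. The wrapper name and the float-only overload are illustrative; a real version would need overloads for vec2/vec3/vec4 and care around user code that already defines similar names:

```python
import re

# Hypothetical wrapper prepended to the translated shader source; it
# compensates for the y-flip by negating the built-in derivative.
WRAPPER = "float dFdy_flipped(float v) { return -dFdy(v); }\n"

def rewrite_dfdy(source: str) -> str:
    """Redirect dFdy(...) calls in user shader code to the wrapper."""
    rewritten = re.sub(r"\bdFdy\s*\(", "dFdy_flipped(", source)
    return WRAPPER + rewritten  # wrapper added after rewriting, so its own dFdy survives

src = "float g = dFdy(uv.y);"
print(rewrite_dfdy(src))
```

The ordering matters: the user code is rewritten first, then the wrapper (whose body legitimately calls the real dFdy) is prepended untouched.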

4 participants