Support Multipass shaders (buffer A ...) #30
base: main
Conversation
Little update:
new breaking examples found that might be unrelated to this PR, but I will note them down for later reference:
|
I think I finally fixed the compatibility issue. There are some small visual issues which look like precision problems to me (not sure yet). And the performance seems horrible... Please let me know if you find any shaders that are broken (not due to missing features or wgpu bugs). Will work on tests, examples and documentation to hopefully get this ready for next week. E: found this one seemingly broken: |
I think this is finally ready for review - and I welcome some feedback.
|
Cool stuff! |
No worries, this is sort of a large rewrite. Perhaps others can help too, time permitting.
The part I am most unsure about is |
Maybe @almarklein can weigh in on that issue? |
Is |
yeah, it will be called for each renderpass, every frame, and then also iterate through all channels... I think this whole method doesn't actually need to exist. It already works well, but I will try to run some more examples before I push the commits. And it will likely break resizing, for which there are no tests in CI (but resizing in this PR is horrible too). |
Resizing now works again as it should. It's not the cleanest solution, but it works. The two CI failures are sort of unrelated: one is deprecated actions and the other is Python 3.8 not being happy with the excessive typing. Python 3.8 will be EOL in a few days - so I no longer care. |
wgpu_shadertoy/shadertoy.py
Outdated
- device=self._device, format=wgpu.TextureFormat.bgra8unorm
+ device=self._device, format=self._format
Can also set to None to have it select the preferred format. Less code, unless you need self._format. Note that the format is also accessible in texture.format on the texture obtained via get_current_texture().
hm, I had a look and the only other solution I can think of is to make it a property of the ImageRenderPass, as it's needed to create the render pipeline. The awful part is that at init time for the Image class, the canvas (_present_context) might not be accessible via the parent Shadertoy instance.
It could be returned by this method instead of passed via an attribute. It could also be useful to have available when translating the snapshot back into RGB.
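Whatever way the format ends up being exposed, having it at hand would make the snapshot-to-RGB translation straightforward. A minimal sketch of that translation, assuming pixels arrive as 4-tuples of channel values and `fmt` is a wgpu format string (the helper name `snapshot_to_rgba` is hypothetical, not part of the codebase):

```python
def snapshot_to_rgba(pixels, fmt):
    """Reorder raw snapshot pixels into RGBA order.

    `pixels` is a sequence of 4-tuples as read back from the canvas
    texture; `fmt` is the texture format string. bgra8unorm snapshots
    need the blue and red channels swapped; rgba formats pass through.
    (Hypothetical helper, sketched for illustration.)
    """
    if fmt.startswith("bgra"):
        # unpack in BGRA order, emit in RGBA order
        return [(r, g, b, a) for (b, g, r, a) in pixels]
    return list(pixels)
```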
My thesis is nearly submitted, so I can spend time again getting this finished. For the performance issue I want to try the following: ping-pong the render target to reduce the number of copies (which are likely the slowdown). Also, it might be the case that buffers don't follow the A-B-C-D order but instead the order in which they were added, which I believe might be recorded in the JSON we get from the API. I will try to find some examples that prove the behavior. E: maybe this can be merged as an MVP and performance improvements can be made in follow-up PRs instead |
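The ping-pong idea mentioned above could be sketched roughly like this: instead of copying each pass's output into a channel texture every frame, keep two textures per buffer and swap their roles, so last frame's render target becomes this frame's sampled input. A minimal sketch with plain strings standing in for wgpu textures (class and attribute names are my own, not from the codebase):

```python
class PingPongBuffer:
    """Two textures that swap roles each frame: render into one while
    sampling the previous frame's result from the other, avoiding a
    texture-to-texture copy per pass. (Sketch; the strings passed in
    are stand-ins for real wgpu texture objects.)"""

    def __init__(self, front, back):
        self._textures = [front, back]
        self._frame = 0

    @property
    def target(self):
        # texture rendered into during the current frame
        return self._textures[self._frame % 2]

    @property
    def source(self):
        # texture holding last frame's result, bound as a channel input
        return self._textures[(self._frame + 1) % 2]

    def swap(self):
        # call once per frame after the pass has rendered
        self._frame += 1


buf = PingPongBuffer("tex0", "tex1")
first_target = buf.target
buf.swap()
# after swapping, last frame's target becomes this frame's source
assert buf.source == first_target
```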
CI failures are addressed in #36 which isn't included in this branch |
Found some more time to figure out the performance issues. I read somewhere that the queue is smart and might figure out that some copies can be avoided. Doesn't seem to make much difference. To avoid the guesswork I added some rudimentary profiling, which still needs some work. But looking at the data already shows that something is up: the slower render times on the right are full screen (instead of a small window). But the constant spikes are a problem - I will find the time to run more experiments in the following days and try different systems (to rule out weirdness of Intel GPUs). Seems like something (I suspect the copies) is causing the GPU to hang for ~60 ms. |
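Rudimentary CPU-side profiling of the kind described could look like the sketch below: record per-frame durations and flag frames that take much longer than the median. This is only a stand-in for proper GPU timestamp queries, and all names here are hypothetical:

```python
import time
from statistics import median


class FrameProfiler:
    """Records per-frame wall-clock durations and flags spikes relative
    to the median frame time. (Sketch; a real measurement would use GPU
    timestamp queries rather than CPU timing.)"""

    def __init__(self, spike_factor=3.0):
        self.samples = []          # frame durations in seconds
        self.spike_factor = spike_factor
        self._t0 = None

    def begin(self):
        self._t0 = time.perf_counter()

    def end(self):
        self.samples.append(time.perf_counter() - self._t0)

    def spikes(self):
        # indices of frames that took spike_factor x the median time
        if len(self.samples) < 2:
            return []
        m = median(self.samples)
        return [i for i, s in enumerate(self.samples)
                if s > self.spike_factor * m]
```

Wrapping each frame's draw in `begin()`/`end()` and dumping `spikes()` afterwards would at least show whether the stalls are periodic or correlated with specific passes.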
No, the definition of the viewport is simply part of the wgpu spec (https://wgpu-py.readthedocs.io/en/stable/guide.html#coordinate-system). IIRC this is reversed from OpenGL.
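For reference, converting a fragment's y coordinate between the two conventions is just a flip over the render-target height. A tiny sketch (assuming continuous coordinates, so pixel centers sit at half-integers; the function name is my own):

```python
def flip_frag_coord_y(y, height):
    """Convert a fragment y coordinate between OpenGL-style framebuffer
    coordinates (origin bottom-left, +y up) and wgpu framebuffer
    coordinates (origin top-left, +y down). The flip is its own
    inverse. `height` is the render target height in pixels."""
    return height - y
```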
Aha, I guess that one will be multiplied with |
Rewrite rules are one option, but they can get really complicated, so there might be better ideas. Doing a whole render pass or compute pass just to flip the output image also seems wasteful. Plus there might be other functionality that is currently broken which I don't know about yet. I will track this as a standalone issue, as it already existed before this PR too. |
part of #4
Approximately 17.5% of public Shadertoys are multipass. Multipass allows up to 4 buffers (A through D) to be rendered as textures. These can also be used to store data and enable quite a few more experiences.
Some of the challenges include timing as well as cross inputs.
Buffer passes can seemingly take the exact same inputs as the main "Image" renderpass, including other buffers (and themselves?)
This PR is starting to bloat a little and contains some refactoring of the whole channel-input concept... still in flux
Instead, I will try to implement BufferTexture as a ShadertoyChannel subclass so it can hold, for example, the sampler settings. Additionally, there will likely be a RenderPass base class and subclasses for Image, Buffer (A-D) and later Cube and Sound.
So the main Shadertoy class contains several render passes, and all of these get their inputs (channels) attached.
I even started to try and sketch it out - but will have to sleep on this for a few more days... my concepts change every day, but I need to just try and work on the ideas for a bit.
The render order should be Buffer A through D and then Image. So you can keep temporal data by using a buffer itself as an input.
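The pass hierarchy and render order described above might be sketched like this. Class names follow the comments in this thread; attributes such as `buffer_id` and the constructor shapes are my guesses, and no GPU code is involved:

```python
class RenderPass:
    """Base class: owns its channel inputs; subclasses decide where
    the pass renders to. (Sketch only.)"""

    def __init__(self, channels=None):
        self.channels = channels or []  # ShadertoyChannel-like inputs


class BufferRenderPass(RenderPass):
    """Renders to an offscreen buffer texture (A through D)."""

    def __init__(self, buffer_id, channels=None):
        super().__init__(channels)
        self.buffer_id = buffer_id      # "a" .. "d"


class ImageRenderPass(RenderPass):
    """Final pass that renders to the canvas."""


class Shadertoy:
    """Holds all render passes in execution order: buffers A..D
    first, then the image pass. (Hypothetical constructor shape.)"""

    def __init__(self, image_pass, buffer_passes=()):
        self.passes = sorted(buffer_passes, key=lambda p: p.buffer_id)
        self.passes.append(image_pass)
```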
TODOs:
- additional test cases for inferred input types, empty channels (caching conflict with pytest)
- test coverage for examples in readme! (different PR)
- (maybe) some debug mode where you can render the buffers to canvas? (you can use RenderDoc with "capture child processes")