Support for GL_EXT_shader_explicit_arithmetic_types and persistently mapped memory #25

Open
mpottinger opened this issue Aug 9, 2021 · 2 comments


@mpottinger

Hello, first I would like to thank you for this wonderful library. It is nearly perfect: I was looking for a Python alternative to the C++ vuh library (https://github.com/Glavnokoman/vuh), and for my use case this is nearly identical.

There are just two things that would make it perfect.

One is explicitly sized data types: int8, int16, uint8, uint16, etc., for sending bytes and shorts. This is especially useful for image processing, since most of the data is 24-bit RGB, 16-bit depth maps, and so on. It looks like this would be a relatively small modification, and I could probably add it myself if you would accept a pull request.
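For illustration, here is a rough sketch of what the shader side could look like with those explicit types. The extension pragmas come from the GLSL specs; whether uint8_t can actually be used inside a storage buffer also depends on the device and driver exposing 8-bit storage, so treat the details as assumptions rather than lava's current behaviour:

```python
import numpy as np

# Host side: a 24-bit RGB frame as a flat uint8 buffer.
rgb_image = np.zeros((480, 640, 3), dtype=np.uint8)

# Shader side (GLSL as a string): byte-sized elements via the explicit
# arithmetic types and 8-bit storage extensions.
SHADER_SRC = """
#version 450
#extension GL_EXT_shader_explicit_arithmetic_types_int8 : require
#extension GL_EXT_shader_8bit_storage : require

layout(local_size_x = 64) in;

layout(std430, binding = 0) buffer Pixels {
    uint8_t pixels[];  // R, G, B, R, G, B, ...
};

void main() {
    uint i = gl_GlobalInvocationID.x * 3u;
    // Example: swap the red and blue channels in place.
    uint8_t r = pixels[i];
    pixels[i] = pixels[i + 2u];
    pixels[i + 2u] = r;
}
"""
```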

The other thing I've noticed is that BufferCPU is very fast on my M1 Mac and on my Intel integrated GPU, to the point where it is faster than anything I can get out of my Nvidia RTX 2080 Ti. I tried both StagedBuffer and BufferCPU on my desktop, but the data transfer is a bottleneck.

I remember that in C++ with vuh this was not the case: I didn't have this performance discrepancy, and I was able to use host-only memory on Nvidia and get performance similar to the integrated GPUs. Right now only the integrated GPUs offer the performance I want, with lots of data going in and out on every frame. Is there a possible solution to this, e.g. allowing access to host-only (shared) memory on a discrete GPU?
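For context, "host-only (shared) memory" in Vulkan terms means choosing a memory type that is host-visible (and ideally host-coherent) instead of device-local, so the host maps it directly and a discrete GPU reads it over the bus. A conceptual sketch, not lava's actual API; the flag values are from the Vulkan spec:

```python
# Vulkan memory property flag bits (values from the Vulkan spec).
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT = 0x1
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT = 0x2
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT = 0x4
VK_MEMORY_PROPERTY_HOST_CACHED_BIT = 0x8


def pick_host_visible_type(memory_type_flags):
    """Return the index of the first memory type the host can map directly.

    memory_type_flags is one propertyFlags value per memory type, as reported
    by vkGetPhysicalDeviceMemoryProperties. On integrated GPUs the types are
    usually DEVICE_LOCAL and HOST_VISIBLE at the same time; on discrete GPUs
    a HOST_VISIBLE type is typically system RAM that the GPU reaches over PCIe.
    """
    wanted = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
    for index, flags in enumerate(memory_type_flags):
        if flags & wanted == wanted:
            return index
    raise RuntimeError("no host-visible, host-coherent memory type found")
```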

@osanj
Owner

osanj commented Aug 10, 2021

Hey @mpottinger, thanks for the kind words. I still use the library for a path tracing application, but I will need some time to get back into the depths of it, and I am not sure how quickly I can help you; any help is appreciated 🙂

> One is explicitly sized data types: int8, int16, uint8, uint16, etc., for sending bytes and shorts. This is especially useful for image processing, since most of the data is 24-bit RGB, 16-bit depth maps, and so on. It looks like this would be a relatively small modification, and I could probably add it myself if you would accept a pull request.

Sure, if you can open a PR that would be great! I wanted to suggest a pragmatic workaround using ints and then bitwise operations in the shader to pick out the byte you want for each channel, but my first superficial googling seems to indicate that GLSL does not support bitwise operations.

> Is there a possible solution to this, e.g. allowing access to host-only (shared) memory on a discrete GPU?

I had the same experience: transfer between host and GPU seems to vary wildly depending on the hardware. If you have a simple benchmark script (without your actual code, just the transfer part and some dummy computation), I can try to reproduce it on my setup (Ubuntu something and an Nvidia GTX 1080) and do some experiments from there. I will also try to look into your suggestion; if you have any more concrete tips, let me know. Either way, I am not sure when I will be able to take a look, maybe only on the weekend.
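One possible shape for such a benchmark, timing upload, dispatch and download separately. Only the timing harness is spelled out here; the three callables are placeholders for whatever lava calls end up being used (writing into a BufferCPU or StagedBuffer, running a trivial shader, reading back):

```python
import time
import numpy as np

FRAME = np.random.randint(0, 255, size=(1080, 1920, 4), dtype=np.uint8)
ITERATIONS = 100


def benchmark(upload, run_shader, download):
    """Print average per-frame timings for the three phases over ITERATIONS runs."""
    totals = {"upload": 0.0, "compute": 0.0, "download": 0.0}
    for _ in range(ITERATIONS):
        t0 = time.perf_counter()
        upload(FRAME)          # placeholder: copy FRAME into the GPU buffer
        t1 = time.perf_counter()
        run_shader()           # placeholder: dispatch some dummy computation
        t2 = time.perf_counter()
        download()             # placeholder: read the result back to the host
        t3 = time.perf_counter()
        totals["upload"] += t1 - t0
        totals["compute"] += t2 - t1
        totals["download"] += t3 - t2
    for name, total in totals.items():
        print(f"{name}: {total / ITERATIONS * 1000:.2f} ms per frame")
```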

@mpottinger
Author

> Hey @mpottinger, thanks for the kind words. I still use the library for a path tracing application, but I will need some time to get back into the depths of it, and I am not sure how quickly I can help you; any help is appreciated

> Sure, if you can open a PR that would be great! I wanted to suggest a pragmatic workaround using ints and then bitwise operations in the shader to pick out the byte you want for each channel, but my first superficial googling seems to indicate that GLSL does not support bitwise operations.

Ok, great. I am doing this in my spare time, but it would be really useful for me, so I'll give it a try fairly soon. It seems like there is not much to it, really: I just need to recognise the explicit types in the part of your code that assigns a numpy data type based on the SPIR-V, and match them with the corresponding numpy types. I had already explored a bit and know where it inspects the SPIR-V code, and I was able to see the type codes for uint8_t etc. that weren't handled; I just didn't get as far as modifying it to use them fully.
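For reference, the mapping in question roughly boils down to something like the sketch below. The SPIR-V side follows the spec (OpTypeInt carries a width and a signedness operand, OpTypeFloat a width); the helper name and how it would hook into lava's SPIR-V inspection code are assumptions:

```python
import numpy as np

# Hypothetical helper: translate a SPIR-V scalar type declaration into a
# numpy dtype, including the 8- and 16-bit integer types discussed above.
_INT_DTYPES = {
    (8, 0): np.uint8,   (8, 1): np.int8,
    (16, 0): np.uint16, (16, 1): np.int16,
    (32, 0): np.uint32, (32, 1): np.int32,
    (64, 0): np.uint64, (64, 1): np.int64,
}
_FLOAT_DTYPES = {16: np.float16, 32: np.float32, 64: np.float64}


def spirv_scalar_to_dtype(opcode, width, signedness=None):
    if opcode == "OpTypeInt":
        return _INT_DTYPES[(width, signedness)]
    if opcode == "OpTypeFloat":
        return _FLOAT_DTYPES[width]
    raise ValueError(f"not a scalar numeric type: {opcode}")
```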

It may be hardware/driver dependent, but I am also able to use bitwise operations in GLSL with no problem; I just sometimes find it cleaner to use uint8_t bytes, uint16_t, etc. directly. I also found that numpy's ndarray.view() is a workaround in some cases: when I have a 4-channel RGBA image, I can just pass it as int32 and unpack it in the shader.
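A small example of that view() workaround, assuming a 4-channel uint8 RGBA image on a little-endian host; the shader-side unpack uses the standard unpackUnorm4x8 builtin, and plain shifts and masks work just as well:

```python
import numpy as np

# Host side: reinterpret an HxWx4 uint8 RGBA image as one uint32 per pixel,
# without copying, so it can travel through a plain uint buffer.
rgba = np.zeros((480, 640, 4), dtype=np.uint8)
packed = rgba.view(np.uint32).reshape(480, 640)  # same memory, 4 bytes/pixel

# Shader side (GLSL fragment of a #version 450 compute shader).
UNPACK_GLSL = """
layout(std430, binding = 0) buffer Pixels { uint pixels[]; };

void main() {
    uint p = pixels[gl_GlobalInvocationID.x];
    vec4 rgba  = unpackUnorm4x8(p);   // four channels as floats in [0, 1]
    uint red   = (p >>  0) & 0xFFu;   // or pull out bytes with shifts/masks
    uint green = (p >>  8) & 0xFFu;
    uint blue  = (p >> 16) & 0xFFu;
    uint alpha = (p >> 24) & 0xFFu;
}
"""
```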

> I had the same experience: transfer between host and GPU seems to vary wildly depending on the hardware. If you have a simple benchmark script (without your actual code, just the transfer part and some dummy computation), I can try to reproduce it on my setup (Ubuntu something and an Nvidia GTX 1080) and do some experiments from there. I will also try to look into your suggestion; if you have any more concrete tips, let me know. Either way, I am not sure when I will be able to take a look, maybe only on the weekend.

Yes, this may be totally hardware dependent. I may not have done a fair comparison between using lava in Python and my C++ code (different resolutions, etc.). In vuh/C++ I was using vuh::mem::Host, which I think is the same as BufferCPU in your library, so there may be nothing to solve and I'll just have to enjoy the higher performance on the Apple Silicon platform.
