Where does the OPAQUE mmal buffer data pointer actually point to? #691
Comments
https://www.raspberrypi.org/forums/viewtopic.php?t=53698&p=413535 Seems like my assumption was correct. Anyway, if someone stumbles over this and has an idea how to achieve what I am planning to do, please let me know. |
@doctorseus Sorry to bother you, but did you ever find a solution? |
@Seneral that was some time ago, but yes, I actually was able to make progress on this with some help from @6by9, an engineer on the Raspberry Pi team. You can find the thread with relevant details on the physical buffer address here: https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=167652 I have a repo where I documented my progress, and I actually managed to run a custom shader in real time. The repo was private, but I set it to public just now in case you can get something useful out of it: https://github.com/doctorseus/rpi-qpucamera-sandbox Edit: All of this is quite complex; I would assume you will need a good understanding of all the involved topics to be able to follow along. Please note that writing these shaders by hand is really time-intensive, and with the progress of the open-source GPU driver, which I believe is now the default on the Raspberry Pi 4, it should be possible to run any OpenCL code, which would make all this much easier. I didn't try it myself, but I would guess nowadays it would be better to explore in that direction instead of the approach I took here. Or at least combine their shader compiler with the raw execution path I took. Let me know if you are able to get something going. |
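For anyone retracing that forum thread, here is a minimal sketch of the handle plumbing it describes: turning a zero-copy MMAL buffer into a bus address a QPU program can consume. This assumes MMAL_PARAMETER_ZERO_COPY was enabled on the port so the payload lives in a VCSM allocation; the function names come from userland's user-vcsm.h and hello_fft's mailbox.c, and the exact plumbing may differ by firmware version.

```c
/*
 * Sketch only: derive a GPU bus address from a zero-copy MMAL buffer
 * so it can be handed to a QPU program as a uniform.
 * Assumes MMAL_PARAMETER_ZERO_COPY was enabled on the port, so the
 * payload is VCSM-backed and buffer->data is the ARM-side mapping.
 */
#include "interface/mmal/mmal.h"
#include "user-vcsm.h"  /* userland: host_applications/linux/libs/sm */
#include "mailbox.h"    /* hello_fft: mbox_open(), mem_lock(), ... */

static unsigned buffer_bus_address(int mbox, MMAL_BUFFER_HEADER_T *buffer)
{
    /* ARM-side pointer -> VCSM user handle of the allocation */
    unsigned usr_hdl = vcsm_usr_handle(buffer->data);
    /* VCSM user handle -> VideoCore-side handle of the same memory */
    unsigned vc_hdl = vcsm_vc_hdl_from_usr_hdl(usr_hdl);
    /* Lock via the mailbox property interface to get a stable bus
     * address the QPU can read from. */
    return mem_lock(mbox, vc_hdl);
}
```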
Thank you so much, just read through the thread, and that was exactly the missing puzzle piece I needed. Seems I'm pretty much in the same boat as you: I want the absolute lowest latency with optimized code and don't mind diving into all this. I need it for three separate projects, so there's much to gain for me. Oh, in regards to OpenCL: just like with OpenCV, I fear abstractions will keep me from actually making use of all the optimizations possible with the QPUs. Just from looking at the VideoCore IV documentation once, there's so much I can't see a compiler making use of compared to custom assembly code. |
Thanks, that code is a goldmine. Pretty much everything I've taken a look at from hello_fft, separated out for custom use in a nice package. Just saved me (and hopefully others) a ton of effort. It seems your QPU program is set up as a general-purpose program and iterates over the whole frame, meaning one QPU does the whole frame. From my research I'd have imagined having to set up the V3D pipeline using control lists to do the frame tiling, so the scheduler can set the QPUs to work on each tile automatically. I haven't fully researched this yet though, having only read through the VC IV docs once - any reason you did not follow this path, or did you simply try this first and then stop? The thing is, even then 720p60 is an impressive result. With a 2-pass shader (tbf doing a lot of fetching in a 5x5 kernel) for blob detection on OpenGL ES I only managed 640p45/720p20, so even with this I'm already VERY happy with the results. |
At the moment I don't even know what each of the shaders in the repo was used for and which one worked (maybe you can get it working), so this is all from memory: the rest of the V3D pipeline might not be of use. Those are helpful mechanisms for handling/scheduling framebuffers etc. - basically everything you need when you want to implement an OpenGL driver. |
Ok yeah, so after some more research: you're right, the V3D pipeline is not too useful if you don't need the geometry-processing stuff. It's way easier and faster to do the tiling of the data yourself. I'm going to use direct register access as documented instead of the Mailbox interface for that, since it gives much more control. Also, while you are right that you process 16 values at once, you are actually only using one out of 12 QPUs. So all in all, if I am correct, the fact you still got 720p60 out of it is VERY impressive. |
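A minimal sketch of what that direct-register route looks like, assuming a BCM2835 (Pi Zero/1) peripheral base of 0x20000000 and the V3D block at offset 0xC00000, with register offsets as given in the VideoCore IV 3D Architecture Reference Guide; treat the addresses as assumptions to verify for your board:

```c
/*
 * Sketch: map the V3D register block through /dev/mem and sanity-check
 * it by reading V3D_IDENT0. Peripheral base 0x20000000 is BCM2835
 * (Pi Zero/1); Pi 2/3 use 0x3F000000. Offsets are taken from the
 * VideoCore IV 3D Architecture Reference Guide.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define PERIPH_BASE 0x20000000u          /* BCM2835; SoC-specific */
#define V3D_BASE    (PERIPH_BASE + 0x00C00000u)
#define V3D_IDENT0  (0x0000 / 4)         /* reads ASCII "V3D" when up */
#define V3D_SRQPC   (0x0430 / 4)         /* user program PC (launch) */
#define V3D_SRQUA   (0x0434 / 4)         /* user program uniforms */
#define V3D_SRQCS   (0x043C / 4)         /* user program control/status */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    volatile uint32_t *v3d = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, V3D_BASE);
    if (v3d == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* The block must be powered (e.g. qpu_enable() via the mailbox)
     * before its registers read back meaningfully. */
    printf("V3D_IDENT0 = 0x%08x\n", v3d[V3D_IDENT0]);

    /* Launching a user program then amounts to writing the uniforms
     * bus address to SRQUA followed by the code bus address to SRQPC,
     * and polling SRQCS for completion. */
    munmap((void *)v3d, 0x1000);
    close(fd);
    return 0;
}
```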
Yes right, this is all correct (I also re-read some of the documentation just now). And I believe that was also the goal. I also just realized that they changed the GPU implementation with the latest Raspberry Pi, which is definitely a pain for all this. But I read articles which mentioned that at least mesa has received the patches to allow targeting the new VideoCore VI (but still no open documentation). Well, I have no reason to believe that isn't the case. It is true that I just moved the output buffer further down the line to the h264 encoder for visualization, but as far as I can remember there was no backlog, so it appeared to be fast enough (even with format conversion). Edit: Also, the parameter "1" in execute_qpu says how many QPUs to use. ;-) |
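To make that last point concrete: in hello_fft's mailbox interface, the control block handed to execute_qpu is just num_qpus pairs of (uniforms address, code address) in GPU memory. A sketch, built on hello_fft's mailbox.c helpers, of fanning the same shader out to all 12 QPUs with per-QPU uniforms (e.g. a tile offset); the allocation flags follow hello_fft and may need adjusting:

```c
/*
 * Sketch: run the same shader on all 12 QPUs through the mailbox.
 * Per hello_fft, the control block passed to execute_qpu() is simply
 * num_qpus pairs of (uniforms bus address, code bus address).
 * mem_alloc/mem_lock/mapmem/execute_qpu come from hello_fft's mailbox.c.
 */
#include "mailbox.h"

#define NUM_QPUS       12
#define GPU_MEM_FLG    0xC                   /* as used by hello_fft */
#define BUS_TO_PHYS(x) ((x) & ~0xC0000000u)

struct qpu_job { unsigned uniforms, code; };

unsigned launch_on_all_qpus(int mbox, unsigned code_bus,
                            const unsigned uniforms_bus[NUM_QPUS])
{
    unsigned size   = sizeof(struct qpu_job) * NUM_QPUS;
    unsigned handle = mem_alloc(mbox, size, 4096, GPU_MEM_FLG);
    unsigned bus    = mem_lock(mbox, handle);
    struct qpu_job *jobs = mapmem(BUS_TO_PHYS(bus), size);

    for (int i = 0; i < NUM_QPUS; i++) {
        jobs[i].uniforms = uniforms_bus[i]; /* per-QPU, e.g. tile offset */
        jobs[i].code     = code_bus;        /* shared shader code */
    }

    /* noflush = 1, timeout in ms. */
    unsigned ret = execute_qpu(mbox, NUM_QPUS, bus, 1, 5000);

    unmapmem(jobs, size);
    mem_unlock(mbox, handle);
    mem_free(mbox, handle);
    return ret;
}
```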
Wait, so you send it directly to the h264 encoder? A fastpath to the encoder, which was impossible on GL? Right, makes sense - not a bad idea. |
In case you're interested, I've released my code, basically continuing where you left off. |
Hey, just had a look at the code; it is pretty cool that you got all that working and decided to stick with it! I don't have the HW right now to try it, but did you get a bit more info on which platforms this can be used on at the moment? Are there breaking changes with the new VideoCore / RPi4? I assume so, as you stated in your documentation that it targets "VideoCore IV". Edit: Also, big thanks for sharing your progress! |
BCM2835 is VideoCore IV, so that doesn't tell you anything about the code, but I'd be very surprised if it was Pi 4-specific. |
Not sure what you are referencing here. The RPi4 uses a BCM2711 with a VideoCore VI, and apparently there are additional mesa patches, which lets me assume there are changes to the underlying GPU architecture - which also makes it likely (but not certain) that the code discussed here will not work properly. |
"VideoCore IV" != "VideoCore VI". If the documentation says VCIV then it's not aimed at 2711. |
Ah, I get where you are coming from (after re-reading my comment). As it's a pretty big commitment to write code for a platform which is EOL (in some way), I assumed that in the past two months he had tried it on both the VC IV and VC VI, but concluded that it will only work on the VC IV, so he decided to only mention that in the documentation. Either that, OR he didn't try it. Hence my question. |
Oh, I explicitly don't support VideoCore VI because I only plan to target VideoCore IV. What you call EOL is for me the only reason I even do this - the low price, power requirements and size of the RPi Zero. If you're ok with a €35-50, bigger SBC, there are probably easier solutions out there, even outside the Raspberry Pis, like the Tinkerboards. I tried to point that out explicitly - the only reason to put in so much effort is if you really need that extra bit of performance out of the Zero specifically. The VideoCore VI could be supported in the future though, once there's an assembly compiler for it. It works slightly differently and has some more options (the TMU can write now - not sure how useful that is yet, but it may greatly increase write speed if all QPUs write simultaneously, who knows). There's already py-videocore6 made by the geniuses at Idein (they did a ton of cool stuff with the VideoCore IV already), but as far as I know there's no normal assembly compiler, which I would greatly prefer. IMO, though, the value you get from using the QPU on the RPi4 is most probably lower than the value you get by doing it on the RPi Zero. The RPi 2-3 are even lower value, since they barely offer an advantage over the RPi Zero in that regard (only using the QPU), and you could just use the more powerful RPi 4 for better results, provided VideoCore VI is well supported by then. |
Apart from the very first Pi model with the small GPIO header, none of the Raspberry Pi range has been EOL'ed. You can still buy them, all new. |
Yeah I think he meant virtually replaced by newer, more powerful models. Although unless they intend on bringing out a VideoCore VI based board in the Zero form factor AND price range, the Zero will always stay relevant to some degree. |
@Seneral well, I understand what you mean; the most benefit from all this is for sure on the older platforms. I specifically targeted the Zero too when I started the project in 2016, and it still has a very attractive price point if you have a use-case. It's a shame that we don't have the documentation for the VC VI. "EOL (in some way)" -> going forward we will not see a new board with a VC IV. But it's for sure nice that we still have good availability for the older models. Anyway, nice work; I will keep an eye on the repo and will hopefully have some time to give it a try. |
Thanks! I mean I would really want a VideoCore VI based board in the Zero form factor. |
Btw, here's a good writeup of some early findings on the VideoCore VI (I haven't researched a lot further, tbh). Seems it has a ton of new hurdles when it comes to GPGPU. |
I am trying to obtain the physical memory pointer to the buffers used to store the camera frames. These buffers should be located in the GPU memory space, as far as I know. But is there a way to obtain the physical address of these buffers?
Right now, the values stored in the data field of a MMAL_BUFFER_HEADER_T object returned by a (camera) component using MMAL_ENCODING_OPAQUE look as if there is another layer running on the closed-source firmware managing these buffers, and everything I can get on the ARM side is a value which can be associated with a certain buffer, but only by the firmware running on the VPU. Is this correct? Is there any information available about this?
Big picture: I want to run a compute kernel on the QPU which uses the raw data of the camera frame. It looks like I would be able to do that as long as I can get the physical pointer to the camera frame first. (Like the fastpath (GL_OES_EGL_image_external) for OpenGL, but without limiting myself to OpenGL shaders only.)
For example this is such a buffer used later on for the OpenGL fastpath:
https://github.com/raspberrypi/userland/blob/bb15afe33b313fe045d52277a78653d288e04f67/host_applications/linux/apps/raspicam/RaspiTex.c#L447
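For reference, the route the comments above converge on is to sidestep OPAQUE entirely: request raw frames with zero-copy enabled, so the payload is a VCSM allocation whose handles are visible from the ARM side. A minimal sketch of the port setup using standard MMAL calls; the choice of I420 here is just an example:

```c
/*
 * Sketch: configure a camera output port for raw zero-copy frames
 * instead of OPAQUE handles, so buffer->data refers to VCSM-backed
 * memory rather than a firmware-internal image handle.
 */
#include "interface/mmal/mmal.h"
#include "interface/mmal/util/mmal_util_params.h"

static MMAL_STATUS_T setup_port_for_qpu(MMAL_PORT_T *port)
{
    MMAL_STATUS_T status;

    port->format->encoding = MMAL_ENCODING_I420; /* raw, not OPAQUE */
    status = mmal_port_format_commit(port);
    if (status != MMAL_SUCCESS)
        return status;

    /* With zero-copy, MMAL allocates payloads through VCSM, which is
     * what makes recovering a bus address for the QPU possible. */
    return mmal_port_parameter_set_boolean(port, MMAL_PARAMETER_ZERO_COPY,
                                           MMAL_TRUE);
}
```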