primus is PCIe bottlenecked #176
After doing some tests with nouveau and DRI_PRIME, I noticed that compared to primus I got around 4x the fps in PCIe-limited scenarios (like glxspheres).
Is there a possibility to reduce the PCIe load under primus?
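(For anyone reproducing this, the comparison was presumably along these lines; the glxspheres binary name varies by distro, and `vblank_mode=0` disables vsync for Mesa drivers so the copy path, not the display refresh, is the limit:)

```sh
# primus path: each frame is read back over PCIe and re-uploaded to the iGPU
vblank_mode=0 primusrun glxspheres64

# DRI_PRIME path: GEM/dma-buf sharing, no round trip through system memory
DRI_PRIME=1 vblank_mode=0 glxspheres64
```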
Not really; the whole point here is that you have to download the contents of the framebuffer into system memory (this costs bandwidth) and then upload it to the display/integrated GPU. It's theoretically possible to use one of the OpenGL compression formats to reduce the memory bandwidth overhead, but that would bottleneck the GPU itself. DRI_PRIME has the advantage of GEM and DMA_BUF sharing, which can skip the extra copy. Or at least this is my understanding of it; I haven't seen that code in months.
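(For reference, the round trip being described looks roughly like this. A simplified sketch, not primus's actual code; the contexts, drawables, and a bound, allocated texture on the iGPU side are assumed:)

```c
#include <GL/glx.h>
#include <stdlib.h>

/* One frame's round trip: the image crosses PCIe twice,
 * dGPU -> system RAM -> iGPU. */
static void copy_frame(Display *dpy,
                       GLXDrawable dgpu_drawable, GLXContext dgpu_ctx,
                       GLXDrawable igpu_drawable, GLXContext igpu_ctx,
                       int width, int height)
{
    GLubyte *pixels = malloc((size_t)width * height * 4);

    /* On the discrete GPU: pull the rendered frame into system
     * memory (this is the bandwidth cost being discussed). */
    glXMakeCurrent(dpy, dgpu_drawable, dgpu_ctx);
    glReadBuffer(GL_BACK);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, pixels);

    /* On the integrated GPU: push the same bytes back as a texture
     * (assumes a GL_TEXTURE_2D is bound and allocated); a textured
     * quad is then drawn and the buffers swapped. */
    glXMakeCurrent(dpy, igpu_drawable, igpu_ctx);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_BGRA, GL_UNSIGNED_BYTE, pixels);

    free(pixels);
}
```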
Yeah, compression should make a difference, but while I was doing some PCIe speed work for nouveau I noticed even 5% speed-ups in 20 fps full-HD scenarios just by going from 2.5 GT/s to 8.0 GT/s PCIe link speed. And the speed-up grows the more pixels are transferred. And then there are games like The Talos Principle, which just got a 25% perf boost here.
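(Back-of-the-envelope, for a sense of how much data this is, assuming an uncompressed 4-byte-per-pixel readback:)

```
1920 × 1080 px × 4 B   ≈  8.3 MB per frame
8.3 MB × 60 fps        ≈  0.5 GB/s of readback traffic
```

That is well below the raw capacity of an x16 link, which suggests the link-speed gains above come as much from shorter per-frame transfer latency as from bandwidth.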
Anyway, I did some micro-optimizations on my branch, but the core code is fairly simple and straightforward, and I doubt there is much we can do to speed things up without sacrificing quality or introducing heavy input lag. Newer NVIDIA cards (Fermi/Kepler) have asynchronous copy engines that can make the whole buffer copy asynchronous for both the GPU and the CPU, but I haven't tested whether that works properly with primus (it might). Kepler cards actually have two copy engines, which allows two copies to run concurrently (so a game can stream its textures while primus copies the framebuffer at the same time). There is an interesting read about it here: http://www.nvidia.com/docs/IO/40049/Dual_copy_engines.pdf
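(The usual way to let a copy engine overlap with rendering from GL is double-buffered PBO readback. A sketch of the pattern, assuming a current context, known dimensions, and an extension loader; primus may already do something equivalent:)

```c
#include <GL/glew.h>  /* or any loader exposing ARB_pixel_buffer_object */

/* Two pixel-pack buffers: read into one while mapping the other, so
 * the DMA for frame N overlaps the rendering of frame N+1. */
GLuint pbo[2];
glGenBuffers(2, pbo);
for (int i = 0; i < 2; i++) {
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4,
                 NULL, GL_STREAM_READ);
}

unsigned frame = 0;  /* persists across frames */

/* Per frame: queue an async readback into one PBO... */
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[frame & 1]);
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE,
             (void *)0);  /* offset into PBO; the copy engine does the DMA */

/* ...and map the other PBO, which holds the previous frame. */
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[(frame + 1) & 1]);
void *prev = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
if (prev) {
    /* hand the previous frame's pixels to the iGPU upload side here */
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
frame++;
```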
Yeah, it would be nice to make the copy less demanding on the application.
@karolherbst yep, but I doubt that this is the real problem. It's sad that we are limited to 300 fps in glxgears, but it's hardly an issue in games, where this really isn't the bottleneck. The code itself caused tons of cache misses, but in the grand scheme of things that is negligible. Quite frankly, the optimal solution would be to ignore dma-buf's GPL-only licensing and NVIDIA's license and code proper buffer sharing into the open-source part of NVIDIA's driver (their driver wrapper that deals with the kernel has its code available; no idea about the licensing, though).
No, I mean there should be performance improvements even if the PCIe bus isn't at full load, or when the game is running at around 20 fps. I know that +5% isn't much, but maybe there is a cheap way to reduce the overhead a bit.
Wouldn't it be possible to use the Intel userptr API to bind the PBO directly to the drawable? That way only synchronisation and resizing would need to be handled during an active context, as the drawable would have the PBO contents zero-copy. I've been looking at how to do this, but maybe I'm missing something that would make it unworkable?
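(For context, the kernel side of the Intel userptr mechanism wraps an existing, page-aligned CPU allocation in a GEM object. A minimal sketch against the raw i915 ioctl, which Mesa normally drives internally; the DRM fd and buffer here are assumed:)

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Wrap a page-aligned CPU buffer in a GEM object so the iGPU can
 * read it in place, with no extra upload copy. */
static int gem_userptr(int drm_fd, void *ptr, size_t size, uint32_t *handle)
{
    struct drm_i915_gem_userptr arg;
    memset(&arg, 0, sizeof(arg));
    arg.user_ptr  = (uintptr_t)ptr;  /* must be page-aligned */
    arg.user_size = size;            /* must be a multiple of the page size */

    if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_USERPTR, &arg) != 0)
        return -1;
    *handle = arg.handle;            /* GEM handle backed by `ptr` */
    return 0;
}
```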
A good way to optimize the iGPU upload would be to extend the PBO texture upload path (which uses userptr, if I recall correctly) to PBO DrawPixels, as outlined in comments 5-7 of this bug: https://bugs.freedesktop.org/show_bug.cgi?id=77412. But normally the bottleneck is on the dGPU download side, so this wouldn't help frame rates (though it should help power consumption).
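(The texture path being referred to looks like this on the upload side — a sketch, where `frame_bytes`, `width`, and `height` stand in for the frame read back from the dGPU; the proposal in the linked bug is to give glDrawPixels the same PBO fast path:)

```c
/* Source the upload from a pixel-unpack buffer: with a suitably
 * aligned buffer the driver can use it in place (userptr) instead
 * of copying through a staging buffer. */
GLuint upbo;
glGenBuffers(1, &upbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, upbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height * 4,
             frame_bytes, GL_STREAM_DRAW);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_BGRA, GL_UNSIGNED_BYTE, 0);  /* 0 = offset into the PBO */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
```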
This could be mitigated with the NVIDIA Capture SDK (NVFBC) on newer NVIDIA cards, but I have no idea how the licensing would work, unfortunately.
@tpruzina isn't that for recording the screen when it is driven by the NVIDIA GPU? I don't see how it can help here.