Slow performance on 2880x1800 display #714

Closed
Nephyrin opened this issue Dec 19, 2015 · 16 comments

@Nephyrin

Anecdotally, when trying to play games with primusrun, the FPS seems to 'surge' -- it will be smooth briefly, then laggy, and cycle. No PRIMUS_UPLOAD/SYNC/SLEEP/vblank_mode setting seems to majorly affect this, just slight changes in baseline FPS. Using glxgears doesn't really reproduce this, but does show very low FPS.

With mesa 11.0.7, nvidia 358.16, compositing window manager disabled or enabled (little difference) and these cards:

00:02.0 VGA compatible controller [0300]: Intel Corporation Crystal Well Integrated Graphics Controller [8086:0d26] (rev 08)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK107M [GeForce GT 750M Mac Edition] [10de:0fe9] (rev a1)

Primusrun (the FPS spike is the delay in me hitting my hotkey to make the window full screen):

$ vblank_mode=0 PRIMUS_VERBOSE=2 primusrun glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
primus: profiling: upload autodetection: will use PBO path (170 iters)
primus: profiling: readback: 2880x1800, 442.8 fps, 2.2% app, 88.4% sleep, 9.3% map, 0.1% wait
primus: profiling: display: 2880x1800, 442.5 fps, 81.3% wait, 17.2% upload, 1.5% draw+swap
2218 frames in 5.0 seconds = 443.145 FPS
primus: profiling: readback: 2880x1800, 75.0 fps, 0.0% app, 90.1% sleep, 9.9% map, 0.0% wait
primus: profiling: display: 2880x1800, 75.0 fps, 73.4% wait, 26.2% upload, 0.4% draw+swap
375 frames in 5.0 seconds = 74.990 FPS
primus: profiling: readback: 2880x1800, 75.1 fps, 0.0% app, 90.1% sleep, 9.9% map, 0.0% wait
primus: profiling: display: 2880x1800, 75.1 fps, 73.3% wait, 26.3% upload, 0.4% draw+swap
376 frames in 5.0 seconds = 75.066 FPS
primus: profiling: readback: 2880x1800, 74.8 fps, 0.0% app, 90.1% sleep, 9.9% map, 0.0% wait
primus: profiling: display: 2880x1800, 74.8 fps, 73.5% wait, 26.1% upload, 0.4% draw+swap
374 frames in 5.0 seconds = 74.765 FPS
primus: profiling: readback: 2880x1800, 74.7 fps, 0.0% app, 90.1% sleep, 9.9% map, 0.0% wait
primus: profiling: display: 2880x1800, 74.7 fps, 73.9% wait, 25.7% upload, 0.4% draw+swap
374 frames in 5.0 seconds = 74.716 FPS
primus: profiling: readback: 2880x1800, 74.9 fps, 0.0% app, 90.0% sleep, 9.9% map, 0.0% wait
primus: profiling: display: 2880x1800, 74.9 fps, 73.3% wait, 26.2% upload, 0.4% draw+swap
375 frames in 5.0 seconds = 74.914 FPS
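
The profiling lines above can be broken down mechanically. As a minimal sketch, the following Python parses the `PRIMUS_VERBOSE=2` line format seen in this log and reports which stage takes the largest share in each thread. The helper names `parse_primus_profile` and `busiest_stage` are hypothetical and not part of primus, and the code assumes the percentages are shares of each thread's time, which is how they read but is not confirmed in this thread.

```python
import re

# Two lines copied from the fullscreen part of the log above.
LOG = """\
primus: profiling: readback: 2880x1800, 75.0 fps, 0.0% app, 90.1% sleep, 9.9% map, 0.0% wait
primus: profiling: display: 2880x1800, 75.0 fps, 73.4% wait, 26.2% upload, 0.4% draw+swap
"""

# thread name, width, height, fps, then a comma-separated list of "<pct>% <stage>"
LINE_RE = re.compile(r"primus: profiling: (\w+): (\d+)x(\d+), ([\d.]+) fps, (.*)")


def parse_primus_profile(text):
    """Return one dict per profiling line; other lines are ignored."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        thread, width, height, fps, rest = match.groups()
        stages = {}
        for part in rest.split(","):
            pct, name = part.strip().split("% ", 1)
            stages[name] = float(pct)
        records.append({
            "thread": thread,
            "width": int(width),
            "height": int(height),
            "fps": float(fps),
            "stages": stages,
        })
    return records


def busiest_stage(record):
    """Name of the stage with the largest percentage in one record."""
    return max(record["stages"], key=record["stages"].get)


for rec in parse_primus_profile(LOG):
    print(rec["thread"], rec["fps"], busiest_stage(rec))
```

On the fullscreen samples this picks `sleep` for the readback thread and `wait` for the display thread, with `upload` at roughly a quarter of the display thread's time.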

Straight intel card:

$ vblank_mode=0 glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
7953 frames in 5.0 seconds = 1590.215 FPS
2946 frames in 5.0 seconds = 589.158 FPS
2866 frames in 5.0 seconds = 573.062 FPS
2548 frames in 5.0 seconds = 509.579 FPS
2517 frames in 5.0 seconds = 503.359 FPS

With optirun (again slight spike while I fullscreen window):

$ vblank_mode=0 VGL_PROFILE=1 optirun glxgears           
Blit        - 1250.59 Mpixels/sec- 1556.39 fps
Total       -  275.81 Mpixels/sec-  342.80 fps
Readback    -  317.42 Mpixels/sec-  391.93 fps
Blit        - 1400.06 Mpixels/sec-  270.07 fps
Total       -  307.20 Mpixels/sec-   59.26 fps
Readback    -  325.39 Mpixels/sec-   62.77 fps
869 frames in 5.0 seconds = 173.609 FPS
Blit        - 1492.97 Mpixels/sec-  288.00 fps
Total       -  323.15 Mpixels/sec-   62.34 fps
Readback    -  328.76 Mpixels/sec-   63.42 fps
Blit        - 1507.51 Mpixels/sec-  290.80 fps
Total       -  324.28 Mpixels/sec-   62.55 fps
Readback    -  329.93 Mpixels/sec-   63.64 fps
310 frames in 5.0 seconds = 61.836 FPS
Blit        - 1498.13 Mpixels/sec-  288.99 fps
Total       -  315.38 Mpixels/sec-   60.84 fps
Readback    -  326.83 Mpixels/sec-   63.05 fps
Blit        - 1477.49 Mpixels/sec-  285.01 fps
Total       -  322.58 Mpixels/sec-   62.23 fps
Readback    -  328.51 Mpixels/sec-   63.37 fps
Blit        - 1422.42 Mpixels/sec-  274.39 fps
Total       -  320.02 Mpixels/sec-   61.73 fps
Readback    -  325.75 Mpixels/sec-   62.84 fps
311 frames in 5.0 seconds = 62.048 FPS
Blit        - 1472.28 Mpixels/sec-  284.00 fps
Total       -  322.83 Mpixels/sec-   62.27 fps
Readback    -  328.37 Mpixels/sec-   63.34 fps
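
The Mpixels/sec and fps columns in the `VGL_PROFILE=1` output are tied together by the frame size, so they can be cross-checked against the display resolution. A small sketch of that check, using values copied from the log above (the helper name `implied_pixels_per_frame` is made up for illustration):

```python
def implied_pixels_per_frame(mpixels_per_sec, fps):
    """Pixels per frame implied by one VGL profile line."""
    return mpixels_per_sec * 1e6 / fps


FULLSCREEN_PIXELS = 2880 * 1800  # native resolution from the issue title

# (Mpixels/sec, fps) pairs copied from the fullscreen part of the log above.
samples = {
    "Blit": (1400.06, 270.07),
    "Total": (307.20, 59.26),
    "Readback": (325.39, 62.77),
}

for stage, (mpix, fps) in samples.items():
    pixels = implied_pixels_per_frame(mpix, fps)
    ratio = pixels / FULLSCREEN_PIXELS
    print(f"{stage:9s} {pixels:12.0f} pixels/frame ({ratio:.4f} of 2880x1800)")
```

Each ratio comes out very close to 1, which suggests these lines describe full 2880x1800 frames and that the roughly 63 fps readback rate corresponds to roughly 326 Mpixels/sec at this resolution.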

Tried to no effect:

  • Ensuring kwin compositing was disabled
  • PRIMUS_SLEEP values
  • PRIMUS_UPLOAD values
  • xorg.conf.nvidia Option "TripleBuffer" "true"
  • __GL_SYNC_TO_VBLANK values for the nvidia driver
  • Ensuring nvidia-settings has everything set to max-performance

Versions:

  • Arch x64
  • Linux 4.2.5
  • bumblebee 3.2.1
  • primus 20151110 (arch snapshot)
  • nvidia 358.16
  • mesa 11.0.7
@karolherbst

Check your PCIe bandwidth utilization in nvidia-settings.

@karolherbst

For me, glxgears in fullscreen at Full HD shows 67%.

@karolherbst

To be honest, you will get better performance here with nouveau and PRIME offloading via DRI_PRIME:

nvidia (optirun):
946 frames in 5.0 seconds = 189.139 FPS

nvidia (primus):
1092 frames in 5.0 seconds = 218.267 FPS

nouveau (prime):
3621 frames in 5.0 seconds = 724.097 FPS

@karolherbst

I already opened a bug on the primus side: amonakov/primus#176

Sadly, VGL can't really do better here, because the only way to improve performance is to use a different compression method.

@karolherbst

@Nephyrin you could check whether running things with "optirun -c jpeg" is better, because that greatly reduces the bandwidth needed.

@karolherbst

@Nephyrin oh, I just remembered that the PCIe patches needed for nouveau aren't mainlined yet, so you would be stuck with a slower bus anyway. With full bus speed, though, this would most likely give you better performance.

@Nephyrin
Author

Unfortunately, I'm seeing poor performance across the board, not just a 70fps cap across all apps, which is what I would expect if PCIe were the bottleneck.

  • Using optirun -c jpeg on glxgears, I still get ~76fps, but only 70% PCIe utilization. Less GPU utilization.
  • Launching a steam game that is not very GPU intensive (Broforce) surges between 40 and 60fps. It gets solid 60FPS on the intel GPU.
  • Team Fortress 2 gets 20-40FPS, surging between the two on a cycle. Drastically different graphics settings don't seem to matter at all. The Intel GPU alone can get 30FPS. The nvidia GPU should be able to get >100fps standalone easily.

None of the above situations seem to max out GPU utilization or PCIe utilization in nvidia-settings. Using intel_gpu_top, the intel card also doesn't seem to be capped out anywhere.

I also just tried running in 1440x900 resolution -- I can hit 60FPS now in Team Fortress 2, but it still surges down to 40fps every few seconds very reliably.

So the glxgears results in the initial report might not be pointing to the exact issue here, but I'm not able to get good performance out of nearly any application, even when I'm at 1440x900, so I suspect there's a hidden bottleneck somewhere.

As for using nouveau -- the driver's performance is still drastically lower than the proprietary drivers, to the point that it is often slower than the intel driver anyway, even without bumblebee overhead.

@karolherbst

PCIe will never be fully maxed out; a value near 70% is already "too" high.
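
For a sense of scale, the raw frame traffic can be estimated from the numbers already in this thread. This is only a back-of-the-envelope sketch: it assumes uncompressed frames at 4 bytes per pixel (an assumption, not something the logs state), and the helper name `readback_bytes_per_sec` is made up for illustration. The 75 fps figure comes from the primusrun log and the 60 fps at 1440x900 case from the Team Fortress 2 report.

```python
BYTES_PER_PIXEL = 4  # assumption: uncompressed 32-bit pixels


def readback_bytes_per_sec(width, height, fps, bytes_per_pixel=BYTES_PER_PIXEL):
    """Bytes per second needed to copy every rendered frame once."""
    return width * height * bytes_per_pixel * fps


cases = [
    ("2880x1800 @ 75 fps", 2880, 1800, 75),
    ("1440x900 @ 60 fps", 1440, 900, 60),
]

for label, width, height, fps in cases:
    rate = readback_bytes_per_sec(width, height, fps)
    print(f"{label}: {rate / 1e9:.2f} GB/s")
```

Under these assumptions, the 1440x900 case needs about a fifth of the traffic of the native-resolution case, so dips that persist at the lower resolution would be hard to explain by frame copies alone.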

@karolherbst

Well, with nouveau I get a steady 70% of the blob's performance, and my 770M with nouveau is faster than my Intel HD 4600. It could be different for your 750M, though.

@karolherbst

But yeah, there seems to be another problem somewhere. Did you check whether you also hit these FPS drops with primusrun?

@ArchangeGabriel
Member

No answer from OP, closing. Feel free to reopen if you are still having the issue.

@JVAQUEROM

I am having (I think) a similar issue here. I am running openSUSE Leap 42.3 with Plasma as the DE. The laptop is an Acer Aspire E5-575G 752.

Processor: i7-6500U (Skylake)
Graphics: NVIDIA GeForce GTX 950M, 2 GB (GDDR5)
RAM: 8 GB DDR4

> uname -r
4.4.126-48-default

I am having ups and downs in the framerate. In games it runs smoothly for a brief time, then gets laggy.

The last check I did was with glxspheres:

VGL_READBACK=sync optirun -c yuv glxspheres 
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: GeForce GTX 950M/PCIe/SSE2
64.296799 frames/sec - 56.527174 Mpixels/sec
105.206014 frames/sec - 92.492919 Mpixels/sec
90.219756 frames/sec - 79.317601 Mpixels/sec
114.930866 frames/sec - 101.042620 Mpixels/sec
100.892932 frames/sec - 88.701030 Mpixels/sec
127.448799 frames/sec - 112.047886 Mpixels/sec
116.216227 frames/sec - 102.172658 Mpixels/sec
120.193549 frames/sec - 105.669360 Mpixels/sec
107.661753 frames/sec - 94.651906 Mpixels/sec
78.433858 frames/sec - 68.955910 Mpixels/sec
122.854527 frames/sec - 108.008786 Mpixels/sec
115.924293 frames/sec - 101.916001 Mpixels/sec
94.445980 frames/sec - 83.033128 Mpixels/sec
79.765504 frames/sec - 70.126640 Mpixels/sec
127.245028 frames/sec - 111.868739 Mpixels/sec
124.601965 frames/sec - 109.545064 Mpixels/sec
134.367660 frames/sec - 118.130672 Mpixels/sec
134.451884 frames/sec - 118.204719 Mpixels/sec
132.829921 frames/sec - 116.778753 Mpixels/sec
72.466655 frames/sec - 63.709784 Mpixels/sec
136.116264 frames/sec - 119.667974 Mpixels/sec
137.167082 frames/sec - 120.591811 Mpixels/sec
91.733217 frames/sec - 80.648175 Mpixels/sec
110.258941 frames/sec - 96.935251 Mpixels/sec
63.402515 frames/sec - 55.740955 Mpixels/sec
117.603838 frames/sec - 103.392590 Mpixels/sec
109.229641 frames/sec - 96.030331 Mpixels/sec
107.292261 frames/sec - 94.327064 Mpixels/sec
117.638807 frames/sec - 103.423334 Mpixels/sec
137.632793 frames/sec - 121.001246 Mpixels/sec
130.148958 frames/sec - 114.421758 Mpixels/sec
101.830828 frames/sec - 89.525590 Mpixels/sec
85.349530 frames/sec - 75.035893 Mpixels/sec
123.005477 frames/sec - 108.141495 Mpixels/sec
80.308873 frames/sec - 70.604349 Mpixels/sec
125.878048 frames/sec - 110.666944 Mpixels/sec
129.882637 frames/sec - 114.187619 Mpixels/sec
117.276620 frames/sec - 103.104913 Mpixels/sec
110.182847 frames/sec - 96.868351 Mpixels/sec
101.221516 frames/sec - 88.989908 Mpixels/sec
99.253342 frames/sec - 87.259568 Mpixels/sec
82.911729 frames/sec - 72.892676 Mpixels/sec
122.372917 frames/sec - 107.585374 Mpixels/sec
115.305227 frames/sec - 101.371743 Mpixels/sec
125.939182 frames/sec - 110.720691 Mpixels/sec
131.175790 frames/sec - 115.324508 Mpixels/sec
110.155795 frames/sec - 96.844568 Mpixels/sec
126.575652 frames/sec - 111.280250 Mpixels/sec
115.866269 frames/sec - 101.864989 Mpixels/sec
99.945849 frames/sec - 87.868392 Mpixels/sec
89.724930 frames/sec - 78.882569 Mpixels/sec
129.103775 frames/sec - 113.502875 Mpixels/sec
122.908899 frames/sec - 108.056588 Mpixels/sec
79.507489 frames/sec - 69.899804 Mpixels/sec
103.810598 frames/sec - 91.266125 Mpixels/sec
121.643188 frames/sec - 106.943826 Mpixels/sec
125.875424 frames/sec - 110.664638 Mpixels/sec
127.227349 frames/sec - 111.853196 Mpixels/sec
125.481721 frames/sec - 110.318509 Mpixels/sec
129.262466 frames/sec - 113.642389 Mpixels/sec
89.683373 frames/sec - 78.846034 Mpixels/sec
125.769694 frames/sec - 110.571684 Mpixels/sec
84.299457 frames/sec - 74.112710 Mpixels/sec
124.712074 frames/sec - 109.641867 Mpixels/sec
92.686218 frames/sec - 81.486015 Mpixels/sec
116.461030 frames/sec - 102.387879 Mpixels/sec
109.764445 frames/sec - 96.500509 Mpixels/sec
116.390486 frames/sec - 102.325860 Mpixels/sec
97.977902 frames/sec - 86.138252 Mpixels/sec
91.159015 frames/sec - 80.143360 Mpixels/sec
124.890483 frames/sec - 109.798717 Mpixels/sec
95.743766 frames/sec - 84.174089 Mpixels/sec
98.501250 frames/sec - 86.598359 Mpixels/sec
122.449669 frames/sec - 107.652851 Mpixels/sec
131.872727 frames/sec - 115.937227 Mpixels/sec
128.047759 frames/sec - 112.574468 Mpixels/sec
120.711166 frames/sec - 106.124429 Mpixels/sec
129.075534 frames/sec - 113.478047 Mpixels/sec
107.149223 frames/sec - 94.201311 Mpixels/sec
130.445978 frames/sec - 114.682886 Mpixels/sec
87.163559 frames/sec - 76.630715 Mpixels/sec
121.216200 frames/sec - 106.568434 Mpixels/sec
126.101354 frames/sec - 110.863267 Mpixels/sec
116.524241 frames/sec - 102.443452 Mpixels/sec
47.411311 frames/sec - 41.682128 Mpixels/sec
112.385479 frames/sec - 98.804818 Mpixels/sec
118.709220 frames/sec - 104.364398 Mpixels/sec
109.784661 frames/sec - 96.518282 Mpixels/sec
110.360599 frames/sec - 97.024624 Mpixels/sec
79.723696 frames/sec - 70.089884 Mpixels/sec
125.198370 frames/sec - 110.069399 Mpixels/sec
124.977131 frames/sec - 109.874894 Mpixels/sec
133.686049 frames/sec - 117.531426 Mpixels/sec
119.801488 frames/sec - 105.324676 Mpixels/sec
133.877527 frames/sec - 117.699767 Mpixels/sec
132.941585 frames/sec - 116.876924 Mpixels/sec
117.977125 frames/sec - 103.720769 Mpixels/sec
129.396435 frames/sec - 113.760170 Mpixels/sec
132.623297 frames/sec - 116.597098 Mpixels/sec
136.792690 frames/sec - 120.262661 Mpixels/sec
125.979591 frames/sec - 110.756217 Mpixels/sec
93.829569 frames/sec - 82.491204 Mpixels/sec
99.406079 frames/sec - 87.393849 Mpixels/sec
119.819983 frames/sec - 105.340936 Mpixels/sec
58.802128 frames/sec - 51.696479 Mpixels/sec
125.207959 frames/sec - 110.077829 Mpixels/sec
125.424882 frames/sec - 110.268539 Mpixels/sec
88.295529 frames/sec - 77.625897 Mpixels/sec
102.999651 frames/sec - 90.553173 Mpixels/sec
69.105451 frames/sec - 60.754748 Mpixels/sec
63.411089 frames/sec - 55.748493 Mpixels/sec
110.347147 frames/sec - 97.012798 Mpixels/sec
113.834472 frames/sec - 100.078714 Mpixels/sec
87.144146 frames/sec - 76.613647 Mpixels/sec
105.167808 frames/sec - 92.459330 Mpixels/sec
111.227494 frames/sec - 97.786763 Mpixels/sec
125.227946 frames/sec - 110.095401 Mpixels/sec
119.467151 frames/sec - 105.030740 Mpixels/sec
106.895330 frames/sec - 93.978098 Mpixels/sec
95.232521 frames/sec - 83.724623 Mpixels/sec

You can see some ups and downs, but in practice it feels worse than the numbers suggest. With the Intel card it looks smoother.
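
To put a number on the ups and downs, the glxspheres output can be summarized. The sketch below parses the `frames/sec` lines and counts samples that fall well below the median. Only the first five samples from the log above are embedded here for brevity, the helper names are hypothetical, and the 75% dip threshold is an arbitrary choice for illustration.

```python
import re
import statistics

# First five samples copied from the glxspheres log above.
SAMPLE = """\
64.296799 frames/sec - 56.527174 Mpixels/sec
105.206014 frames/sec - 92.492919 Mpixels/sec
90.219756 frames/sec - 79.317601 Mpixels/sec
114.930866 frames/sec - 101.042620 Mpixels/sec
100.892932 frames/sec - 88.701030 Mpixels/sec
"""

FPS_RE = re.compile(r"^([\d.]+) frames/sec")


def parse_fps(text):
    """Extract the frames/sec value from every matching line."""
    values = []
    for line in text.splitlines():
        match = FPS_RE.match(line.strip())
        if match:
            values.append(float(match.group(1)))
    return values


def summarize(fps_values, dip_fraction=0.75):
    """Basic statistics plus a count of samples below dip_fraction * median."""
    median = statistics.median(fps_values)
    threshold = dip_fraction * median
    return {
        "samples": len(fps_values),
        "min": min(fps_values),
        "max": max(fps_values),
        "mean": statistics.fmean(fps_values),
        "median": median,
        "dips": sum(1 for value in fps_values if value < threshold),
    }


print(summarize(parse_fps(SAMPLE)))
```

Pasting the full log into `SAMPLE` gives the same summary over all samples, and running it on output from the Intel card would allow a like-for-like comparison.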

I also tested Unigine Heaven with some configurations (run with vblank_mode=0 optirun -b primus -c :8).

FPS: 28.6
Score: 719
Min FPS: 7.4
Max FPS: 65.3

Linux 4.4.126-48-default x86_64
Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz (2591MHz) x4
Unknown GPU (256MB) x1

Render: OpenGL
Mode: 1366x768 fullscreen
Preset: Custom
Quality: high
Tesselation: disabled

@karolherbst

@JVAQUEROM actually this might be related to some changes in the Intel driver, which now does more explicit synchronization. This also happens on non-Bumblebee setups.

@JVAQUEROM

JVAQUEROM commented May 1, 2018

I have to say, games work even better with the Intel card than with the nvidia one... but I just tried running the game with primusrun and it goes surprisingly well!! So I guess I should always use primusrun to run my games?

EDIT: running F1 2015 from Steam has some problems too. There are slight slowdowns, and above all the sound gets messed up (slow, missing parts, etc.)

@karolherbst

@JVAQUEROM yeah, primusrun has lower draw overhead usually and should work better in most cases.

@JVAQUEROM

OK, so I should stick to primusrun, and there is nothing else to be done, right?

Thank you for your answers
