Add nouveau switching method #820

Closed
wants to merge 2 commits into from


abbradar

Credit for this idea goes to @dionorgua in Bumblebee-Project/bbswitch#78 (comment) -- we can use nouveau for power management and unload it to use the proprietary nvidia drivers. The code is fairly simple and it seems to work out of the box. I'm not sure about the priority of this method (do we want it by default? only if we don't see bbswitch?).

@dionorgua

Have you checked that it actually works?

Personally I don't use NVIDIA graphics at all.
Just tried that DRI_PRIME=1 works with nouveau and that's it.

@abbradar
Author

Yes, it works -- glxinfo reports Intel, primusrun glxinfo -- NVIDIA (not nouveau). primusrun glxgears works fine and from dmesg nouveau properly stops the card. Overall this seems the best of both worlds to me.

@abbradar
Author

(Oh, but DRI_PRIME won't work with this configuration because xf86-video-nouveau needs to be unloaded).

@Lekensteyn
Member

Won't this break (fail to unload nouveau) once X is started? Consider this order:

  • bumblebeed starts
  • nouveau is loaded by bumblebeed
  • display manager/Xorg is started
  • cannot unload nouveau anymore.
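Whether nouveau can still be unloaded at that point can be checked from the module's reference count, which the kernel exposes at /sys/module/nouveau/refcnt when module unloading is enabled. The following is only a sketch and not part of the patch: the function names are hypothetical, and the check is inherently racy because a reference can be taken right after the file is read.

```c
#include <stdio.h>

/* Read the reference count from a sysfs-style refcnt file.
 * Returns -1 if the file is missing or does not hold a number. */
long read_module_refcnt(const char *refcnt_path)
{
    FILE *f = fopen(refcnt_path, "r");
    long count = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &count) != 1)
        count = -1;
    fclose(f);
    return count;
}

/* A module can only be removed while nothing holds a reference to it. */
int module_can_unload(const char *refcnt_path)
{
    return read_module_refcnt(refcnt_path) == 0;
}
```

Calling module_can_unload("/sys/module/nouveau/refcnt") after X has started with a nouveau or modesetting DDX would be expected to return 0, which is the failure case in the order above.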

@abbradar
Author

It would fail if X has the xf86-video-nouveau driver installed and is allowed to load it. This can be mitigated by either (1) removing the driver (which I've chosen) or (2) explicitly listing allowed drivers in Device sections (my memory is hazy on this but I think it can be done).

@Lekensteyn
Member

The same problem will exist with the modesetting driver btw. It will be less optimal than bbswitch because it will wait for five seconds before runtime suspending.

Ideally bbswitch gets fixed such that this should not be needed

@abbradar
Author

Yeah, the delay is a definite disadvantage, as is the need for configuration (or removal of modules). Maybe make this a low-priority (so, opt-in) method?

@Lekensteyn
Member

If this is going to be merged, it should not be enabled unless explicitly requested via PMMethod=nouveau (see src/switch/sw_switcheroo.c for an example of such a check). For above reasons and the risk of breakage, it should not be enabled by default.
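A minimal sketch of such an opt-in check, assuming the configured PMMethod value is available as a C string. The function name is hypothetical and this is not the code from src/switch/sw_switcheroo.c, which is not shown in this thread.

```c
#include <string.h>

/* Return 1 only when the user explicitly configured PMMethod=nouveau.
 * Any other value, including an unset (NULL) method, keeps the
 * nouveau switching method disabled. */
int nouveau_method_requested(const char *pm_method)
{
    return pm_method != NULL && strcmp(pm_method, "nouveau") == 0;
}
```

With a check like this, automatic method detection never selects nouveau; only an explicit request does.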

Also note that you need newer kernels to make nouveau support recent cards. bbswitch does not care about the card and does not have this limitation.

@abbradar
Author

Okay, I'll make necessary changes. To clarify, I don't see this as a good solution, it's just that it works for me now while bbswitch is being fixed -- this can be a workaround method for people with similar hardware.

@karolherbst

There are several problems:

  1. Nouveau may leave the card in a state where nvidia isn't able to handle the GPU and crashes.
  2. nvidia may leave the card in a state where Nouveau isn't able to handle the GPU and crashes.
  3. X gets very upset if anything happens while loading the card, which means you need to actually force X to not touch the new device at all (aka enforcing the video-dummy driver on the nvidia GPU)
  4. X loads either the nouveau or the modesetting DDX whenever X starts after bumblebee
  5. applications using lspci can mess things up big time.
  6. systemd is a bitch claiming a ref on nouveau in some situations (backlight? don't know specifics, might be even rootless X stuff, etc...)

short: many things can go south.

You might want to add some flags while loading nouveau: runpm=1, noaccel=1, nofbaccel=1, modeset=2?, config=NvForcePost=1
and maybe even disable all engines, so that the nouveau driver doesn't actually do anything besides PM.

references:
https://nouveau.freedesktop.org/wiki/KernelModuleParameters/

@abbradar
Author

abbradar commented Nov 21, 2016

Thanks for making this nice list! Let's go over it point by point:

1/2: I can only hope that this would be okay -- we can't guarantee anything. Running more serious tests than glxgears and glxinfo may make us more confident -- I'll look into it;
3. You mean X is sensitive to modules loading/unloading/initializing/etc. while it is probing the card and choosing the driver? Good point, I didn't know about that! However, if that's the case then bbswitch et al. shouldn't be much better in that regard (de-initialization and powering down of the card can happen while X is loading, which is why we force bumblebeed to start before display-manager, I think);
4. Was already discussed, but you gave a great idea -- we can force the main X server to use the dummy driver for the device. This can also be useful for bbswitch and other cases, to shorten the list of possible problems;
5. How can they? Nouveau should handle that gracefully, by waking up and powering down the card. NVIDIA shouldn't care -- it just has the card powered up. The only thing I can imagine is that they keep a reference on the card while they enumerate the bus, so the driver can't be switched -- but if that's so, the same holds for loading/unloading nvidia in general (as it's done now IIUC);
6. This is another source of problems I haven't thought about -- not sure what to do about this one.

Overall the new source of problems that wasn't there before is something retaining a reference on the card while it's in nouveau mode, so it can't be "switched on". For X.org we can use a dummy driver, not sure what to do about systemd/others.

Your list of flags is a great help! I'll look through them and add those that seem useful (but I'm unsure how -- install a file to /etc/modprobe.d? Add infrastructure for module parameters to bumblebeed?).

@abbradar
Author

abbradar commented Nov 21, 2016

Second version of this patch -- I did a trick to force other possible users of nouveau to back out. Namely, I now open /dev/dri/cardN (corresponding to nouveau) and get an exclusive lock -- that way X and others just can't use the device to render (other controls, for example backlight, remain a possible problem -- but we don't have any concrete examples of such behaviour yet).
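The comment does not say which locking call the patch uses. As an illustration only, the sketch below assumes flock(2) with LOCK_EX on the opened device node; the function name is hypothetical. Note that flock is advisory, so on its own it only keeps out processes that also try to take the lock.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Open a device node and try to take an exclusive, non-blocking lock.
 * Returns the open file descriptor on success, or -1 if the node
 * cannot be opened or another open file description holds the lock. */
int open_and_lock(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The lock lasts as long as the descriptor stays open, so the daemon would keep the returned descriptor for the whole time nouveau is loaded and close it before unloading the module.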

I also use modeset=2 runpm=1 (this seems sufficient to me, what do you people think?)
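One way to pass these parameters without a file in /etc/modprobe.d is to append them to the modprobe argument vector. The helper below is a hypothetical sketch of that idea, not the code from the patch; only the two parameters named above are taken from this thread.

```c
#include <stddef.h>

/* Fill argv with {"modprobe", module, opts..., NULL}.
 * cap is the number of slots available in argv.
 * Returns the number of arguments written (excluding the terminating
 * NULL), or 0 if argv is too small. */
size_t build_modprobe_argv(const char **argv, size_t cap,
                           const char *module,
                           const char *const *opts, size_t nopts)
{
    size_t n = 0;

    if (cap < nopts + 3) /* "modprobe" + module + opts + NULL */
        return 0;
    argv[n++] = "modprobe";
    argv[n++] = module;
    for (size_t i = 0; i < nopts; i++)
        argv[n++] = opts[i];
    argv[n] = NULL;
    return n;
}
```

Called with the module "nouveau" and the options "modeset=2" and "runpm=1", this yields the equivalent of `modprobe nouveau modeset=2 runpm=1`, and the resulting vector can be handed to execvp.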

@abbradar abbradar force-pushed the develop branch 3 times, most recently from 1791de3 to 81e1200 Compare November 21, 2016 23:39
@abbradar
Author

I've tested it on several games (Mount and Blade: Warband, 0 A.D., Quake 3) and it works okay for me. Also (I forgot to mention this earlier) I've made this strictly opt-in as requested by @Lekensteyn. Also, I have had the system running with xf86-video-nouveau installed for some time, with an external monitor and X restarts -- no problems so far.

@Lekensteyn
Member

I appreciate your effort, but I am still not sure if it should be merged or not due to other magic that is possible with nouveau and the possibly bad interactions with other components.

Some feedback on the patch:

  • Does it still work for you, powering off the GPU? AFAIK opening a fd to the card will take a runtime PM ref which keeps the GPU on which defeats the whole purpose of adding this method.
  • Options like runpm=1 are already the effective default for Optimus hardware, so they should probably not be added. And if you do add them, a modprobe.conf file seems more appropriate than hardcoding them into the binary.

And more general:

  • Not sure what @karolherbst means by the lspci thing, if that refers to waking up the GPU on using lspci, then yes, that will always happen for PCI drivers.
  • Another thing not mentioned explicitly so far, when using the nouveau driver, /dev/dri/cardX is also registered. Any application that tries to access it will wake up the GPU. There is not much that can be done against this (removing the nodes could be one way, but it seems like a terrible hack).

@abbradar
Author

abbradar commented Nov 22, 2016

I understand your concerns, and if this ends up not merged I can just continue using it by myself.

About your concerns:

  1. Apparently yes. At first I thought that too, but then it occurred to me that X.org somehow locks the card without breaking runpm. I checked via lsof and indeed, it holds the card device open. Currently I have the file locked by bumblebeed and dmesg shows that nouveau powers the card down as expected. I suppose nouveau is smart when detecting card usage;
  2. This is a special case for the nouveau module because we also want modeset=2 to ensure that it doesn't initialize displays. I don't think that it should be set up for the whole system via modprobe.d -- it adds things that one needs to do to make this work (right now it works out of the box) and requires one to remember to remove it when trying another driver/switch method;
  3. (general-2) This is the very device that I lock right now, thus preventing any access from other applications, including X.org.
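Besides dmesg, the kernel reports a device's runtime PM state in its sysfs attribute power/runtime_status, which reads "suspended" once the device has been powered down. A small sketch of checking it follows; the function name is hypothetical and the PCI address in the usage note is only an example.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the runtime_status file reads "suspended", 0 for any
 * other state, and -1 if the file cannot be read. */
int runtime_suspended(const char *status_path)
{
    char state[32] = "";
    FILE *f = fopen(status_path, "r");

    if (f == NULL)
        return -1;
    if (fgets(state, sizeof state, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    state[strcspn(state, "\n")] = '\0'; /* drop the trailing newline */
    return strcmp(state, "suspended") == 0;
}
```

For a discrete GPU at, say, 0000:01:00.0 the path would be /sys/bus/pci/devices/0000:01:00.0/power/runtime_status. Reading this attribute does not wake the device, unlike running lspci against it.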

@Lekensteyn
Member

Ok, thanks for your replies. Hopefully you do not mind if I leave it open for now, maybe I (or others) can look at it again later.

@abbradar
Author

No sweat! Meanwhile I'll take another look at bbswitch when I have more time.

@Lekensteyn
Member

So nouveau will no longer be used in Bumblebee (#773) and as #978 looks like a more reliable approach, I'll close this issue. Thanks anyway for your PR!

@Lekensteyn Lekensteyn closed this Aug 11, 2018