This is very strange. The nvidia module should not be needed if intel is in use and it makes no sense to freeze KDE if unused…
Update: In the meantime I have talked about this with the developers in #bumblebee on IRC. Lekensteyn suggested building bumblebee+bbswitch from the development branch, as it includes some related fixes. I hope we can provide that asap.
Here’s the full discussion:
[00:31] <tetris4> Hi everyone. We are preparing a kernel/xorg/graphics drivers update over at Chakra and this issue came up with bumblebee not being able to properly blacklist the nvidia module
[00:31] <tetris4> Here is the relevant report: https://bugtracker.chakralinux.org/index.php?do=details&task_id=1988
[00:31] <tetris4> Any insight on how to fix or workaround this for our users would be greatly appreciated!
[00:32] <tetris4> here is our package details, mostly imported from ArchLinux: https://chakralinux.org/code/core.git/tree/bumblebee
[00:53] <Lekensteyn> hi tetris4
[00:56] <Lekensteyn> tetris4: could you give the develop branch a go? that branch has some fixes related to modules
[00:56] <Lekensteyn> the Debian folks helped a lot there
[00:57] <tetris4> hi Lekensteyn, long time no speak, I hope you are doing well. =)
[00:57] <tetris4> glad to see the project is still active. So just pull from the latest git commit?
[00:58] <Lekensteyn> develop branch (master tracks the latest release)
[00:58] <Lekensteyn> project is active by virtue of gsgatlin and bluca
[00:59] <tetris4> ok, should I got for the develop branch of bumblebee, bbswitch or both?
[00:59] <tetris4> kudos to them then!
[00:59] <tetris4> *got=go
[01:00] <Lekensteyn> just develop of bumblebee
[01:00] <Lekensteyn> you can also pull develop for bbswitch to fix building against 4.12 (really a small patch)
[01:01] <Lekensteyn> and if people are running into issues related to a hanging machine, see https://github.com/Bumblebee-Project/Bumblebee/issues/764
[01:01] <tetris4> we do use kernel 4.12.4 so that makes sense
[01:01] <Lekensteyn> I tried to debug that issue, but no luck :( sticking with a workaround for now
[01:02] <Lekensteyn> if possible, I would personally recommend people to try nouveau if they just have the need for external monitor and power saving in other cases
[01:02] <Lekensteyn> have not tried the blob for a long time, so am not fully up to date with its issues
[01:03] <Lekensteyn> (and features of course ;))
[01:03] <tetris4> well personally I used this as an opportunity to switch to PRIME+nouveau, seems to be fine for now, don't really use the nvidia card nowadays
[01:04] <tetris4> I understand people want to use nvidia for gaming mostly
[01:04] <Lekensteyn> I used bbswitch with my old laptop (since it had its outputs wired to intel)
[01:05] <Lekensteyn> my new laptop resulted in moving to nouveau instead due to the external monitor
[01:05] <Lekensteyn> so that's also one of the reasons why I am less active in bb nowadays
[01:06] <Lekensteyn> a pity for the users, I try to reply when I have the time, but it is not a sustaining approach
[01:06] <tetris4> true...
[01:06] <Lekensteyn> thanks for maintaining chakra packages btw :)
[01:07] <tetris4> am still on the Dell XPS L502X, which was doing great with bb until this issue came up
[01:08] <tetris4> nah, thanks for providing us with this project, don't know what I would have done over the years without it.
[01:09] <Lekensteyn> as for debugging this, any hints in journal or /var/log/Xorg.*.log? anything in release notes of nvidia?
[01:10] <tetris4> unfortunately I can't replicate the issue on my 2nd installation on the same system, which is weird
[01:10] <tetris4> I will ask the user, if they can't find anything I might try to revert this installation back to bb and check
[01:11] <tetris4> hmm, release notes of nvidia? what should I look for there?
[01:11] <Lekensteyn> perhaps they have changed the way modules get loaded
[01:11] <Lekensteyn> or introduced new modules
[01:12] <tetris4> what's strange is that I would expect to find some reports on Arch about this, but didn't see anything
[01:13] <tetris4> so I wonder how it can be Chakra specific
[01:13] <tetris4> also any idea why that user could not force the unloading of the module on his system, whereas I could?
[01:15] <Lekensteyn> is it possible that the user's primary X server suddenly took ownership of the device via hotplugging?
[01:15] <Lekensteyn> if something like that happened, it should be visible in the xorg log
[01:16] <Lekensteyn> another possibility is that nvidia was already loaded (at least once) during startup of the primary X server
[01:17] <Lekensteyn> then since the primary X server owns this device (as exposed by the nvidia driver), it will not be possible to unload nvidia
[01:17] <Lekensteyn> that's often the result. Now what causes this, I would probably be looking into the modprobe blacklist, whether they function or not
[01:18] <tetris4> dunno if you read it already, but here is his initial report: https://community.chakralinux.org/t/big-upgrade-of-xorg-kernel-and-graphics-drivers-that-include-important-changes-is-now-available-in-testing/6475/22?u=tetris4
[01:18] <Lekensteyn> is the nvidia module in /etc/modules-load.d/? (should not)
[01:19] <tetris4> oki, I'll ask that also in my response
[01:20] <Lekensteyn> `modprobe -f nvidia` is also not recommended. If it is in use, it is important to figure out what exactly uses it (primary X server, Bumblebee's secondary X server, maybe some CUDA application? etc)
[01:20] <tetris4> how does one check for that though?
[01:21] <Lekensteyn> it can help to check the process list (ps uww -C X or ps uww -C Xorg)
[01:21] <tetris4> ehehe, one more thing to ask him =P
[01:21] <Lekensteyn> and assuming that /dev/dri/card1 is nvidia, you can also try sudo lsof -n /dev/dri/card1
[01:22] <Lekensteyn> to discover whether it is actually nvidia, have a look in the Xorg log
[01:23] <tetris4> I'll just copy paste what you are writing here, hopefully the user can follow.
[01:24] <Lekensteyn> actually, skip that xorg step. To see whether card1 corresponds to nvidia, check sysfs with: ls -l /sys/class/drm/card1
[01:24] <Lekensteyn> for me it gives: lrwxrwxrwx 1 root root 0 Aug 26 14:04 /sys/class/drm/card1 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1
[01:25] <Lekensteyn> intel (card0) would give: ../../devices/pci0000:00/0000:00:02.0/drm/card0
[01:26] <tetris4> hmm, here on prime+nouveau, running that command gives card1, but the card seems to be off
[01:26] <Lekensteyn> that's correct
[01:26] <Lekensteyn> card1 is only created when a DRM driver (like nouveau, and presumably nvidia-drm) is loaded
[01:27] <Lekensteyn> in the case of nouveau, runtime PM ensures that the device suspends when no user is accessing it (e.g. Xorg/DRI_PRIME applications via /dev/dri/cardX, lspci accessing config space, etc.)
[01:28] <Lekensteyn> and even if Xorg has claimed it, if no outputs are active, it will also suspend
[01:30] <tetris4> how can one check if card1 is nvidia and 0 is intel?
[01:30] <Lekensteyn> examine the sysfs symlinks, ls -l /sys/class/drm/card*
[01:30] <Lekensteyn> 00:02.0 is the intel one, 01:00.0 (or 02:00.0, etc.) is the nvidia one
[01:32] <tetris4> that's weird, does this make sense then? https://dpaste.de/ALxB
[01:32] <tetris4> ah, sorry, its correct, 02 is intel
[01:33] <tetris4> I would expect intel to be 01, and nvidia 02
[01:33] <Lekensteyn> 00:01.0 is the PCI root port to which the nvidia PCI device is attached 01:00.0
[01:33] <Lekensteyn> 00:02.0 is the IGD (Integrated Graphics Device)
[01:34] <Lekensteyn> those 00:0x.0 addresses seem pretty standard for a given Intel CPU family
[01:34] <tetris4> so card0 should be IGD?
[01:34] <Lekensteyn> yes
[01:34] <Lekensteyn> normally that is the case
[01:34] <tetris4> ok I think I got it now
[01:34] <Lekensteyn> I haven't seen otherwise
[01:35] <tetris4> Lekensteyn: ok to share this conversation as is on our forum?
[01:35] <Lekensteyn> sure
[01:35] <tetris4> thanks
[01:35] <Lekensteyn> please spread the knowledge :)
[01:38] <tetris4> =D
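For anyone who would rather not read the symlinks by eye, the sysfs check Lekensteyn describes above can be sketched in a few lines of Python. This is only a rough helper, not something from the Bumblebee project: the function names (`pci_address_of`, `describe`, `list_cards`) are made up for illustration, and the rule "00:02.0 is the Intel IGD, anything else is the discrete card" is just the convention mentioned in the chat, which may not hold on every machine.

```python
import os
import re

# Matches a full PCI address such as 0000:01:00.0 and captures the short
# bus:device.function part (01:00.0).
PCI_ADDR = re.compile(r"[0-9a-f]{4}:([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])", re.IGNORECASE)


def pci_address_of(target):
    """Return the short PCI address of the device behind a DRM card.

    `target` is the symlink target of /sys/class/drm/cardN. The last PCI
    address in the path is the device itself; earlier ones are the root
    ports or bridges it hangs off.
    """
    matches = PCI_ADDR.findall(target)
    return matches[-1] if matches else None


def describe(addr):
    """Guess the kind of GPU from its PCI address (assumption from the chat)."""
    if addr is None:
        return "unknown"
    if addr == "00:02.0":
        return "intel (IGD)"
    return "discrete (nvidia on an Optimus laptop)"


def list_cards(drm_dir="/sys/class/drm"):
    """Map card0, card1, ... to their short PCI addresses."""
    cards = {}
    if not os.path.isdir(drm_dir):
        return cards
    for name in sorted(os.listdir(drm_dir)):
        if re.fullmatch(r"card\d+", name):
            path = os.path.join(drm_dir, name)
            target = os.readlink(path) if os.path.islink(path) else path
            cards[name] = pci_address_of(target)
    return cards


if __name__ == "__main__":
    for card, addr in list_cards().items():
        print(f"{card}: {addr} -> {describe(addr)}")
```

Run as a normal user; it only reads symlinks under /sys/class/drm. If card1 does not show up at all, that matches what is said above: the node only exists once a DRM driver for the device is loaded.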
So some questions that come up:
- Does running an application with optirun work for you?
- Did you check the output of `dmesg | tail -1` after running the tee command to turn the card off?
- Any relevant output in
- Is there any nvidia module in /etc/modules-load.d/ (it shouldn’t)?
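For the last question, here is a small Python sketch that looks for nvidia entries in /etc/modules-load.d/. It assumes the usual format of those files (one module name per line, with blank lines and lines starting with `#` or `;` ignored) and only checks `*.conf` files in that one directory; the function names are hypothetical and it does not look at any other place modules could be loaded from.

```python
import glob
import os


def nvidia_entries(text):
    """Return the module names in a modules-load.d file that start with 'nvidia'."""
    found = []
    for line in text.splitlines():
        name = line.strip()
        # Skip blank lines and comment lines.
        if not name or name[0] in "#;":
            continue
        if name.lower().startswith("nvidia"):
            found.append(name)
    return found


def scan(directory="/etc/modules-load.d"):
    """Map each *.conf file in `directory` to the nvidia modules it lists."""
    hits = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.conf"))):
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                entries = nvidia_entries(fh.read())
        except OSError:
            # Unreadable file: skip it rather than fail the whole scan.
            continue
        if entries:
            hits[path] = entries
    return hits


if __name__ == "__main__":
    hits = scan()
    if not hits:
        print("no nvidia module listed in /etc/modules-load.d/")
    for path, entries in hits.items():
        print(f"{path}: {', '.join(entries)}")
```

If this prints any file, that file is loading nvidia at boot, which is exactly the situation that should not occur with bumblebee.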