Frogging-Family / linux-tkg

linux-tkg custom kernels

[Fedora 35] No longer able to boot Fedora's stock kernel

RefinedAnarchy opened this issue

Before I installed linux-tkg, I was able to boot the stock kernel just fine. But since then, every time I boot up the stock kernel, the splash screen spits out something like this:

[FAILED] Failed to start D-Bus System Message Bus. //repeated for like 6 more times
[FAILED] Failed to start Bluetooth service.
//complains about alsactl
systemd-shutdown

Despite all that, linux-tkg still boots up just fine but I'd like to be able to use stock kernel just in case.
Perhaps the fact that it replaces the stock kernel-headers and kernel-devel is related to this?

Kernel-devel and kernel-headers shouldn't intervene in the runtime of anything 🤔

Have you tried reinstalling the stock kernel ?

Yes, I have (dnf reinstall kernel kernel-core). I also tried removing it completely and reinstalling it. And I just noticed that everything that fails is a systemd service.

Can you give the output of sudo systemctl status xxx where xxx is the service that fails ? Maybe we can get useful information on why services fail.

My first guess is that it has something to do with ZSTD compression of modules again.

I can't get the status of those services from the stock kernel because it fails to boot. I can get it from the tkg one, but there they run just fine.

That's what I think might've happened too, how do I change the tkg one to zstd instead of lz4?

Can you get a journalctl log of it? For example, boot into the kernel that fails, then boot back into the one that works and run sudo journalctl -b -1 (the log of the previous boot). Check that it is indeed the boot that failed, then attach it here.

Arch uses ZSTD by default, and we made other distros follow suit; Ubuntu had an issue with it, so we disabled it there. But I don't understand how Fedora can have issues with it, since it boots fine with the tkg kernel but no longer with the stock kernel... That's weird. You can try to build another tkg kernel without zstd and see if it boots fine: use make xconfig (you are prompted for it at the end of the configure step, right before the compilation starts), look for MODULE_COMPRESS_ZSTD and disable it.
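
To check which module compression a given kernel was built with, before or after changing it in xconfig, the choice can be read straight from the kernel's config file. The sketch below is only an illustration: the function name module_compression is hypothetical, and it assumes the four choice members (NONE, GZIP, XZ, ZSTD) found in kernels of this era.

```shell
# Print the module-compression choice recorded in a kernel .config file.
# Usage: module_compression /path/to/config
# Assumption: the choice members are NONE, GZIP, XZ and ZSTD.
module_compression() (
    config_file=$1
    # Kconfig records the selected member as CONFIG_MODULE_COMPRESS_<ALGO>=y.
    selected=$(grep -E '^CONFIG_MODULE_COMPRESS_(NONE|GZIP|XZ|ZSTD)=y$' "$config_file" | head -n 1)
    if [ -n "$selected" ]; then
        # Strip the prefix and the "=y" suffix, leaving e.g. ZSTD or NONE.
        selected=${selected#CONFIG_MODULE_COMPRESS_}
        printf '%s\n' "${selected%=y}"
    else
        printf 'unknown\n'
    fi
)
```

On Fedora the installed kernels keep their config under /boot, so something like module_compression "/boot/config-$(uname -r)" should report the value for the running kernel.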

I tried sudo journalctl --list-boots and checked the previous entries, but none of the boots that used the stock kernel are logged.
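
One thing worth ruling out when a boot is missing from the list is whether the journal is kept on disk at all. With journald's default Storage=auto, logs survive a reboot only if /var/log/journal exists. The helper below is a hypothetical sketch (the name journal_is_persistent and the optional root argument are illustrative); a boot that dies before journald flushes to disk would still go unlogged, so this check alone does not explain the missing entries.

```shell
# Succeed if journald can persist logs under the given root directory.
# Usage: journal_is_persistent [ROOT]   (ROOT defaults to the live system)
# With Storage=auto (the default), journald writes to /var/log/journal only
# when that directory exists; otherwise logs stay in /run and are lost.
journal_is_persistent() (
    root=${1:-}
    [ -d "$root/var/log/journal" ]
)

# Example (not run here):
#   journal_is_persistent && sudo journalctl --list-boots
```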

I disabled compression on the tkg kernel and it boots just fine. However, I tried to copy the config of the stock kernel and build the tkg kernel based on it. Guess what? It failed to boot.

That's interesting information; can you attach both here? Let's try to diff them. But it still puzzles me that your system somehow adapted to the tkg kernels and stopped working with what it was meant to work with. Did you change the filesystem, for example?

Sure: config-5.15.6-200.fc35.x86_64 and config-5.15.7_tkg_bmq

No, I still use the stock fedora btrfs config

Unfortunately, the difference between the two files is too big to find anything interesting. But after having a look at what Gentoo recommends to have in the kernel .config for systemd, they redirect to this up-to-date file. You can check whether every option is respected in Fedora's config file. Have you tried an older or newer Fedora kernel?
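
When a full diff of two kernel configs is too large to read, one way to narrow it down is to compare only the options that match a pattern. The helper below is a hypothetical sketch: the name config_diff and the example pattern are illustrative and not something used in this thread.

```shell
# Show only the options matching PATTERN that differ between two configs.
# Usage: config_diff PATTERN CONFIG_A CONFIG_B
config_diff() (
    pattern=$1 a=$2 b=$3
    tmp_a=$(mktemp) tmp_b=$(mktemp)
    # Keep only matching lines and sort them so ordering differences vanish.
    grep -E "$pattern" "$a" | sort > "$tmp_a"
    grep -E "$pattern" "$b" | sort > "$tmp_b"
    # Lines prefixed "<" appear only in CONFIG_A, ">" only in CONFIG_B.
    diff "$tmp_a" "$tmp_b" | grep -E '^[<>]'
    rm -f "$tmp_a" "$tmp_b"
)

# Example (not run here; the pattern is an arbitrary choice):
#   config_diff 'SECURITY|LSM' config-5.15.6-200.fc35.x86_64 config-5.15.7_tkg_bmq
```

Filtering before diffing keeps the output to a handful of lines per subsystem, which makes it practical to walk through a checklist of required options one group at a time.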

I also tried kernel-5.14.10-300.fc35, which is the version that came with the live CD, and it doesn't boot either. I'll try to install linux-tkg on a fresh installation to see if anything happens.

I tried a fresh install, and tkg + stock worked just fine. I think it might have something to do with grub, because I remember running 'grub2-mkconfig -o /boot/efi/EFI/fedora/grub.conf'. Apparently, that is not recommended after F33. I tried several ways to reset it, without success. But the fact that it broke the stock kernel is super weird.

What do they recommend instead?

Okay some weird edge case bug then

In reply to @AdelKS (Dec 14, 2021, 21:43 UTC+1), who asked what Fedora recommends instead:

https://ask.fedoraproject.org/t/documentation-errors-on-grub-cfg-location-for-fedora-34-and-up/17921

https://docs.fedoraproject.org/en-US/fedora/f35/system-administrators-guide/kernel-module-driver-configuration/Working_with_the_GRUB_2_Boot_Loader/

So, /etc/grub2-efi.cfg and /etc/grub2.cfg respectively. I'm also a Fedora user suffering from this issue, but I expected it.

I was unfortunate and experienced this same problem. None of the stock kernels I had would boot. Only the two linux-tkg-pds kernels I have would work.

I disabled the "rhgb quiet" kernel boot parameters so I could see what the errors were and one of the first errors was:
Failed to mount Huge Pages File System

After searching online, I stumbled upon this page, where I saw a solution: disable SELinux, either by editing /etc/selinux/config and changing SELINUX=enforcing to SELINUX=disabled

or

by entering a command in the terminal like this: sudo grubby --update-kernel=ALL --args "selinux=0"

Now I'm able to boot normally with the stock kernel. I don't really know what SELinux (Security-Enhanced Linux) is, or what kind of security issues I might have introduced by disabling it.

If anyone has any information how to restore selinux and continue to use stock kernels and linux-tkg-kernels together, please reply.

In reply to @wallcarpet40's comment above (Jan 13, 2022, 4:37 PM UTC) about disabling SELinux:

Note that Fedora actively discourages doing so.

@PoorPocketsMcNewHold Thank you for the link. I changed the /etc/selinux/config file with SELINUX=permissive, then rebooted, which triggered Fedora to relabel everything. After that, I changed it back to SELINUX=enforcing and now all the kernels work again.
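
For anyone retracing these steps, it can help to confirm what /etc/selinux/config currently says before and after each reboot. The function below is a minimal, hypothetical sketch (the name selinux_config_mode is illustrative); it only reads the file and does not change any setting.

```shell
# Print the SELinux mode configured in an SELinux config file.
# Usage: selinux_config_mode [FILE]   (FILE defaults to /etc/selinux/config)
selinux_config_mode() (
    config_file=${1:-/etc/selinux/config}
    # Match uncommented "SELINUX=<mode>" lines only; "SELINUXTYPE=" and
    # comment lines do not match. If several match, the last one is printed.
    sed -n 's/^SELINUX=\([a-z]*\).*$/\1/p' "$config_file" | tail -n 1
)
```

The configured mode can differ from the running one, which getenforce reports. If selinux=0 was added with grubby as described earlier in the thread, grubby's --remove-args option should undo it (for example sudo grubby --update-kernel=ALL --remove-args "selinux=0"), and creating /.autorelabel before rebooting is the usual way to request a relabel explicitly.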

Same for me, outside of some audio and network issues, but this is probably unrelated.

Proposed patch for this issue:
383-fedora.txt

As people here suggested a link with SELinux, I investigated. Actually, the TKG kernel does not support SELinux at runtime as expected in Fedora.

$ uname -a
Linux fedora 6.5.3_tkg_bore_eevdf #1 SMP PREEMPT_DYNAMIC TKG Sat Sep 16 15:57:46 CEST 2023 x86_64 GNU/Linux
$ sestatus
SELinux status:                 disabled

SELinux expects files to carry a label. As a consequence, "to prevent incorrectly labeled and unlabeled files from causing problems, SELinux automatically relabels file systems when changing from the disabled state to permissive or enforcing mode."
Changing SELinux states and modes
Here are the SELinux file labels on my system before using TKG, while using TKG, and after going back to the stock kernel:
selinux-files.zip
Many files lose their label while TKG is in use. When going back to the stock kernel, a systemd service is in charge of relabeling the whole filesystem: selinux-autorelabel-mark.service.
selinux-autorelabel-mark.service source
It seems this service needs to reboot the system, which seems logical. After that reboot, the filesystem is relabeled.
Depending on your hardware, this relabeling can take quite long. I suspect this is where the issue lies: the system reboots and then seems to hang, and people may hard-reboot it before the relabeling finishes.

Thus my proposal is to set up the SELinux part of the kernel configuration the way Fedora does, when building on Fedora.
Fedora kernel configuration
SELinux in the Arch wiki
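
The contents of the patch are not shown in this thread, so the following is only a rough way to inspect a kernel config for SELinux support, not a description of what the patch changes. It assumes the two options that usually matter are CONFIG_SECURITY_SELINUX and the CONFIG_LSM list; the function name selinux_kernel_support and its output words are hypothetical.

```shell
# Report how a kernel .config treats SELinux.
# Usage: selinux_kernel_support /path/to/config
# Prints one of: enabled, built-but-not-in-lsm-list, missing.
# Assumption: SELinux is active by default only when it is both compiled in
# (CONFIG_SECURITY_SELINUX=y) and named in the CONFIG_LSM list.
selinux_kernel_support() (
    config_file=$1
    if ! grep -q '^CONFIG_SECURITY_SELINUX=y$' "$config_file"; then
        printf 'missing\n'
    elif grep -E '^CONFIG_LSM=' "$config_file" | grep -q 'selinux'; then
        printf 'enabled\n'
    else
        printf 'built-but-not-in-lsm-list\n'
    fi
)
```

Running it against both the stock Fedora config and the TKG config would show at a glance whether the two kernels disagree on SELinux, which is the situation the sestatus output above points to.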

I tested almost all TKG kernels with my patch (except 6.6.0, which is currently broken); they all compile and run. Boot time is normal and the logs are as expected (except that zram/zswap fails on older kernels, but that is another story, and 6.5 sometimes freezes, which might be related to another issue). Switching from TKG to stock is flawless.

Regards, and kisses to the frogs.

I tried the Chevek patch on Fedora 38
It works as expected

SELinux is enabled :

$ uname -a
Linux fedora 6.5.3_tkg_eevdf #1 SMP PREEMPT_DYNAMIC TKG Sun Sep 17 12:59:27 CEST 2023 x86_64 GNU/Linux
$ sestatus
SELinux status:                 enabled

And I can reboot to the default fedora kernel without any problem.

I used 3 patches to help with issue #29568:
https://lore.kernel.org/linux-kbuild/20231103-rpmpost-v1-1-9c18afab47f4@meta.com/
From this PR #840
9cb4110
Nanotwerp@c044707

AFAICT those only apply to kernel 6.6. So I put all of this together and was able to get a functional 6.6.1 TKG kernel on Fedora 39, installing and running flawlessly with SELinux and SwapOnZRAM support:
https://github.com/Chevek/linux-tkg/tree/F39

I improved how I change the kernel configuration by using the proper tools, and added the SwapOnZRAM configuration, which is a Fedora feature: https://fedoraproject.org/wiki/Changes/SwapOnZRAM

There is still a minor issue with the SwapOnZRAM configuration: the script asks me to choose ZSWAP_COMPRESSOR_DEFAULT even though I explicitly configured ZSWAP_COMPRESSOR_DEFAULT_LZO. I need to investigate this further.

I've tried to get rid of this configuration prompt (the choice should be "ZSWAP_COMPRESSOR_DEFAULT_LZO"):

-> Building kernel RPM packages
  SYNC    include/config/auto.conf
*
* Restart config...
*
*
* Support for paging of anonymous memory (swap)
*
Support for paging of anonymous memory (swap) (SWAP) [Y/n/?] y
  Compressed cache for swap pages (ZSWAP) [Y/n/?] y
    Enable the compressed cache for swap pages by default (ZSWAP_DEFAULT_ON) [N/y/?] n
    Invalidate zswap entries when pages are loaded (ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON) [Y/n/?] y
    Default compressor
      1. Deflate (ZSWAP_COMPRESSOR_DEFAULT_DEFLATE)
    > 2. LZO (ZSWAP_COMPRESSOR_DEFAULT_LZO)
      3. 842 (ZSWAP_COMPRESSOR_DEFAULT_842)
      4. LZ4 (ZSWAP_COMPRESSOR_DEFAULT_LZ4) (NEW)
      5. LZ4HC (ZSWAP_COMPRESSOR_DEFAULT_LZ4HC)
      6. zstd (ZSWAP_COMPRESSOR_DEFAULT_ZSTD) (NEW)
    choice[1-6?]: 

But I'm stuck. Please help me on this one.

To try:

git clone https://github.com/Chevek/linux-tkg.git
cd linux-tkg/
git checkout F39
./install.sh install
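
Regarding the ZSWAP_COMPRESSOR_DEFAULT prompt above: Kconfig flags an option as (NEW) when the config it reads has no entry for it, and in that prompt LZ4 and zstd are the two flagged members. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the generated config sets the LZO member but leaves those two without any entry, so the whole choice is asked again. The hypothetical helper below (the name zswap_undecided_members is illustrative) lists the choice members, taken from the prompt, that have no entry in a config file.

```shell
# List zswap default-compressor choice members that have no entry at all in
# a .config file (neither "=y" nor "# ... is not set").
# Usage: zswap_undecided_members /path/to/.config
zswap_undecided_members() (
    config_file=$1
    # Member names are taken from the choice prompt shown in the build log.
    for member in DEFLATE LZO 842 LZ4 LZ4HC ZSTD; do
        symbol="CONFIG_ZSWAP_COMPRESSOR_DEFAULT_$member"
        if ! grep -Eq "^($symbol=|# $symbol is not set)" "$config_file"; then
            printf '%s\n' "$symbol"
        fi
    done
)
```

If this prints anything, writing explicit "is not set" entries for those members, or running make olddefconfig so that defaults are taken without prompting, might avoid the interactive question; neither has been verified against this build script.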

I tested PR Chevek@5e8105d against 6.6.1 and 6.7-RC. In both cases the boot sequence was normal on Fedora 39.

SELinux is enforcing and it seems even SwapOnZRAM works fine.

$ uname -a
Linux fedora 6.6.1_tkg_eevdf #1 SMP PREEMPT_DYNAMIC TKG Wed Nov 15 00:17:53 CET 2023 x86_64 GNU/Linux

$ getenforce
Enforcing
$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33
$ swapon --show
NAME            TYPE       SIZE USED PRIO
/dev/nvme2n1p16 partition 33,2G   0B   -2
/dev/zram0      partition    8G   0B  100
$ systemctl | grep zram
  sys-devices-virtual-block-zram0.device                                                                                                                   loaded active plugged   /sys/devices/virtual/block/zram0
  systemd-zram-setup@zram0.service                                                                                                                         loaded active exited    Create swap on /dev/zram0
  system-systemd\x2dzram\x2dsetup.slice                                                                                                                    loaded active active    Slice /system/systemd-zram-setup
  dev-zram0.swap                                                                                                                                           loaded active active    Compressed Swap on /dev/zram0
$ zramctl
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd            8G   4K   64B   20K      16 [SWAP]


$ uname -a
Linux fedora 6.7.0_rc1_tkg_eevdf #1 SMP PREEMPT_DYNAMIC TKG Wed Nov 15 00:57:23 CET 2023 x86_64 GNU/Linux

$ getenforce
Enforcing
$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33
$ swapon --show
NAME            TYPE       SIZE USED PRIO
/dev/nvme2n1p16 partition 33,2G   0B   -2
/dev/zram0      partition    8G   0B  100
$ systemctl | grep zram
  sys-devices-virtual-block-zram0.device                                                                                                                   loaded active plugged   /sys/devices/virtual/block/zram0
  systemd-zram-setup@zram0.service                                                                                                                         loaded active exited    Create swap on /dev/zram0
  system-systemd\x2dzram\x2dsetup.slice                                                                                                                    loaded active active    Slice /system/systemd-zram-setup
  dev-zram0.swap                                                                                                                                           loaded active active    Compressed Swap on /dev/zram0
$ zramctl
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd            8G   4K   64B   20K      16 [SWAP]

Hello, please try #848 and see if it fixes this !