SOLUTION BELOW

The actual bug


I have never been in a more confusing situation regarding Linux.

I have a Dell XPS 15 9560, which had a dual boot Windows 10 / EndeavourOS setup. It was running fine for months. 10 days ago I updated Linux and after restart it couldn’t boot anymore. It got stuck at “A start job is running for /dev/disk/by-uuid/…” (which is the root partition).

First, with the help of a friend of mine who is quite knowledgeable about Linux (he runs vanilla Arch, etc), we spent 5 hours trying to fix it but had no luck.

Then I decided to back up everything and do a fresh install. Aaaand the same error happened again on the first boot. Then I thought “ok, probably some problem with Arch, let’s try Fedora”. Nope. Some similar error about not finding the root partition. (I must say that the kernel shipped with the ISO worked fine, but after updating to the latest one, it failed.) So I thought “ok, then it might be a problem with the latest kernel, let’s install EndeavourOS with the LTS kernel.” Nope, the LTS kernel also didn’t boot. Then I tried Ubuntu and it worked, but that doesn’t solve the problem. Then I decided to put another NVMe drive in the laptop and try there. The same error again.

Now the greatest part: if I put the NVMe drive into an external USB case, EndeavourOS installs, updates, and boots without any problem, with no sign of the error.

So now I don’t know how to proceed… Maybe there is something wrong with the PCIe port in my laptop, but apart from the booting problem, Windows is working, and I can also mount and access every partition on the SSD from a live USB. So there are no other signs of a problem with the port.

I would be grateful for any advice as I’ve lost several days trying to solve this and I am out of ideas…


Solution: The last working kernels are the ones from 11 August 2023, for both linux and linux-lts: linux-6.4.10.arch1-1 and linux-lts-6.1.45-1. You can download them from here: linux / linux-lts and install them with

sudo pacman -U the_path_to_the_package

Thank you all for the help!
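After installing the older packages and rebooting, it can help to confirm which kernel actually booted. The following is only a minimal sketch: `is_known_good` is a hypothetical helper name, and it assumes that `uname -r` reports `6.4.10-arch1-1` for linux 6.4.10.arch1-1 and `6.1.45-1-lts` for linux-lts 6.1.45-1.

```shell
#!/bin/sh
# Sketch: report whether a kernel release string is one of the two
# known-good kernels from the post.
# is_known_good is a hypothetical helper name, not a standard tool.
# Assumption: `uname -r` prints 6.4.10-arch1-1 for linux 6.4.10.arch1-1
# and 6.1.45-1-lts for linux-lts 6.1.45-1.
is_known_good() {
    case "$1" in
        6.4.10-arch1-1|6.1.45-1-lts) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the kernel that is running right now.
if is_known_good "$(uname -r)"; then
    echo "running a known-good kernel: $(uname -r)"
else
    echo "running $(uname -r), which is not one of the pinned versions"
fi
```

If the second message appears after a downgrade, the boot entry is probably still pointing at a different kernel image.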

  • @oiram15 (OP)
    10 months ago

    This is the log https://ufile.io/p5wj1hu0

    I’m not sure if this is relevant, but I added pci=nommconf to the kernel parameters as I had errors similar to the one in this topic, otherwise the log was over 40 000 lines long.

    Edit: I’m not sure if this is exactly a kernel problem, as I get the same error when booting with the LTS kernel, which is older than the one installed with the offline installer.

    • abrer
      10 months ago

      No worries. I checked that output, and it is from the working 6.4.8-arch1-1 kernel. A log from the broken kernel's boot attempt would be most useful, but I don’t want to make you suffer to get it if you are back on a working system. I think at this point it is safe to say your laptop isn’t a fan of the newer kernels.

      I would:

      1. (On a fresh install and/or a working machine) update your /etc/pacman.conf to ignore updates to the packages linux and linux-lts.
      2. Devise a way to add multiple systemd-boot boot entries. I was working on this a little while ago, but my approach isn’t foolproof yet and it drops you to an emergency shell, so I am hesitant to share it at the moment.

      Ideally: You could (from a working system) install a known working LTS image (pkg linux-lts) and exclude it from updates until you land on a working kernel release (keep an eye on the testing and core repos once a week or so). That way, you’ll have a working LTS kernel and can upgrade or downgrade mainline kernels as you please, booting back into LTS to correct issues should they arise.
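Step 1 above could look roughly like this. It is only a sketch: `hold_kernels` is a hypothetical helper name, and it assumes the config file has an `[options]` section and no active `IgnorePkg` line yet. The end result is the same as adding `IgnorePkg = linux linux-lts` under `[options]` by hand.

```shell
#!/bin/sh
# Sketch: hold back the kernel packages in a pacman.conf-style file.
# hold_kernels is a hypothetical helper name, not part of pacman.
hold_kernels() {
    conf="$1"
    # If an active (uncommented) IgnorePkg line exists, leave the file alone;
    # in that case add linux and linux-lts to that line by hand.
    if grep -Eq '^[[:space:]]*IgnorePkg[[:space:]]*=' "$conf"; then
        echo "IgnorePkg already set in $conf; edit that line manually" >&2
        return 1
    fi
    # Copy the file, adding the IgnorePkg line directly under the [options]
    # header, then replace the original with the copy.
    awk '{ print } /^\[options\]/ { print "IgnorePkg = linux linux-lts" }' \
        "$conf" > "$conf.new" && mv "$conf.new" "$conf"
}
```

To try it, back up /etc/pacman.conf first and call `hold_kernels /etc/pacman.conf` as root. With the line in place, `pacman -Syu` skips both kernel packages and prints a warning that they were ignored.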

      edit: minor

      • @oiram15 (OP)
        10 months ago

        I found the problem! It was the kernel after all: neither the latest nor the LTS versions released after 11 August work. So for now I need to stick to the versions from the 11th. Thank you very much for the help!

        Edit: this is the bug