SOLUTION BELOW

The actual bug


I have never been in a more confusing situation regarding Linux.

I have a Dell XPS 15 9560, which had a dual-boot Windows 10 / EndeavourOS setup. It was running fine for months. Ten days ago I updated Linux, and after the restart it couldn’t boot anymore. It got stuck at “A start job is running for /dev/disk/by-uuid/…” (which is the root partition).

First, with the help of a friend of mine who is quite knowledgeable about Linux (he runs vanilla Arch, etc), we spent 5 hours trying to fix it but had no luck.

Then I decided to back up everything and do a fresh install. Aaaand the same error happened again on the first boot. Then I thought “ok, probably some problem with Arch, let’s try Fedora”. Nope. Some similar error about not finding the root partition. (Here I must say that the kernel shipped with the ISO was working fine, but after updating to the latest one, it failed.) So I thought “ok, then it might be a problem with the latest kernel, let’s install EndeavourOS with the LTS kernel.” Nope, the LTS kernel also didn’t boot. Then I tried Ubuntu and it worked, but that doesn’t solve the problem. Then I decided to put another NVMe drive in the laptop and try there. The same error again.

Now the greatest part: if I put the NVMe drive into an external USB case, EndeavourOS installs, updates, and boots without any problem, with no sign of the error.

So now I don’t know how to proceed… Maybe there is something wrong with the PCIe port in my laptop, but apart from the booting problem, Windows is working, and I can also mount and access every partition on the SSD through a live USB. So there are no other signs of a problem with the port whatsoever.

I would be grateful for any advice as I’ve lost several days trying to solve this and I am out of ideas…


Solution: The last working kernels are from 11 August 2023, for both linux and linux-lts: linux-6.4.10.arch1-1 and linux-lts-6.1.45-1. You can download them from here: linux / linux-lts and install them with

sudo pacman -U the_path_to_the_package
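For anyone landing here later, this is roughly how the downgrade could look for both packages at once. It is only a sketch: the download directory and the exact package file names are my assumptions based on the versions above, and `install_known_good` is just a hypothetical helper name. By default it only prints the command so you can check the paths first.

```shell
#!/bin/sh
# Sketch: install the two known-good kernels from local package files.
# PKG_DIR and the file names are assumptions; adjust them to what you downloaded.
PKG_DIR="${PKG_DIR:-$HOME/Downloads}"
# Prints the command by default. Run with RUN= (set but empty) to really execute it.
RUN="${RUN-echo}"

install_known_good() {
    $RUN sudo pacman -U \
        "$PKG_DIR/linux-6.4.10.arch1-1-x86_64.pkg.tar.zst" \
        "$PKG_DIR/linux-lts-6.1.45-1-x86_64.pkg.tar.zst"
}

install_known_good
```

To keep the next system update from pulling the broken kernels back in, pacman's `IgnorePkg` option in /etc/pacman.conf can hold them back, for example `IgnorePkg = linux linux-lts` (add the matching headers packages too if you have them installed).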

Thank you all for the help!

  • @oiram15 (OP)
    10 months ago

    I’m on systemd-boot. There isn’t a directory loaders under loader, but I found the parameters in /etc/kernel/cmdline:

    nvme_load=YES nowatchdog rw root=UUID=9ae3c50f-be08-4594-ac30-2d094375868d

    • Illecors
      10 months ago

      My bad, I think in your case it’s in /efi/loader/entries/something.conf

      Since / is not mounted yet at that point, the bootloader will not be able to read anything under /etc/, unless that file is used to automatically populate the loader.conf.

      Also check /efi/loader/loader.conf.

      • @oiram15 (OP)
        10 months ago

        I found it!

        [liveuser@eos-2023.08.05 ~]$ cat /mnt/efi/loader/entries/02ef85f9edc146d598502c1b296ff64a-6.4.12-arch1-1.conf 
        # Boot Loader Specification type#1 entry
        # File created by /etc/kernel/install.d/90-loaderentry.install (systemd 254.1-1-arch)
        title      EndeavourOS
        version    6.4.12-arch1-1
        machine-id 02ef85f9edc146d598502c1b296ff64a
        sort-key   endeavouros-6.4.12-arch1-1
        options    nvme_load=YES nowatchdog rw root=UUID=9ae3c50f-be08-4594-ac30-2d094375868d systemd.machine_id=02ef85f9edc146d598502c1b296ff64a
        linux      /02ef85f9edc146d598502c1b296ff64a/6.4.12-arch1-1/linux
        initrd     /02ef85f9edc146d598502c1b296ff64a/6.4.12-arch1-1/initrd
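A quick way to double-check that the `root=` value in an entry like this points at something that exists is to pull it out of the `options` line and map it to a device path. This is a minimal sketch; `root_from_entry` and `root_to_path` are hypothetical helper names, not part of systemd-boot, and only the `UUID=` and plain device-path forms are handled.

```shell
#!/bin/sh
# Hypothetical helpers, not part of systemd-boot or any distro tooling.

# Print the value of root= from the "options" line of a loader entry.
root_from_entry() {
    awk '$1 == "options" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^root=/) { sub(/^root=/, "", $i); print $i }
    }' "$1"
}

# Map a root= value to the device node it should resolve to.
# Only UUID=... and plain paths are handled in this sketch.
root_to_path() {
    case "$1" in
        UUID=*) echo "/dev/disk/by-uuid/${1#UUID=}" ;;
        *)      echo "$1" ;;
    esac
}
```

From the live USB you could then run something like `dev=$(root_to_path "$(root_from_entry /mnt/efi/loader/entries/your-entry.conf)"); [ -e "$dev" ] && echo present || echo missing`. Note that this only tells you the live kernel sees the partition, which in this thread it always did.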
        
        
        • Illecors
          10 months ago

          I’ve never used machine-id with systemd-boot, but everything appears to be correct. Presumably, /boot contains a directory named 6.4.12-arch1-1, which contains the files linux and initrd, correct?

          You could try rebuilding the initramfs with mkinitcpio --allpresets while chrooted.

          • @oiram15 (OP)
            10 months ago

            they are under /02ef85f9edc146d598502c1b296ff64a/6.4.12-arch1-1/, but yes.

            EndeavourOS is using dracut by default.

            Edit: we tried rebuilding initramfs before, but it didn’t help

            • Illecors
              10 months ago

              OK, I see nothing wrong. Let’s try building a new config that’s as minimal as possible. Copy linux and initrd files to /boot/.

              /efi/loader/entries/test.conf

              title      Test
              options    root=/dev/nvme0n1p2
              linux      /linux
              initrd     /initrd
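If it helps, the test entry above could be written with a small script instead of by hand. This is a sketch under a few assumptions: `write_test_entry` is a hypothetical helper name, the first argument is wherever the ESP is mounted (for example /mnt/efi from the live USB), and /dev/nvme0n1p2 is only the guess used in the entry above. One thing worth double-checking: as far as I know, systemd-boot resolves `linux` and `initrd` paths relative to the partition that holds the entry, so the copied files probably need to sit at the root of the ESP rather than under /boot on the root filesystem.

```shell
#!/bin/sh
# Sketch: write a minimal systemd-boot entry for testing.
# write_test_entry is a hypothetical helper, not an existing tool.
# usage: write_test_entry ESP_MOUNTPOINT ROOT_DEVICE
write_test_entry() {
    esp="$1"
    root_dev="$2"
    mkdir -p "$esp/loader/entries"
    cat > "$esp/loader/entries/test.conf" <<EOF
title      Test
options    root=$root_dev
linux      /linux
initrd     /initrd
EOF
}
```

Called as `write_test_entry /mnt/efi /dev/nvme0n1p2` (as root), it produces the same four lines as the entry above. You can confirm the real partition name with `lsblk` before using it.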
              
                • Illecors
                  10 months ago

                  I think failure to change power states is a big issue, but this is out of my depth now. Sorry :(

                  • @Hupf@feddit.de
                    10 months ago

                    It matches the observation with the external USB enclosure though. I think the ASPM / ACPI path would be the most promising.

                    If you know a last working and a first broken kernel version, maybe do a bisection.
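One low-effort way to probe the ASPM idea before bisecting would be to boot once with power-management-related kernel parameters added to the entry. The sketch below prints a copy of an entry with extra parameters appended to its `options` line; `add_params` is a hypothetical helper name. `pcie_aspm=off` and `nvme_core.default_ps_max_latency_us=0` are existing kernel parameters, but whether either one helps on this machine is untested here.

```shell
#!/bin/sh
# Sketch: print a loader entry with extra kernel parameters appended
# to its "options" line. add_params is a hypothetical helper name.
# usage: add_params ENTRY_FILE PARAM...
add_params() {
    entry="$1"
    shift
    # Only the line starting with "options" is changed; the rest is passed through.
    sed "/^options[[:space:]]/ s|\$| $*|" "$entry"
}
```

For example, `add_params /mnt/efi/loader/entries/test.conf pcie_aspm=off nvme_core.default_ps_max_latency_us=0 > /tmp/aspm-test.conf` writes the modified copy to a temporary file, which you can then review and copy into the entries directory under a new name. You can also press `e` in the systemd-boot menu and type the parameters in for a single boot.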

                  • @oiram15 (OP)
                    10 months ago

                    Thank you for your time and patience!