PCI passthrough of XHCI on Framework AMD crashes the host
Hi,

There is yet another issue affecting Framework AMD... When I start a domU with an XHCI controller attached (PCI passthrough), the whole host resets if there was a USB device plugged into it. I don't get any panic message (neither on the XHCI console - which is connected to a different XHCI controller - nor on VGA), and the reboot reason register shows 0x08000800 ("an uncorrected error caused a data fabric sync flood event") according to [1]. This is a Framework AMD with AMD Ryzen 5 7640U.

The crash itself happens quite early on domU startup - specifically when SeaBIOS tries to initialize XHCI. I tracked it down to the second readl() in xhci_controller_setup() [2]. Interestingly, it's specifically the second readl(), regardless of which of those comes first. I tried swapping their order, or even repeating the read from the same register - it is always the second call that triggers the crash. The first one succeeds and returns some value (for example 0x1200020 for HCCPARAMS).

If I start the domU when no USB devices are connected, it doesn't crash. If I manually unbind the device from the dom0 driver (echo 0000:c3:00.4 > /sys/bus/pci/drivers/xhci_hcd/unbind), it doesn't crash. Note I have seize=1 in the domU config, so the `xl pci-assignable-add` call is implicit.

If the system doesn't crash (either by not having any USB devices connected initially, or by the manual unbind), the USB controller in domU works fine. I can later connect devices and they appear inside domU.

This system has a couple of XHCI controllers, and the same behavior is observed on at least two of them. The controller works just fine when used in dom0. If I pass through another PCI device instead (tried the wifi card and the audio card), it doesn't crash.

The value read from HCCPARAMS (BAR + 0x10) differs between the good and the bad case:
- 0x01200020 when it crashes
- 0x0110ffc5 when it works

It's weird to have this many differences here, given most bits in this register are about device capabilities [3], not its runtime state...
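To make the difference between the two values above easier to see, here is a rough sketch that splits each of them into fields. It assumes the HCCPARAMS1 layout as I read it from the xHCI spec [3] (flag bits 0-11, MaxPSASize in bits 12-15, xECP in bits 16-31); the field list and the decode_hccparams1() helper are only illustrative and not taken from SeaBIOS or Linux.

```python
# Assumed HCCPARAMS1 layout (my reading of the xHCI spec, not verified
# against this particular controller): one flag per bit for bits 0..11.
FLAGS = ["AC64", "BNC", "CSZ", "PPC", "PIND", "LHRC",
         "LTC", "NSS", "PAE", "SPC", "SEC", "CFC"]


def decode_hccparams1(value):
    """Split a 32-bit HCCPARAMS1 value into named fields (hypothetical helper)."""
    fields = {name: (value >> bit) & 1 for bit, name in enumerate(FLAGS)}
    fields["MaxPSASize"] = (value >> 12) & 0xF   # bits 15:12
    fields["xECP"] = (value >> 16) & 0xFFFF      # bits 31:16, in 32-bit words
    return fields


bad = decode_hccparams1(0x01200020)    # value read when the host crashes
good = decode_hccparams1(0x0110FFC5)   # value read when it works

# Print both decodings side by side and flag every field that differs.
for name in good:
    marker = "" if bad[name] == good[name] else "  <-- differs"
    print(f"{name:<10} bad={bad[name]:#x} good={good[name]:#x}{marker}")
```

Under that assumed layout, the two values disagree in almost every field, including the extended capabilities pointer (xECP).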
In this system my main debugging tool is the XHCI console. But I also tried without enabling the XHCI console, and it still crashes, so it looks like it isn't caused by the XHCI console.

I also tried disabling XHCI initialization in SeaBIOS, and then it proceeds to booting the domU's kernel. But as soon as Linux gets to initializing that USB controller, it crashes the same way. So, it isn't just SeaBIOS doing something weird (or at least not just that).

With PVH dom0, the behavior is a bit different:

1. Initially, the controller works fine in dom0.
2. When starting domU, instead of a clean unbind this happens:

[ 11.248760] xhci_hcd 0000:c3:00.4: Controller not ready at resume -19
[ 11.248765] xhci_hcd 0000:c3:00.4: PCI post-resume error -19!
[ 11.248767] xhci_hcd 0000:c3:00.4: HC died; cleaning up
[ 11.249010] xhci_hcd 0000:c3:00.4: remove, state 4
[ 11.249013] usb usb8: USB disconnect, device number 1
[ 11.249437] xhci_hcd 0000:c3:00.4: USB bus 8 deregistered
[ 11.249832] xhci_hcd 0000:c3:00.4: remove, state 4
[ 11.249835] usb usb7: USB disconnect, device number 1
[ 11.250074] xhci_hcd 0000:c3:00.4: Host halt failed, -19
[ 11.250076] xhci_hcd 0000:c3:00.4: Host not accessible, reset failed.
[ 11.250389] xhci_hcd 0000:c3:00.4: USB bus 7 deregistered
[ 11.251011] pciback 0000:c3:00.4: xen_pciback: seizing device
[ 11.335120] pciback 0000:c3:00.4: xen_pciback: vpci: assign to virtual slot 0
[ 11.335544] pciback 0000:c3:00.4: registering for 1

3. Reading from the BAR in domU (in SeaBIOS, and later Linux) returns 0xffffffff.
4. It does not crash the host.

Any ideas?

I don't have any other system with Zen4 to try on. The hw11 gitlab runner is a Ryzen 7 7735HS, and it doesn't have this issue. It's also possible this is something related to Framework's firmware, but given all the observations above, I find it less likely.
[1] https://docs.kernel.org/arch/x86/amd-debugging.html#random-reboot-issues
[2] https://github.com/coreboot/seabios/blob/master/src/hw/usb-xhci.c#L553
[3] https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/extensible-host-controler-interface-usb-xhci.pdf (page 385)

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment: signature.asc