
PCI passthrough of XHCI on Framework AMD crashes the host



Hi,

There is yet another issue affecting Framework AMD... When I start a
domU with an XHCI controller attached (PCI passthrough), the whole host
resets if a USB device was plugged into it. I don't get any panic
message (neither on the XHCI console, which is connected to a different
XHCI controller, nor on VGA), and the reboot reason register shows
0x08000800 ("an uncorrected error caused a data fabric sync flood
event") according to [1].
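Below is a minimal C sketch that decodes that value by scanning its set bits. It assumes, per the table in [1], that bit 27 is the "data fabric sync flood" reason. The other set bit (bit 11) is deliberately left undecoded, and both helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical lookup: only bit 27 is described, assumed from the table
 * in [1]; any other set bit is reported by number only. */
static const char *reset_reason_bit(unsigned bit)
{
    switch (bit) {
    case 27:
        return "an uncorrected error caused a data fabric sync flood event";
    default:
        return NULL;
    }
}

/* Print every set bit of a 32-bit reset-reason value and return how many
 * bits were set. */
static int decode_reset_reason(uint32_t value)
{
    int count = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        if (!(value & (UINT32_C(1) << bit)))
            continue;
        const char *desc = reset_reason_bit(bit);
        printf("bit %2u: %s\n", bit, desc ? desc : "(not decoded in this sketch)");
        count++;
    }
    return count;
}
```

Calling decode_reset_reason(0x08000800) reports two set bits, 11 and 27.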

This is Framework AMD with AMD Ryzen 5 7640U.

The crash itself happens quite early in domU startup, specifically when
SeaBIOS tries to initialize XHCI. I tracked it down to the second
readl() in xhci_controller_setup() [2]. Interestingly, it's specifically
the second readl(), regardless of which register is read first. I tried
swapping their order, and even repeating the read from the same register;
the second call always triggers the crash. The first one succeeds and
returns some value (for example 0x01200020 for HCCPARAMS).
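For reference, the capability registers that xhci_controller_setup() reads sit at fixed offsets from the start of BAR0. The sketch below lays them out as a C struct following the xHCI specification [3] and reproduces a pair of back-to-back reads. It assumes the pair is HCSPARAMS1 followed by HCCPARAMS. mmio_readl() and read_two_caps() are hypothetical stand-ins for SeaBIOS's readl() and its setup code, not copies of them.

```c
#include <stddef.h>
#include <stdint.h>

/* xHCI capability registers at the start of BAR0, per the xHCI spec [3].
 * Field names follow the spec; the struct itself is illustrative. */
struct xhci_caps {
    uint8_t  caplength;   /* 0x00: offset of the operational registers */
    uint8_t  reserved;    /* 0x01 */
    uint16_t hciversion;  /* 0x02: interface version (BCD) */
    uint32_t hcsparams1;  /* 0x04 */
    uint32_t hcsparams2;  /* 0x08 */
    uint32_t hcsparams3;  /* 0x0C */
    uint32_t hccparams1;  /* 0x10: HCCPARAMS */
    uint32_t dboff;       /* 0x14: doorbell array offset */
    uint32_t rtsoff;      /* 0x18: runtime register space offset */
    uint32_t hccparams2;  /* 0x1C */
};

_Static_assert(offsetof(struct xhci_caps, hcsparams1) == 0x04, "HCSPARAMS1 offset");
_Static_assert(offsetof(struct xhci_caps, hccparams1) == 0x10, "HCCPARAMS1 offset");
_Static_assert(sizeof(struct xhci_caps) == 0x20, "capability block size");

/* Hypothetical readl()-style accessor: the volatile access keeps the
 * compiler from merging, caching or reordering the two loads. */
static inline uint32_t mmio_readl(const volatile uint32_t *addr)
{
    return *addr;
}

/* Two back-to-back capability reads, analogous to the pair in
 * xhci_controller_setup() (assumed order: HCSPARAMS1, then HCCPARAMS). */
static void read_two_caps(volatile struct xhci_caps *caps,
                          uint32_t *hcs1, uint32_t *hcc)
{
    *hcs1 = mmio_readl(&caps->hcsparams1);
    *hcc  = mmio_readl(&caps->hccparams1);
}
```

The static assertions pin HCCPARAMS1 at offset 0x10, matching the BAR + 0x10 location used for HCCPARAMS in this report.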

If I start the domU when no USB devices are connected, it doesn't crash.

If I manually unbind the device from the dom0 driver (echo 0000:c3:00.4 >
/sys/bus/pci/drivers/xhci_hcd/unbind), it doesn't crash. Note that I have
seize=1 in the domU config, so the `xl pci-assignable-add` call is implicit.
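The manual unbind is simply a write of the device's PCI address to the driver's sysfs `unbind` file. Here is a minimal C sketch of the same step. sysfs_write_bdf() is a hypothetical helper, and on a real system it needs root.

```c
#include <stdio.h>
#include <string.h>

/* Write a PCI address (BDF) to a sysfs file such as a driver's "unbind",
 * mirroring `echo 0000:c3:00.4 > /sys/bus/pci/drivers/xhci_hcd/unbind`.
 * Returns 0 on success, -1 on any I/O error. */
static int sysfs_write_bdf(const char *path, const char *bdf)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    size_t len = strlen(bdf);
    int ok = fwrite(bdf, 1, len, f) == len;
    /* stdio buffers the data, so a write error the kernel reports may
     * only surface when the stream is flushed by fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Usage: sysfs_write_bdf("/sys/bus/pci/drivers/xhci_hcd/unbind", "0000:c3:00.4").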

If the system doesn't crash (either because no USB devices were
connected initially, or because of the manual unbind), the USB controller
in domU works fine. I can later connect devices and they appear inside
domU.

This system has a couple of XHCI controllers, and the same behavior is
observed on at least two of them.

The controller works just fine when used in dom0.

If I pass through another PCI device instead (I tried a wifi card and an
audio card), it doesn't crash.

The value read from HCCPARAMS (BAR + 0x10) differs between the good and
bad cases:
- 0x01200020 when it crashes
- 0x0110ffc5 when it works

It's weird to see this many differences here, given that most bits in
this register describe device capabilities [3], not its runtime state...
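To make the comparison concrete, here is a minimal C sketch that decodes a few HCCPARAMS1 fields using the bit layout in the xHCI specification [3]. Only a subset of fields is covered, and the struct and function names are hypothetical. Read this way, the crashing value reports no 64-bit addressing (AC64=0), 32-byte contexts (CSZ=0), LHRC set, MaxPSASize 0 and extended capabilities at byte offset 0x480. The working value reports AC64=1, 64-byte contexts, LHRC clear, MaxPSASize 15 and extended capabilities at 0x440.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of HCCPARAMS1 fields, per the xHCI specification [3]. */
struct hccparams1 {
    bool     ac64;        /* bit 0: 64-bit addressing capability */
    bool     csz;         /* bit 2: context size (1 = 64-byte contexts) */
    bool     lhrc;        /* bit 5: light HC reset capability */
    unsigned max_psa;     /* bits 15:12: MaxPSASize */
    uint32_t xecp_offset; /* bits 31:16: xECP, converted from dwords to bytes */
};

static struct hccparams1 decode_hccparams1(uint32_t v)
{
    struct hccparams1 d;
    d.ac64 = v & 1u;
    d.csz = (v >> 2) & 1u;
    d.lhrc = (v >> 5) & 1u;
    d.max_psa = (v >> 12) & 0xFu;
    /* xECP counts 32-bit words from the start of BAR0. */
    d.xecp_offset = (v >> 16) * 4u;
    return d;
}
```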

In this system my main debugging tool is the XHCI console. But I also
tried without enabling the XHCI console, and it still crashes, so the
XHCI console doesn't appear to be the cause.

I also tried disabling XHCI initialization in SeaBIOS, and then it
proceeds to boot the domU's kernel. But as soon as Linux starts
initializing that USB controller, it crashes the same way. So it isn't
just SeaBIOS doing something weird (or at least not only that).

With PVH dom0, the behavior is a bit different:
1. Initially, the controller works fine in dom0.
2. When starting the domU, instead of a clean unbind, this happens:

    [   11.248760] xhci_hcd 0000:c3:00.4: Controller not ready at resume -19
    [   11.248765] xhci_hcd 0000:c3:00.4: PCI post-resume error -19!
    [   11.248767] xhci_hcd 0000:c3:00.4: HC died; cleaning up
    [   11.249010] xhci_hcd 0000:c3:00.4: remove, state 4
    [   11.249013] usb usb8: USB disconnect, device number 1
    [   11.249437] xhci_hcd 0000:c3:00.4: USB bus 8 deregistered
    [   11.249832] xhci_hcd 0000:c3:00.4: remove, state 4
    [   11.249835] usb usb7: USB disconnect, device number 1
    [   11.250074] xhci_hcd 0000:c3:00.4: Host halt failed, -19
    [   11.250076] xhci_hcd 0000:c3:00.4: Host not accessible, reset failed.
    [   11.250389] xhci_hcd 0000:c3:00.4: USB bus 7 deregistered
    [   11.251011] pciback 0000:c3:00.4: xen_pciback: seizing device
    [   11.335120] pciback 0000:c3:00.4: xen_pciback: vpci: assign to virtual slot 0
    [   11.335544] pciback 0000:c3:00.4: registering for 1

3. Reading from the BAR in domU (in SeaBIOS, and later in Linux) returns
0xffffffff.
4. Does not crash the host.
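The -19 in the dom0 log above is -ENODEV on Linux, and an all-ones read is how a PCI device that no longer responds typically appears. Linux's xhci_handshake() treats a 0xffffffff register value as "device gone" and returns -ENODEV. The minimal C sketch below shows that polling pattern; handshake() is a hypothetical simplification, not the kernel's code, and it bounds the loop by a poll count instead of a timeout.

```c
#include <errno.h>
#include <stdint.h>

/* Poll a 32-bit register until (value & mask) == done, giving up after
 * max_polls reads. An all-ones read is treated as a device that no longer
 * responds and reported as -ENODEV instead of being polled further. */
static int handshake(const volatile uint32_t *reg, uint32_t mask,
                     uint32_t done, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t v = *reg;
        if (v == UINT32_MAX)
            return -ENODEV;
        if ((v & mask) == done)
            return 0;
    }
    return -ETIMEDOUT;
}
```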

Any ideas?

I don't have any other Zen 4 system to try on. The hw11 GitLab
runner is a Ryzen 7 7735HS, and it doesn't have this issue. It's also
possible this is related to Framework's firmware, but given all the
observations above, I find that less likely.

[1] https://docs.kernel.org/arch/x86/amd-debugging.html#random-reboot-issues
[2] https://github.com/coreboot/seabios/blob/master/src/hw/usb-xhci.c#L553
[3] https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/extensible-host-controler-interface-usb-xhci.pdf (page 385)
-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
