
Re: [Xen-devel] [BUG] pci-passthrough on dom0 kernel versions above 3.8 crashes dom0

Let me know if I can do anything to assist. 


On 09/10/2013, at 16.39.42, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:

On 04/10/13 11:05, Jan Beulich wrote:
On 04.10.13 at 09:44, Kristoffer Egefelt <kristoffer@xxxxxxx> wrote:

I'm trying to pass through a NIC (Intel X520 with the ixgbevf driver) to a domU,
but since kernel 3.8 this has not worked.

The dom0 kernel seems to cause the problem.
The Xen version, domU kernel version and driver version seem to be unrelated to
this bug: it works as long as the dom0 kernel is 3.8.
I tried kernel versions 3.9, 3.10 and 3.11; all show the same bug pattern
when used as dom0.

The BUG appears on xl pci-attach.
On xl pci-detach, dom0 panics.

I have attached logs from a working setup (kernel 3.8) and from a non-working
setup (kernel 3.11), as well as the kernel config for 3.11.

In short, this is what domU logs after pci attach:

BUG: unable to handle kernel paging request at ffffc9000030200c
IP: [<ffffffff81205812>] __msix_mask_irq+0x21/0x24
PGD 75a40067 PUD 75a41067 PMD 75b44067 PTE 8010000000000464
Oops: 0003 [#1] SMP
Modules linked in: ixgbevf(+) xen_pcifront nfnetlink_log nfnetlink ipt_ULOG
x_tables x86_pkg_temp_thermal thermal_sys coretemp crc32c_intel
ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul
glue_helper microcode ext4 crc16 jbd2 mbcache xen_blkfront
CPU: 0 PID: 2122 Comm: modprobe Not tainted 3.11.3-kernel-v1.0.0.21+ #1

Are you certain this is kernel (rather than hypervisor) version
dependent? Iirc this is a manifestation of a guest kernel not being
permitted to write to the MSI-X mask bit.
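
For readers decoding the domU oops above: the "Oops: 0003" field is the x86 page-fault error code, printed in hex. The sketch below decodes its low bits; the struct and function names are hypothetical helpers for illustration, not kernel APIs. Code 0003 has bits 0 and 1 set, meaning a kernel-mode write that hit a protection violation rather than a not-present page, which would be consistent with a write to the MSI-X table being refused.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault error code (the value after "Oops:").
 * Hypothetical helper type, not taken from the kernel sources. */
struct pf_error {
    bool protection;  /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;       /* bit 1: 1 = write access, 0 = read access */
    bool user;        /* bit 2: 1 = fault raised in user mode, 0 = kernel mode */
    bool reserved;    /* bit 3: reserved bit set in a paging-structure entry */
    bool instr_fetch; /* bit 4: fault caused by an instruction fetch */
};

/* Split the error code into its individual flag bits. */
static struct pf_error decode_pf_error(uint32_t code)
{
    struct pf_error e;

    e.protection  = (code & (1u << 0)) != 0;
    e.write       = (code & (1u << 1)) != 0;
    e.user        = (code & (1u << 2)) != 0;
    e.reserved    = (code & (1u << 3)) != 0;
    e.instr_fetch = (code & (1u << 4)) != 0;
    return e;
}
```

Calling decode_pf_error(0x0003) sets only the protection and write flags.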

And this is dom0 on pci detach:

(XEN) Assertion '_raw_spin_is_locked(lock)' failed at
(XEN) ----[ Xen-4.4-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82d0801258ef>] _spin_unlock_irqrestore+0x27/0x32
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
(XEN) rax: 0000000000000001   rbx: ffff83201ba07724   rcx: 0000000000000001
(XEN) rdx: ffff83201bb97020   rsi: 0000000000000286   rdi: ffff83201ba07724
(XEN) rbp: ffff83203ffcfdd8   rsp: ffff83203ffcfdd8   r8:  ffff8141002000e0
(XEN) r9:  000000000000001c   r10: 0000000000000082   r11: 0000000000000001
(XEN) r12: 0000000000000000   r13: ffff8320e13c8240   r14: ffff880148047df4
(XEN) r15: 0000000000000286   cr0: 0000000080050033   cr4: 00000000000426f0
(XEN) cr3: 000000206f3ff000   cr2: 00007fa5ec560c49
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83203ffcfdd8:
(XEN)    ffff83203ffcfe68 ffff82d080166a4f ffff83203ffcfe18 0000000280118988
(XEN)    0000000000000cfe 0000000000000cfe ffff832015d3b8a0 ffff8320e13c83f0
(XEN)    ffff832015d3b880 0000000000000001 00000000fee00678 0000000000000000
(XEN)    ffff83200000f800 000000000000001b ffff8300bcef5000 ffffffffffffffed
(XEN)    ffff880148047df4 ffffffff814530e0 ffff83203ffcfef8 ffff82d08017dee4
(XEN)    ffff832000000002 0000000000000008 ffff83203ffcfef8 ffff82d000a0fb00
(XEN)    0000000000000000 ffffffff93010000 ffff82d0802e8000 ffff83203ffc80ef
(XEN)    82d080222c00b948 c390ef66d1ffffff ffff83203ffcfef8 ffff8300bcef5000
(XEN)    ffff880145951868 ffff880145bb2a60 ffff880148047f50 ffffffff814530e0
(XEN)    00007cdfc00300c7 ffff82d08022213b ffffffff8100142a 0000000000000021
(XEN)    ffffffff814530e0 ffff880148047f50 000000000000c002 0000000000009300
(XEN)    ffff88013faf1a80 ffff880145951000 0000000000000202 0000000000000093
(XEN)    ffff880148047df4 0000000000000002 0000000000000021 ffffffff8100142a
(XEN)    0000000000000000 ffff880148047df4 000000000000001b 0001010000000000
(XEN)    ffffffff8100142a 000000000000e033 0000000000000202 ffff880148047dc8
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000001 ffff8300bcef5000 0000004f9b885e00
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d0801258ef>] _spin_unlock_irqrestore+0x27/0x32
(XEN)    [<ffff82d080166a4f>] pci_restore_msi_state+0x1c9/0x2f0
(XEN)    [<ffff82d08017dee4>] do_physdev_op+0xe4f/0x114f
(XEN)    [<ffff82d08022213b>] syscall_enter+0xeb/0x145
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '_raw_spin_is_locked(lock)' failed at
(XEN) ****************************************
(XEN) Manual reset required ('noreboot' specified)

This, otoh, is clearly a hypervisor bug. Afaict the patch below
should help.

But - this code is supposed to be executed on host S3 resume only
(i.e. there might also be some kernel flaw involved here).

It's called from pci_restore_state() which is called from pciback when a
device is released.  This doesn't seem unreasonable to me.
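
The hypervisor panic above is the debug-build check '_raw_spin_is_locked(lock)' firing inside _spin_unlock_irqrestore, reached from pci_restore_msi_state. As a rough illustration of what such a check catches, here is a toy lock whose release path reports an unlock of a lock that is not held. All names are hypothetical; this is not Xen's spinlock implementation, and the actual unbalanced path in pci_restore_msi_state is not shown in this thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock for illustration only; not Xen's spinlock_t. */
struct toy_lock {
    atomic_bool locked;
};

static void toy_lock_init(struct toy_lock *l)
{
    atomic_init(&l->locked, false);
}

/* Spin until the lock flips from free (false) to held (true). */
static void toy_lock_acquire(struct toy_lock *l)
{
    bool expected = false;

    while (!atomic_compare_exchange_weak(&l->locked, &expected, true))
        expected = false; /* a failed CAS overwrote expected; reset it */
}

static bool toy_is_locked(struct toy_lock *l)
{
    return atomic_load(&l->locked);
}

/* Release with the debug-style check. Returns false where a debug
 * build would assert: the caller is unlocking a lock it does not hold. */
static bool toy_lock_release_checked(struct toy_lock *l)
{
    if (!toy_is_locked(l))
        return false;
    atomic_store(&l->locked, false);
    return true;
}
```

An error path that jumps to a shared "unlock and return" label after the lock has already been dropped, or before it was taken, is one common way to end up in the false branch.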


Xen-devel mailing list


