
Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq



Tuesday, November 18, 2014, 9:56:33 PM, you wrote:

>> 
>> Uhmm i thought i had these switched off (due to problems earlier, and then
>> forgot about them .. however, looking at the earlier reports, these lines
>> were also in those reports).
>> 
>> The xen-syms and these last runs are all with a pristine xen tree cloned
>> today (staging branch), so the qemu-xen and seabios defined with that were
>> also freshly cloned and had a new default seabios config (just to rule out
>> anything stale in my tree).
>> 
>> If you don't see those messages .. perhaps your seabios and qemu trees (and
>> at least the seabios config) are not the most recent (they don't get updated
>> automatically when you just do a git pull on the main tree)?
>> 
>> In /tools/firmware/seabios-dir/.config i have:
>> CONFIG_USB=y
>> CONFIG_USB_UHCI=y
>> CONFIG_USB_OHCI=y
>> CONFIG_USB_EHCI=y
>> CONFIG_USB_XHCI=y
>> CONFIG_USB_MSC=y
>> CONFIG_USB_UAS=y
>> CONFIG_USB_HUB=y
>> CONFIG_USB_KEYBOARD=y
>> CONFIG_USB_MOUSE=y
>> 

> I seem to have the same thing. Perhaps it is my XHCI controller being wonky.

>> And this is all just from a:
>> - git clone git://xenbits.xen.org/xen.git -b staging
>> - make clean && ./configure && make -j6 && make -j6 install

> Aye. 
> .. snip..
>> >  1) test_and_[set|clear]_bit sometimes return unexpected values.
>> >     [But this might be invalid as the addition of the ffff8303faaf25a8
>> >      might be correct - as the second dpci the softirq is processing
>> >      could be the MSI one]
>> 
>> Would there be an easy way to stress test this function separately in some 
>> debugging function to see if it indeed is returning unexpected values ?

> Sadly no. But you got me looking in the right direction when you mentioned
> 'timeout'.
>> 
>> >  2) INIT_LIST_HEAD operations on the same CPU are not honored.
>> 
>> Just curious, have you also tested the patches on AMD hardware ?

> Yes. To reproduce this the first thing I did was to get an AMD box.

>> 
>>  
>> >> When i look at the combination of (2) and (3), It seems it could be an 
>> >> interaction between the two passed through devices and/or different IRQ 
>> >> types.
>> 
>> > Could be - as in it is causing this issue to show up faster than
>> > expected. Or it is the one that triggers more than one dpci happening
>> > at the same time.
>> 
>> Well that didn't seem to be it (see separate amendment i mailed previously)

> Right, the current theory I have is that the interrupts are not being
> Acked within 8 milliseconds and we reset the 'state' - and at the same
> time we get an interrupt and schedule it - while we are still processing
> the same interrupt. This would explain why the 'test_and_clear_bit'
> got the wrong value.
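
A minimal single-threaded C sketch of that theory may help to see why the return value would look wrong. Everything here is an assumption for illustration: the helpers `test_and_set`, `test_and_clear` and `clear_after_timeout_reset`, and the single `STATE_SCHED` bit, are hypothetical stand-ins and not Xen's actual bitops or the real 'state' layout. The timeout is modelled as a plain store of 0 between scheduling and processing.

```c
#include <assert.h>
#include <stdatomic.h>

#define STATE_SCHED 0x1u   /* hypothetical "scheduled" bit */

/* Both helpers return whether the bit was set BEFORE the operation. */
static int test_and_set(atomic_uint *s)
{
    return (atomic_fetch_or(s, STATE_SCHED) & STATE_SCHED) != 0;
}

static int test_and_clear(atomic_uint *s)
{
    return (atomic_fetch_and(s, ~STATE_SCHED) & STATE_SCHED) != 0;
}

/*
 * Schedule once, optionally let a modelled timeout reset the state,
 * then return what the softirq side's test_and_clear observes.
 */
int clear_after_timeout_reset(int timeout_fires)
{
    atomic_uint state;
    int was_set;

    atomic_init(&state, 0);

    was_set = test_and_set(&state);   /* interrupt gets scheduled */
    assert(was_set == 0);             /* it was not scheduled before */
    (void)was_set;

    if (timeout_fires)
        atomic_store(&state, 0);      /* modelled timeout: state = 0 */

    return test_and_clear(&state);    /* softirq side */
}
```

Without the reset, `clear_after_timeout_reset(0)` sees the bit it set earlier; with the reset, `clear_after_timeout_reset(1)` finds the bit already gone even though the work is still pending, which is the kind of "unexpected" value described above.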

> In regards to the list poison - following this thread of logic - with
> the 'state = 0' set we open the floodgates for any CPU to put the same
> 'struct hvm_pirq_dpci' on its list.

> We do reset the 'state' on _every_ GSI that is mapped to a guest - so
> we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:

> CPUX:                           CPUY:
> pt_irq_time_out:
> state = 0;                      
> [out of timer code, the                 raise_softirq
>  pirq_dpci is on the dpci_list]         [adds the pirq_dpci as state == 0]

> softirq_dpci                            softirq_dpci:
>         list_del
>         [entries poison]
>                                                 list_del <= BOOM
>                         
> Is what I believe is happening.
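
The sequence in the diagram can be replayed in a small, single-threaded C sketch. It is only a model under stated assumptions: `fake_dpci`, `fake_raise_softirq`, `list_del_checked` and the poison constants are hypothetical names and values chosen for illustration, not Xen's definitions, and the two CPUs are simulated by running their steps one after the other. Where a real `list_del` would dereference the poisoned pointers and fault, the checked variant here just reports it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head { struct list_head *next, *prev; };

/* Illustrative poison values; the hypervisor's real constants may differ. */
#define POISON_NEXT ((struct list_head *)(uintptr_t)0x100100)
#define POISON_PREV ((struct list_head *)(uintptr_t)0x200200)

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Returns 0 on success, -1 if the entry is already poisoned. */
static int list_del_checked(struct list_head *e)
{
    if (e->next == POISON_NEXT || e->prev == POISON_PREV)
        return -1;                 /* a real list_del would fault here */
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = POISON_NEXT;
    e->prev = POISON_PREV;
    return 0;
}

struct fake_dpci {
    struct list_head link;
    unsigned int scheduled;        /* stands in for the 'state' bit */
};

/* Enqueue only if not already scheduled (a test_and_set style guard). */
static int fake_raise_softirq(struct fake_dpci *d, struct list_head *cpu_list)
{
    if (d->scheduled)
        return 0;
    d->scheduled = 1;
    list_add_tail(&d->link, cpu_list);
    return 1;
}

/* Returns 1 if the first delete succeeds and the second hits poison. */
int simulate_double_del(void)
{
    struct list_head cpux, cpuy;
    struct fake_dpci d = { .scheduled = 0 };
    int first, second;

    list_init(&cpux);
    list_init(&cpuy);

    fake_raise_softirq(&d, &cpux); /* entry is queued for CPUX */
    d.scheduled = 0;               /* timeout handler: state = 0 */
    fake_raise_softirq(&d, &cpuy); /* CPUY queues the same entry again */

    first = list_del_checked(&d.link);   /* CPUX softirq: entries poisoned */
    second = list_del_checked(&d.link);  /* CPUY softirq: the BOOM case */
    return first == 0 && second == -1;
}
```

Calling `simulate_double_del()` returns 1 in this model: once the guard bit has been reset, nothing stops the second enqueue, and the second delete then operates on an entry that has already been removed and poisoned.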

> The INTX device - once I put a load on it - does not trigger
> any pt_irq_time_out, so that would explain why I cannot hit this.

> But I believe your card hits these "hiccups".   


Hi Konrad,

I just tested your 5 patches and, as a result, I still got an(other) host crash
(complete serial log attached):

(XEN) [2014-11-18 21:55:41.591] ----[ Xen-4.5.0-rc  x86_64  debug=y  Not 
tainted ]----
(XEN) [2014-11-18 21:55:41.591] CPU:    0
(XEN) [2014-11-18 21:55:41.591] ----[ Xen-4.5.0-rc  x86_64  debug=y  Not 
tainted ]----
(XEN) [2014-11-18 21:55:41.591] RIP:    e008:[<ffff82d08012c7e7>]CPU:    2
(XEN) [2014-11-18 21:55:41.591] RIP:    e008:[<ffff82d08014a461>] 
hvm_do_IRQ_dpci+0xbd/0x13c
(XEN) [2014-11-18 21:55:41.591] RFLAGS: 0000000000010006    
_spin_unlock+0x1f/0x30CONTEXT: hypervisor
(XEN) [2014-11-18 21:55:41.591] 
(XEN) [2014-11-18 21:55:41.591] RFLAGS: 0000000000010246   rax: 
0000000000000000   rbx: ffff8303773450a8   rcx: 0000000000000001
(XEN) [2014-11-18 21:55:41.591] CONTEXT: hypervisor
(XEN) [2014-11-18 21:55:41.591] rdx: 0000000000000000   rsi: ffff83054ef4ef98   
rdi: 0000000012aa5400
(XEN) [2014-11-18 21:55:41.591] rax: ffff82d080328da0   rbx: ffff8305186c5d80   
rcx: 0000000000000000
(XEN) [2014-11-18 21:55:41.591] rbp: ffff83054ef47c88   rsp: ffff83054ef47c78   
r8:  ffff8305186c58d0
(XEN) [2014-11-18 21:55:41.591] r9:  000000000000002f   r10: 00000000000000d0   
r11: ffffffff829084b0
(XEN) [2014-11-18 21:55:41.591] rdx: ffff82d0802e0000   rsi: ffff83050aead2a8   
rdi: 00000000000000b8
(XEN) [2014-11-18 21:55:41.591] rbp: ffff82d0802e7df8   rsp: ffff82d0802e7df8   
r8:  ffff82d0802e7d28
(XEN) [2014-11-18 21:55:41.591] r9:  0000000000000040   r10: 0000000000000000   
r11: ffffffffffffffc0
(XEN) [2014-11-18 21:55:41.591] r12: ffff8305186c5d80   r13: ffff8303773450a8   
r14: ffff8303773450b8
(XEN) [2014-11-18 21:55:41.591] r15: ffff8305186c5b00   cr0: 000000008005003b   
cr4: 00000000000006f0
(XEN) [2014-11-18 21:55:41.591] r12: ffff830515b5b000   r13: 0000000000000000   
r14: ffff830377345080
(XEN) [2014-11-18 21:55:41.591] cr3: 000000054a215000   cr2: 00000000000000b8
(XEN) [2014-11-18 21:55:41.591] r15: 000000000000002f   cr0: 000000008005003b   
cr4: 00000000000006f0
(XEN) [2014-11-18 21:55:41.591] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 
0000   cs: e008
(XEN) [2014-11-18 21:55:41.591] cr3: 000000054a215000   cr2: 0000000000000160
(XEN) [2014-11-18 21:55:41.591] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 
0000   cs: e008
(XEN) [2014-11-18 21:55:41.591] Xen stack trace from rsp=ffff82d0802e7df8:
(XEN) [2014-11-18 21:55:41.591]    ffff82d0802e7e48Xen stack trace from 
rsp=ffff83054ef47c78:
(XEN) [2014-11-18 21:55:41.591]    ffff82d08014a395 ffff83009fd2d060 
ffff83054ef47c88 ffff8303773450b8
(XEN) [2014-11-18 21:55:41.591]    ffffc900141f2b20 ffff82d080328f80 
ffff830377345140 ffff82d08014a26e ffff8303773450a8 ffff83054ef47d18
(XEN) [2014-11-18 21:55:41.591]    ffff82d080172060 000000943f43e518 
ffff88002b227e18 ffff82d0802e7e78
(XEN) [2014-11-18 21:55:41.591]    ffff82d08012f2c3 0000000000000286
(XEN) [2014-11-18 21:55:41.591]    0000000100000031 ffff82d08018b20f 
ffff82d080328f80 ffff83050b0bb5e0 ffff83054ef47cf8 ffff82d080178846 
ffff8303773450e0
(XEN) [2014-11-18 21:55:41.591]   
(XEN) [2014-11-18 21:55:41.591]    ffff82d0802e7ec8 ffff82d08012f3c3 
ffff82d0802e7ef8 0000000000000000 ffff82d08022d5a1
(XEN) [2014-11-18 21:55:41.591]    000000943f65d8b4 ffff83055d002f24 
0000000000000000 0000002f9ff88000
(XEN) [2014-11-18 21:55:41.591]    ffff82d0802fff80 ffff83054ef47d28 
000000000055d126 ffff83054ef12000 ffff82d0802fff80 ffffffffffffffff
(XEN) [2014-11-18 21:55:41.591]    ffff830515b5b0b8 ffff82d0802e0000 
ffff88002b227e18
(XEN) [2014-11-18 21:55:41.591]    ffff82d0802e7ef8 ffff82d0802fff80 
ffff82d08012be31 ffff8303773450a8 ffff830515b5b000
(XEN) [2014-11-18 21:55:41.591]    0200200200200200 ffff83009fd2d000
(XEN) [2014-11-18 21:55:41.591]    00007cfab10b82b7 0000000000000001 
ffff82d080233122 0200200200200200 ffff830515b5b000
(XEN) [2014-11-18 21:55:41.591]    0000000000000001 ffff88005925a1e8
(XEN) [2014-11-18 21:55:41.591]    ffff8303773450a8 ffff82d0802fff80 
ffff82d0802e7f08 ffff82d08012be89 ffff83054ef47dd8 00007d2f7fd180c7 
ffff830515b5b0b8 ffff82d080232cd1
(XEN) [2014-11-18 21:55:41.591]    ffff88002b227e18 ffff88005925a1e8
(XEN) [2014-11-18 21:55:41.591]    0000000000000001 0000000000000001
(XEN) [2014-11-18 21:55:41.591]    ffff88002b227bb8 ffffffff829084b0 
ffff88005f6d35a8 0000000000000000 0000000000000000
(XEN) [2014-11-18 21:55:41.591]    00000000000000d0 000000943f4e172d 
0000000000000000 ffff830377345150 0000000000005776 ffffffff81c10cc0
(XEN) [2014-11-18 21:55:41.591]    000000943f43e300 0000000000000000
(XEN) [2014-11-18 21:55:41.591]    0000000000000001 0000000000000001 
0000000000000000 ffff83054ef4ef98
(XEN) [2014-11-18 21:55:41.591]    ffff830515b5b0bc 000000b900000000 
ffff82d08012c69f ffff88005925a180 ffff88005f6d3500 000000000000e008 
000000fa00000000
(XEN) [2014-11-18 21:55:41.591]    0000000000000246
(XEN) [2014-11-18 21:55:41.591]    ffffffff810eab63 ffff83054ef47dd0 
000000000000e033 0000000000000000 0000000000000286 ffff830377345110 
ffff88002b227b68
(XEN) [2014-11-18 21:55:41.591]   
(XEN) [2014-11-18 21:55:41.591]    000000000000e02b ffff83054ef47ec8 
ffff82d08014962d 000000000000beef 0000000000000100 ffff82d080328da0
(XEN) [2014-11-18 21:55:41.591]    000000000000beef 000000000000beef
(XEN) [2014-11-18 21:55:41.591]    000000000000beef 0000000000000000 
ffff83009fd2d000 ffff830512b6c068 0000000000000000 ffff83054ef4e540
(XEN) [2014-11-18 21:55:41.591]    ffff83054ef4e400 0000000000000000
(XEN) [2014-11-18 21:55:41.591] Xen call trace:
(XEN) [2014-11-18 21:55:41.591]    [<ffff82d08012c7e7>] _spin_unlock+0x1f/0x30
(XEN) [2014-11-18 21:55:41.591]  ffff830515b5b0b8
(XEN) [2014-11-18 21:55:41.591]    0000000100000000 ffff83054ef47e88   
[<ffff82d08014a395>] pt_irq_time_out+0x127/0x136
(XEN) [2014-11-18 21:55:41.591]    [<ffff82d08012f2c3>] execute_timer+0x4e/0x6c
(XEN) [2014-11-18 21:55:41.591]    [<ffff82d08012f3c3>] 
timer_softirq_action+0xe2/0x220
(XEN) [2014-11-18 21:55:41.591]    [<ffff82d08012be31>] __do_softirq+0x81/0x8c
(XEN) [2014-11-18 21:55:41.591]    [<ffff82d08012be89>] do_softirq+0x13/0x15
(XEN) [2014-11-18 21:55:41.591]    [<ffff82d080232cd1>] 
process_softirqs+0x21/0x30
(XEN) [2014-11-18 21:55:41.591] 
(XEN) [2014-11-18 21:55:41.591]  ffff83054ef47e88 ffff83054ef47e88Pagetable 
walk from 00000000000000b8:
(XEN) [2014-11-18 21:55:41.591] 
(XEN) [2014-11-18 21:55:41.591]    ffff8303773450a8 L4[0x000] = 
0000000000000000 ffffffffffffffff
(XEN) [2014-11-18 21:55:41.591]  0000000000000082 ffff8303773450a8
(XEN) [2014-11-18 21:55:43.260] ****************************************
(XEN) [2014-11-18 21:55:43.280]  ffff830377345150Panic on CPU 0:
(XEN) [2014-11-18 21:55:43.297] FATAL PAGE FAULT
(XEN) [2014-11-18 21:55:43.310] [error_code=0000]
(XEN) [2014-11-18 21:55:43.323] Faulting linear address: 00000000000000b8
(XEN) [2014-11-18 21:55:43.343] ****************************************
(XEN) [2014-11-18 21:55:43.362] 
(XEN) [2014-11-18 21:55:43.371] Reboot in five seconds...
(XEN) [2014-11-18 21:55:43.386] 
(XEN) [2014-11-18 21:55:43.395]    ffff830515b5b000 0000000000000001 
ffff830377345080 000000000000002f
(XEN) [2014-11-18 21:55:43.422]    ffff83054ef47f08 ffff82d0801721a3 
ffff83054ef47e88 ffff83054ef47e88
(XEN) [2014-11-18 21:55:43.449]    00000ecc00000004 ffff82d080300080 
ffff82d0802fff80 ffffffffffffffff
(XEN) [2014-11-18 21:55:43.476]    ffff83054ef40000 0000000000000001 
ffff83054ef47ef8 ffff82d08012be31
(XEN) [2014-11-18 21:55:43.503]    ffff83009ff88000 ffffffff83081590 
ffffffff8221c520 ffffffff8221cc20
(XEN) [2014-11-18 21:55:43.530] Xen call trace:
(XEN) [2014-11-18 21:55:43.543]    [<ffff82d08014a461>] 
hvm_do_IRQ_dpci+0xbd/0x13c
(XEN) [2014-11-18 21:55:43.565]    [<ffff82d080172060>] do_IRQ+0x49c/0x624
(XEN) [2014-11-18 21:55:43.584]    [<ffff82d080233122>] 
common_interrupt+0x62/0x70
(XEN) [2014-11-18 21:55:43.606]    [<ffff82d08012c69f>] _spin_lock+0x1a/0x54
(XEN) [2014-11-18 21:55:43.626]    [<ffff82d08014962d>] dpci_softirq+0x241/0x3ad
(XEN) [2014-11-18 21:55:43.648]    [<ffff82d08012be31>] __do_softirq+0x81/0x8c
(XEN) [2014-11-18 21:55:43.669]    [<ffff82d08012be89>] do_softirq+0x13/0x15
(XEN) [2014-11-18 21:55:43.689]    [<ffff82d080232cd1>] 
process_softirqs+0x21/0x30
(XEN) [2014-11-18 21:55:43.711] 
(XEN) [2014-11-18 21:55:43.720] Pagetable walk from 0000000000000160:
(XEN) [2014-11-18 21:55:43.738]  L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN) [2014-11-18 21:55:43.759] 
(XEN) [2014-11-18 21:55:43.768] ****************************************
(XEN) [2014-11-18 21:55:43.787] Panic on CPU 2:
(XEN) [2014-11-18 21:55:43.800] FATAL PAGE FAULT
(XEN) [2014-11-18 21:55:43.813] [error_code=0002]
(XEN) [2014-11-18 21:55:43.826] Faulting linear address: 0000000000000160
(XEN) [2014-11-18 21:55:43.845] ****************************************
(XEN) [2014-11-18 21:55:43.865] 
(XEN) [2014-11-18 21:55:43.873] Reboot in five seconds...

# addr2line -e xen-syms ffff82d08012c7e7
/usr/src/new/xen-unstable-vanilla/xen/include/asm/spinlock.h:18
# addr2line -e xen-syms ffff82d08014a461
/usr/src/new/xen-unstable-vanilla/xen/include/asm/atomic.h:172
# addr2line -e xen-syms ffff82d080172060
/usr/src/new/xen-unstable-vanilla/xen/arch/x86/irq.c:1175
# addr2line -e xen-syms ffff82d080233122
/usr/src/new/xen-unstable-vanilla/xen/arch/x86/x86_64/entry.S:487
# addr2line -e xen-syms ffff82d08012c69f
/usr/src/new/xen-unstable-vanilla/xen/common/spinlock.c:126
# addr2line -e xen-syms ffff82d08014962d
/usr/src/new/xen-unstable-vanilla/xen/drivers/passthrough/io.c:835
# addr2line -e xen-syms ffff82d08014a395
/usr/src/new/xen-unstable-vanilla/xen/drivers/passthrough/io.c:339
# addr2line -e xen-syms ffff82d08012f2c3
/usr/src/new/xen-unstable-vanilla/xen/common/timer.c:426

Attachment: serial.log
Description: Binary data


 

