[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Questions about GPLPV stability tests



On Wed, Dec 07, 2011 at 08:44:19PM +0000, Roderick Colenbrander wrote:
> On Wed, Dec 7, 2011 at 8:38 PM, Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx> 
> wrote:
> >> It took about a week, but the systems went down again. Linux is down,
> >> but the hypervisor is still reachable on the serial console. Is there
> >> anything interesting to dump from there?
> >>
> > Just the interrupts information. I think that is Ctrl-A, Ctrl-A,
> > Ctrl-A, and then '*' to capture everything (I don't remember the right
> > one for just interrupts).
> 
> Here's the interrupt information:
> (XEN) [2011-12-06 19:13:37] [i: dump interrupt bindings]
> (XEN) [2011-12-06 19:13:37] Guest interrupt information:
> (XEN) [2011-12-06 19:13:37]    IRQ:   0
> affinity:00000000,00000000,00000000,00000001 vec:f0 type=IO-APIC-edge
>   status=00000000 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:   1
> affinity:00000000,00000000,00000000,00000001 vec:30 type=IO-APIC-edge
>   status=00000014 in-flight=0 domain-list=0:  1(-S--),
> (XEN) [2011-12-06 19:13:37]    IRQ:   2
> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:e2 type=XT-PIC
>   status=00000000 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:   3
> affinity:00000000,00000000,00000000,00000001 vec:38 type=IO-APIC-edge
>   status=00000006 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:   4
> affinity:00000000,00000000,00000000,00000001 vec:40 type=IO-APIC-edge
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:   5
> affinity:00000000,00000000,00000000,00000001 vec:f1 type=IO-APIC-edge
>   status=00000000 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:   6
> affinity:00000000,00000000,00000000,00000001 vec:48 type=IO-APIC-edge
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:   7
> affinity:00000000,00000000,00000000,00000001 vec:50 type=IO-APIC-edge
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:   8
> affinity:00000000,00000000,00000000,00000001 vec:58 type=IO-APIC-edge
>   status=00000010 in-flight=0 domain-list=0:  8(-S--),
> (XEN) [2011-12-06 19:13:37]    IRQ:   9
> affinity:00000000,00000000,00000000,00000001 vec:60 type=IO-APIC-level
>   status=00000010 in-flight=0 domain-list=0:  9(-S--),
> (XEN) [2011-12-06 19:13:37]    IRQ:  10
> affinity:00000000,00000000,00000000,00000001 vec:68 type=IO-APIC-edge
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:  11
> affinity:00000000,00000000,00000000,00000001 vec:70 type=IO-APIC-edge
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:  12
> affinity:00000000,00000000,00000000,00000001 vec:78 type=IO-APIC-edge
>   status=00000010 in-flight=0 domain-list=0: 12(-S--),
> (XEN) [2011-12-06 19:13:37]    IRQ:  13
> affinity:00000000,00000000,00000000,00000001 vec:88 type=IO-APIC-edge
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:  14
> affinity:00000000,00000000,00000000,00000001 vec:90 type=IO-APIC-edge
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:  15
> affinity:00000000,00000000,00000000,00000001 vec:98 type=IO-APIC-edge
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:37]    IRQ:  16
> affinity:00000000,00000000,00000000,00000001 vec:c8 type=IO-APIC-level
>   status=00000010 in-flight=0 domain-list=0: 16(-S--),1201: 16(--M-),
> (XEN) [2011-12-06 19:13:37]    IRQ:  18
> affinity:00000000,00000000,00000000,00000001 vec:51 type=IO-APIC-level
>   status=00000010 in-flight=0 domain-list=0: 18(-S--),
> (XEN) [2011-12-06 19:13:37]    IRQ:  19
> affinity:00000000,00000000,00000000,00000001 vec:d0 type=IO-APIC-level
>   status=00000010 in-flight=0 domain-list=0: 19(-S--),
> (XEN) [2011-12-06 19:13:37]    IRQ:  21
> affinity:00000000,00000000,00000000,00000001 vec:61 type=IO-APIC-level
>   status=00000010 in-flight=0 domain-list=0: 21(-S--),
> (XEN) [2011-12-06 19:13:37]    IRQ:  23
> affinity:00000000,00000000,00000000,00000001 vec:59 type=IO-APIC-level
>   status=00000010 in-flight=1 domain-list=0: 23(PS-M),
> (XEN) [2011-12-06 19:13:37]    IRQ:  24
> affinity:00000000,00000000,00000000,00000001 vec:28 type=DMA_MSI
>   status=00000000 mapped, unbound
> (XEN) [2011-12-06 19:13:38]    IRQ:  25
> affinity:00000000,00000000,00000000,00000001 vec:a0 type=PCI-MSI
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38]    IRQ:  26
> affinity:00000000,00000000,00000000,00000001 vec:a8 type=PCI-MSI
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38]    IRQ:  27
> affinity:00000000,00000000,00000000,00000001 vec:b0 type=PCI-MSI
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38]    IRQ:  28
> affinity:00000000,00000000,00000000,00000001 vec:b8 type=PCI-MSI
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38]    IRQ:  29
> affinity:00000000,00000000,00000000,00000001 vec:c0 type=PCI-MSI
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38]    IRQ:  30
> affinity:00000000,00000000,00000000,00000001 vec:d8 type=PCI-MSI
>   status=00000014 in-flight=0 domain-list=0:274(PS--),
> (XEN) [2011-12-06 19:13:38]    IRQ:  31
> affinity:00000000,00000000,00000000,00000001 vec:21 type=PCI-MSI
>   status=00000010 in-flight=0 domain-list=0:273(PS--),
> (XEN) [2011-12-06 19:13:38]    IRQ:  32
> affinity:00000000,00000000,00000000,00000001 vec:29 type=PCI-MSI
>   status=00000010 in-flight=0 domain-list=0:272(PS--),
> (XEN) [2011-12-06 19:13:38]    IRQ:  33
> affinity:00000000,00000000,00000000,00000001 vec:31 type=PCI-MSI
>   status=00000010 in-flight=0 domain-list=0:271(PS--),
> (XEN) [2011-12-06 19:13:38]    IRQ:  34
> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:39 type=PCI-MSI
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38]    IRQ:  35
> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:41 type=PCI-MSI
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38]    IRQ:  36
> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:49 type=PCI-MSI
>   status=00000002 mapped, unbound
> (XEN) [2011-12-06 19:13:38]    IRQ:  37
> affinity:00000000,00000000,00000000,00000010 vec:65 type=PCI-MSI
>   status=00000010 in-flight=0 domain-list=1201: 55(--M-),
> (XEN) [2011-12-06 19:13:38] IO-APIC interrupt information:
> (XEN) [2011-12-06 19:13:38]     IRQ  0 Vec240:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin  2: vector=240,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ  1 Vec 48:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin  1: vector=48,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ  3 Vec 56:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin  3: vector=56,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ  4 Vec 64:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin  4: vector=64,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ  5 Vec241:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin  5: vector=241,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ  6 Vec 72:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin  6: vector=72,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ  7 Vec 80:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin  7: vector=80,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ  8 Vec 88:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin  8: vector=88,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ  9 Vec 96:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin  9: vector=96,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=level, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 10 Vec104:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 10: vector=104,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 11 Vec112:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 11: vector=112,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 12 Vec120:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 12: vector=120,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 13 Vec136:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 13: vector=136,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 14 Vec144:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 14: vector=144,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 15 Vec152:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 15: vector=152,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0,
> irr=0, trigger=edge, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 16 Vec200:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 16: vector=200,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1,
> irr=0, trigger=level, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 18 Vec 81:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 18: vector=81,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1,
> irr=0, trigger=level, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 19 Vec208:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 19: vector=208,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1,
> irr=0, trigger=level, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 21 Vec 97:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 21: vector=97,
> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1,
> irr=0, trigger=level, mask=0, dest_id:1
> (XEN) [2011-12-06 19:13:38]     IRQ 23 Vec 89:
> (XEN) [2011-12-06 19:13:38]       Apic 0x00, Pin 23: vector=89,
> delivery_mode=1, dest_mode=logical, delivery_status=1, polarity=1,
> irr=1, trigger=level, mask=0, dest_id:1
> 
> I noticed some potential interesting things. The system in question is
> using PCI passthrough. On the serial console I remember seeing that
> the PCI device got unmapped from DomU and got mapped again in Dom0.
> The serial console log still had a lot of information about this DomU
> which I guess was going down. I guess it wasn't fully down yet.

One of the debug informations that gets printed with '*' is the guest
PCI passthrough data. Such as which IRQ, BAR, etc are mapped in. Did any
of those exist?

My original thought of what is going on is that the interrupts either
stopped completly working (does not look like - it looks like the
interrupts are firring and the event channels that are bound to it have
the bits set saying - hey I am pending). Oddly there are bunch of MSI
ones that are masked which usually means they are disabled.

Then there is the 
 affinity:00000000,00000000,00000000,00000001 vec:c8 type=IO-APIC-level
   status=00000010 in-flight=0 domain-list=0: 16(-S--),1201: 16(--M-),
 affinity:00000000,00000000,00000000,00000010 vec:65 type=PCI-MSI
   status=00000010 in-flight=0 domain-list=1201: 55(--M-),

The guest has masked both interrupts - so it is off//dying, but somehow
the unmapping has not happend. In other words, what you just analyzed
and found out.

Sadly, there is no smoking gun..

So a couple of things that I would do is to ensure that the Xen
hypervisor boots with 'sync_console console_to_ring' and when this
crash happens see if I can get a stack trace from both the Xen
hypervisor (there are some debug parameters to get that - I think it is
'r'?), and also from the Linux kernel.

But also see if the problem disappears with the latest 4.1.x hypervisor.
And it might also be worth upgrading the dom0 to a 3.0. Hmm, it would be
very interesting to see where the dom0 _is_ stuck at. The hypervisor is
running fine so it all points to dom0 crashing somewhere..

Can you make sure that dom0 runs with 'debug loglevel=8' as well. That
should give some wealth of information when/if a crash happens.
Oh, and try to pass in Ctrl-Alt-SysRQ-t (on serial concentrators I think
you just need to type 'break' on the telnet line)and then t. But I am
not entirely sure about it - might want to double check Google and see
how to enable Alt-SysRQ.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.