
Re: [Xen-devel] [xen-unstable test] 11946: regressions - FAIL



On 07/05/2012 14:34, Jan Beulich wrote:
>>>> On 07.05.12 at 13:50, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 07/05/2012 09:10, Jan Beulich wrote:
>>>>>> On 05.05.12 at 02:21, AP <apxeng@xxxxxxxxx> wrote:
>>>> (XEN) *** IRQ BUG found ***
>>>> (XEN) CPU0 -Testing vector 236 from bitmap
>>> 236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
>>> in through the 8259A. Something fundamentally fishy must be going
>>> on here, and I would suppose the code in question shouldn't even be
>>> reached for legacy vectors.
>>>
>>> Furthermore, calling dump_irqs() from the debugging code with
>>> desc->lock still held makes it impossible to get full output, as that
>>> function wants to lock all initialized IRQ descriptors.
>> Yes - it has been vector 236 on each of the 3 reported failures from AP,
>> and I believe it was also vector 236 in the one case I managed to
>> reproduce the issue.
>>
>> However, once we have set up the IO-APIC, the 8259A should not be used
>> any more.  The boot dmesg shows that io_ack_method is indeed "old" (which
>> was going to be my first suggestion), and that EOI Broadcast Suppression
>> is enabled, which I have already identified as a source of problems for
>> some customers.  As a 'fix', I provided the ability for
>> "io_ack_method=new" to prevent EOI Broadcast Suppression being enabled. 
>> This was upstreamed in c/s 24870:9bf3ec036bef, but apparently has not
>> completely fixed the customer problems - just made it substantially more
>> rare.
>>
>> AP: Can you manually invoke the 'i' debug key and provide that - it will
>> help to see how Xen is setting up the IO-APIC(s) on your system.
> Seeing the 'z' output might also be helpful, especially to see whether
> any of the IO-APICs' RTEs is an ExtINT one.
>
> Further, checking that no 8259A IRQ got (or was left) enabled for
> some reason might be useful as well (cached_irq_mask plus the raw
> port 0x21 and 0xA1 values).
>
> In any case the debugging code's locking should be fixed.
>
> Jan
>

It appears we have two functions to dump the IO-APIC state:
__print_IO_APIC() which gets called on boot and from 'z', and
dump_ioapic_irq_info() which gets called from the end of 'i'.  These
should probably be consolidated somehow.

As for the debugging, perhaps replace the call to dump_irqs() with a call
to dump_ioapic_irq_info().

Given that the legacy vectors can't migrate, is it wise to include them in
the loop in irq_move_cleanup_interrupt()?  In fact, is it wise to include
any vector above LAST_DYNAMIC_VECTOR?
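
To make the question concrete, here is a rough, standalone sketch of the
vector arithmetic and of a scan bounded to the dynamic range.  It is not
the actual Xen code.  FIRST_LEGACY_VECTOR = 0xe0 follows from Jan's
"236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c"; the other constants and all of
the helper names are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * FIRST_LEGACY_VECTOR is derived from the thread (0xec - 0x0c).
 * The remaining values are ASSUMED for this sketch and should be
 * checked against the real headers.
 */
#define NR_VECTORS           256
#define FIRST_DYNAMIC_VECTOR 0x20   /* assumption */
#define LAST_DYNAMIC_VECTOR  0xdf   /* assumption */
#define FIRST_LEGACY_VECTOR  0xe0   /* from 236 = 0xec = base + 0x0c */
#define LAST_LEGACY_VECTOR   0xef   /* assumption: 16 legacy IRQs */

/* Hypothetical helper: is this one of the 8259A (legacy) vectors? */
static bool is_legacy_vector(unsigned int vector)
{
    assert(vector < NR_VECTORS);
    return vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR;
}

/* Hypothetical helper: is this vector in the dynamically allocated range? */
static bool is_dynamic_vector(unsigned int vector)
{
    assert(vector < NR_VECTORS);
    return vector >= FIRST_DYNAMIC_VECTOR && vector <= LAST_DYNAMIC_VECTOR;
}

/* Hypothetical helper: legacy IRQ number for a legacy vector, else -1. */
static int legacy_vector_to_irq(unsigned int vector)
{
    return is_legacy_vector(vector) ? (int)(vector - FIRST_LEGACY_VECTOR) : -1;
}

/*
 * Stand-in for the scan in a move-cleanup handler: count the pending
 * vectors it would examine if the loop stopped at LAST_DYNAMIC_VECTOR
 * rather than running up to NR_VECTORS.
 */
static unsigned int count_cleanup_candidates(const bool pending[NR_VECTORS])
{
    unsigned int vector, count = 0;

    for (vector = FIRST_DYNAMIC_VECTOR; vector <= LAST_DYNAMIC_VECTOR; vector++)
        if (pending[vector])
            count++;

    return count;
}
```

With those assumed bounds, vector 236 maps to legacy IRQ 12 and would
never be visited by the bounded scan, so the debugging path would not be
reached for it.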

~Andrew
