Re: [Xen-devel] [xen-unstable test] 11946: regressions - FAIL
On 07/05/2012 14:34, Jan Beulich wrote:
>>>> On 07.05.12 at 13:50, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 07/05/2012 09:10, Jan Beulich wrote:
>>>>>> On 05.05.12 at 02:21, AP <apxeng@xxxxxxxxx> wrote:
>>>> (XEN) *** IRQ BUG found ***
>>>> (XEN) CPU0 -Testing vector 236 from bitmap
>>> 236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
>>> in through the 8259A. Something fundamentally fishy must be going
>>> on here, and I would suppose the code in question shouldn't even be
>>> reached for legacy vectors.
>>>
>>> Furthermore, calling dump_irqs() from the debugging code with
>>> desc->lock still held makes it impossible to get full output, as that
>>> function wants to lock all initialized IRQ descriptors.
>> Yes - it has been vector 236 on each of the 3 reported failures from AP,
>> and I believe it was also vector 236 in the one case I managed to
>> reproduce the issue.
>>
>> However, once we have set up the IO-APIC, the 8259A should not be used
>> any more. The boot dmesg shows that io_ack_method is indeed "old" (which
>> was going to be my first suggestion), and that EOI Broadcast Suppression
>> is enabled, which I have already identified as a source of problems for
>> some customers. As a 'fix', I provided the ability for
>> "io_ack_method=new" to prevent EOI Broadcast Suppression being enabled.
>> This was upstreamed in c/s 24870:9bf3ec036bef, but apparently has not
>> completely fixed the customer problems - just made it substantially more
>> rare.
>>
>> AP: Can you manually invoke the 'i' debug key and provide that - it will
>> help to see how Xen is setting up the IO-APIC(s) on your system.
> Seeing the 'z' output might also be helpful, especially to see whether
> any of the IO-APICs' RTEs is an ExtINT one.
>
> Further, checking that no 8259A IRQ got (or was left) enabled for
> some reason might be useful as well (cached_irq_mask plus the raw
> port 0x21 and 0xA1 values).
>
> In any case the debugging code's locking should be fixed.
>
> Jan
>

It appears we have two functions to dump the IO-APIC state:
__print_IO_APIC(), which gets called on boot and from 'z', and
dump_ioapic_irq_info(), which gets called from the end of 'i'. These
should probably be consolidated somehow.

As for the debugging, perhaps replace the call to dump_irqs() with a
call to dump_ioapic_irq_info() instead.

Given that the legacy vectors can't migrate, is it wise to include them
in the loop in irq_move_cleanup_interrupt()? In fact, is it wise to
include any vector above LAST_DYNAMIC_VECTOR?

~Andrew
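
To make the vector arithmetic and the loop-bound question above concrete,
here is a minimal standalone sketch. It is not the Xen code. The value
FIRST_LEGACY_VECTOR = 0xe0 follows from "236 = 0xec = FIRST_LEGACY_VECTOR +
0x0c"; the other three constants are assumed values for illustration and
should be checked against the tree in question. All helper names are
hypothetical.

```c
#include <stdbool.h>

/*
 * ASSUMPTION: illustrative vector layout. Only FIRST_LEGACY_VECTOR can be
 * derived from the thread (0xec - 0x0c = 0xe0); the remaining values are
 * assumed and may differ in the actual tree.
 */
#define FIRST_DYNAMIC_VECTOR 0x20
#define LAST_DYNAMIC_VECTOR  0xdf
#define FIRST_LEGACY_VECTOR  0xe0
#define LAST_LEGACY_VECTOR   0xef

/* Hypothetical helper: true if the vector lies in the 8259A legacy range. */
static bool is_legacy_vector(unsigned int vector)
{
    return vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR;
}

/* Hypothetical helper: legacy IRQ number for a vector, or -1 if not legacy. */
static int legacy_irq_of_vector(unsigned int vector)
{
    return is_legacy_vector(vector) ? (int)(vector - FIRST_LEGACY_VECTOR) : -1;
}

/*
 * Hypothetical helper: count the vectors a cleanup loop would visit if it
 * stopped at LAST_DYNAMIC_VECTOR instead of walking every vector.
 */
static unsigned int count_cleanup_candidates(void)
{
    unsigned int vector, n = 0;

    for (vector = FIRST_DYNAMIC_VECTOR; vector <= LAST_DYNAMIC_VECTOR; vector++)
        n++; /* real code would inspect the per-CPU vector mapping here */

    return n;
}
```

With these assumed constants, legacy_irq_of_vector(236) yields 12, matching
the IRQ12 reading above, and a loop bounded this way would never test
vector 236 at all.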