On 04/08/2010 10:37 AM, Konrad Rzeszutek Wilk wrote:
>> Yes,
>>
>> Please e-mail your full serial log output, your 'cat /proc/interrupts'
>> output and your 'lspci -vvv' output, for both Dom0 and DomU.
>>
> I think I am able to reproduce this with one device (in DomU) that shares the
> IRQ (17) with another device that is in Dom0. In Dom0 I get:
>
For the "nobody cared" message to trigger, there must either have been
no interrupt handlers at all, or they must all have returned IRQ_NONE.
So in theory, if irq 17 has an active driver on it, its irq handler
should see the interrupt, poke the device, go "huh, nothing for me to
do, must be a spurious interrupt from something else sharing the irq",
and, I guess, return IRQ_NONE.
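To make the failure mode concrete, here is a small userspace model of it in C. It is a sketch under stated assumptions, not kernel code: the thresholds follow the 2.6.3x spurious-IRQ detector as I understand it (windows of 100,000 interrupts, line disabled if more than 99,900 went unhandled), the HZ/10 timing reset and irqpoll handling are left out, and lpfc_model, tg3_model, note_interrupt_model and run are hypothetical names standing in for the two dom0 drivers and the core logic.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified userspace model, NOT kernel code. Assumed thresholds:
 * 100,000 interrupts per window, disable if > 99,900 were unhandled.
 * The HZ/10 timing reset and irqpoll are deliberately omitted. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
typedef irqreturn_t (*handler_fn)(void);

struct irq_line {
    handler_fn handlers[4];      /* handlers registered on the shared line */
    int nr_handlers;
    unsigned int irq_count;      /* interrupts seen in the current window */
    unsigned int irqs_unhandled; /* of those, how many nobody claimed */
    bool disabled;
};

/* Hypothetical stand-ins for the dom0 drivers on IRQ 17. The interrupts
 * are raised by the device passed through to DomU, so neither of these
 * devices has anything pending and both correctly return IRQ_NONE. */
static irqreturn_t lpfc_model(void) { return IRQ_NONE; }
static irqreturn_t tg3_model(void) { return IRQ_NONE; }

static void note_interrupt_model(struct irq_line *line, irqreturn_t ret)
{
    if (ret == IRQ_NONE)
        line->irqs_unhandled++;
    if (++line->irq_count < 100000)
        return;
    line->irq_count = 0;
    if (line->irqs_unhandled > 99900) {
        printf("irq 17: nobody cared\nDisabling IRQ #17\n");
        line->disabled = true;
    }
    line->irqs_unhandled = 0;
}

/* One interrupt: run every handler on the line and OR the results. */
static void deliver(struct irq_line *line)
{
    irqreturn_t ret = IRQ_NONE;
    for (int i = 0; i < line->nr_handlers; i++)
        ret |= line->handlers[i]();
    note_interrupt_model(line, ret);
}

/* Deliver up to `limit` interrupts; return how many were delivered
 * before the line was disabled (or `limit` if it never was). */
static unsigned long run(struct irq_line *line, unsigned long limit)
{
    unsigned long n = 0;
    while (n < limit && !line->disabled) {
        deliver(line);
        n++;
    }
    return n;
}
```

With lpfc_model and tg3_model both registered on the line, run(&line, 1000000) returns 100000: every interrupt comes back IRQ_NONE, so the line is disabled at the end of the very first window, mirroring the "Disabling IRQ #17" in the log.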
So what stops this? If the irq isn't being shared with anything in
dom0, we should be careful not to even map the interrupt into dom0
(though I suspect we only ever map, never unmap, interrupts).
But if the interrupt is being shared, I think we need a proxy interrupt
handler installed by pciback (pcistub?) to absorb apparently spurious
interrupts, which always returns IRQ_HANDLED (and perhaps has some of
its own screaming-interrupt logic in case something has gone awry)?
Or if not that, what? How has this problem been avoided before?
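A minimal sketch of the proxy-handler idea, again as a userspace model rather than kernel or pciback code. Everything here is hypothetical: pciback_proxy_model, proxy_state, its budget and guest_alive fields, and the detector thresholds (assumed 100,000 / 99,900, as in the 2.6.3x spurious-IRQ code). The proxy claims every interrupt on the guest's behalf; as a crude version of the "own screaming interrupt logic", it stops claiming once the guest looks dead and a budget of interrupts has gone by, so the core detector can still shut a genuinely stuck line.

```c
#include <stdbool.h>

/* Userspace model, NOT kernel or pciback code; all names are
 * hypothetical. Return values mirror the kernel's irqreturn_t. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

struct proxy_state {
    unsigned long absorbed; /* interrupts claimed on the guest's behalf */
    unsigned long budget;   /* hypothetical cap once the guest looks dead */
    bool guest_alive;       /* hypothetical: is the guest servicing it? */
};

/* Claim the interrupt so dom0's detector never sees an all-IRQ_NONE
 * result. If the guest appears dead and the budget is used up, return
 * IRQ_NONE so a genuinely screaming line can still be disabled. */
static irqreturn_t pciback_proxy_model(struct proxy_state *s)
{
    if (!s->guest_alive && s->absorbed >= s->budget)
        return IRQ_NONE;
    s->absorbed++;
    return IRQ_HANDLED;
}

/* Simplified model of the core spurious-IRQ detector (assumed
 * thresholds: 100,000-interrupt windows, > 99,900 unhandled => off). */
struct irq_line_model {
    unsigned int irq_count, irqs_unhandled;
    bool disabled;
};

static void note_interrupt_model(struct irq_line_model *l, irqreturn_t ret)
{
    if (ret == IRQ_NONE)
        l->irqs_unhandled++;
    if (++l->irq_count < 100000)
        return;
    l->irq_count = 0;
    if (l->irqs_unhandled > 99900)
        l->disabled = true;
    l->irqs_unhandled = 0;
}

/* Deliver `n` interrupts raised only by the passed-through device. The
 * dom0 drivers (lpfc, tg3) would return IRQ_NONE, so OR-ing their
 * results in changes nothing: the proxy alone decides the outcome. */
static void deliver_many(struct irq_line_model *l, struct proxy_state *s,
                         unsigned long n)
{
    for (unsigned long i = 0; i < n && !l->disabled; i++)
        note_interrupt_model(l, pciback_proxy_model(s));
}
```

With guest_alive set, a million interrupts leave the line enabled. With guest_alive cleared and budget = 1000, the first window still has 1,000 handled interrupts and survives, but the second window is entirely unhandled and the line is disabled. How pciback would actually learn whether the guest is servicing its device is left open here.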
> -sh-3.1#
> -sh-3.1# [ 2349.534294] irq 17: nobody cared (try booting with the "irqpoll" option)
> [ 2349.534477] Pid: 0, comm: swapper Not tainted 2Trace:
> [ 2349.534728] <IRQ> [<ffffffff810ea3c7>] __report_bad_irq+0x54/0xe2
> [ 2349.534887] [<ffffffff810ea6a2>] note_interrupt+0x24d/0x2b8
> [ 2349.535019] [<ffffffff810eb95d>] handle_level_irq+0xef/0x17b
> [ 2349.535151] [<ffffffff81370257>] xen_evtchn_do_upcall+0x156/0x254
> [ 2349.535282] [<ffffffff81017e7e>] xen_do_hypervisor_callback+0x1e/0x30
> [ 2349.535282] <EOI> [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
> [ 2349.535282] [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
> [ 2349.535282] [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1000
> [ 2349.535282] [<ffffffff81010f3b>] ? xen_safe_halt+0x1e/0x3d
> [ 2349.535282] [<ffffffff8100cb87>] ? xen_idle+0x10b/0x130
> [ 2349.535282] [<ffffffff81015887>] ? cpu_idle+0x167/0x1d5
> [ 2349.535282] [<ffffffff816641e1>] ? rest_init+0xb5/0xbe
> [ 2349.535282] [<ffffffff81a447a5>] ? start_kernel+0x777/0x78a
> [ 2349.535282] [<ffffffff81a43326>] ? x86_64_start_reservations+0x111/0x11c
> [ 2349.535282] [<ffffffff81a48e25>] ? xen_start_kernel+0x678/0x686
> [ 2349.535282] handlers:
> [ 2349.535282] [<ffffffffa00c2420>] (lpfc_sli_intr_handler+0x0/0x22a [lpfc])
> [ 2349.535282] [<ffffffffa00240a5>] (tg3_interrupt_tagged+0x0/0xe6 [tg3])
> [ 2349.535282] Disabling IRQ #17
> [ 2382.845061] lpfc 0000:05:04.0: 0:0459 Adapter heartbeat failure, taking this port offline.
> [ 2397.052375] device-mapper: multipath: Failing path 8:0.
> [ 2397.053041] ata3: lost interrupt (Status 0x50)
> [ 2397.053275]
> [ 2398.054372] device-mapper: multipath: Failing path 8:16.
> [ 2398.055179] ata4: lost interrupt (Status 0x50)
> [ 2398.055413]
> [ 2447.701115] ata3: lost interrupt (Status 0x50)
> [ 2447.701389] sd 2:0:0:0: [sda] Unhandled error code
> [ 2447.701515] sd 2:0
>
>
> .. and it also kills the ata_piix controller which is not on the same IRQ (??)
>
That's very strange, but I suspect there's a lot of mysterious magic
around PIIX IDE controller interrupts relating to backwards
compatibility, etc.
J
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel