On 03/10/11 19:13, Stefano Stabellini wrote:
> CC'ing Jan, that probably is going to have an opinion on this.
>
> Let me add a bit of background: Stefan found out that PV on HVM guests
> could loose level interrupts coming from emulated devices. Looking
> through the code I realized that we need to add some logic to inject a
> pirq in the guest if a level interrupt has been raised while the guest
> is servicing the first one.
> While this is all very specific to interrupt remapping and emulated
> devices, I realized that something similar could happen even with dom0
> or other PV guests with PCI passthrough:
>
> 1) the device raises a level interrupt and xen injects it into the
> guest;
>
> 2) the guest is temporarely stuck: it does not ack it or eoi it;
>
> 3) the xen timer kicks in and eois the interrupt;
>
> 4) the device thinks it is all fine and sends a second interrupt;
>
> 5) Xen fails to inject the second interrupt into the guest because the
> guest has still the event channel pending bit set;
>
> at this point the guest looses the second interrupt notification, that
> is not supposed to happen with level interrupts and I think it might
> cause problems with some devices.
>
> Jan, do you think we should try to handle this case, or is it too
> unlikely?
I am not certain whether this is relevant, but the ICH10 IO-APIC
documentation indicated that early EOI'ing of a line level interrupt
should not have this effect. Specifically, it states that EOI'ing a
line level interrupt whos line is still asserted will cause the
interrupt to be "re-raised" from the IO-APIC. It uses this to assert
that it is fine to use multiple IO-APIC entries with the same vector,
with a broadcast of vector number alone to EOI the interrupt.
In this case, while Xen sees two interrupts, from the devices point of
view, only I has happened.
In the case where the device has dropped its line level interrupt of its
own accord, then I would agree that the current Xen behavior is wrong.
I cant offhand think of a good reason why this would occur.
I know it is not helpful in this case, but as a rule of thumb, line
level interrupts should not be used with Xen. The average response time
on an unloaded system is ~30ms, ranging from 5 to 150. (On a sample set
of a Dell R710, Xen 4.1.0 and 2.6.32 dom0, over 2 weeks of debugging
another line level interrupt bug).
~Andrew
> In any case we need to handle the PV on HVM remapping bug, that because
> of the way interrupts are emulated is much more likely to happen...
>
>
> On Mon, 3 Oct 2011, Stefano Stabellini wrote:
>> On Fri, 30 Sep 2011, Stefan Bader wrote:
>>> Also I did not completely remove the section that would return the status
>>> without setting needsEOI. I just changed the if condition to be <0 instead
>>> of
>>> <=0 (I knew from the tests that the mapping was always 0 and maybe the <0
>>> check
>>> could be useful for something.
>>>
>>> irq_status_query.flags = 0;
>>> if ( is_hvm_domain(v->domain) &&
>>> domain_pirq_to_irq(v->domain, irq) < 0 )
>>> {
>>> ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>>> break;
>>> }
>>>
>> You need to remove the entire test because we want to receive
>> notifications in all cases.
>>
>>
>>> With that a quick test shows both the re-sends done sometimes and the domU
>>> doing
>>> EOIs. And there is no stall apparent. Did the same quick test with the e1000
>>> emulated NIC and that still seems ok. Those were not very thorough tests
>>> but at
>>> least I would have observed a stall pretty quick otherwise.
>> I am glad it fixes the problem for you.
>>
>> I am going to send a different patch upstream for Xen 4.2, because I
>> would also like it to cover the very unlikely scenario in which a PV
>> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
>> interrupts because when Xen tries to set the corresponding event channel
>> pending the bit is alreay set. The codebase is different enough that
>> making the same change on 4.1 is non-trivial. I am appending the new
>> patch to this email, it would be great if you could test it. You just
>> need a 4.2 hypervisor, not the entire system. You should be able to
>> perform the test updating only xen.gz.
>> If you have trouble if xen-unstable.hg tip, try changeset 23843.
>>
>> ---
>>
>>
>> diff -r bf533533046c xen/arch/x86/hvm/irq.c
>> --- a/xen/arch/x86/hvm/irq.c Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/hvm/irq.c Mon Oct 03 16:54:51 2011 +0000
>> @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
>>
>> if ( hvm_domain_use_pirq(d, pirq) )
>> {
>> - send_guest_pirq(d, pirq);
>> + if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
>> + pirq->lost++;
>> return;
>> }
>> vioapic_irq_positive_edge(d, ioapic_gsi);
>> @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
>> {
>> struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
>> unsigned int gsi, link, isa_irq;
>> + struct pirq *pirq;
>>
>> ASSERT((device <= 31) && (intx <= 3));
>>
>> @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
>> gsi = hvm_pci_intx_gsi(device, intx);
>> if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
>> assert_gsi(d, gsi);
>> + else {
>> + pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
>> + if ( hvm_domain_use_pirq(d, pirq) )
>> + pirq->lost++;
>> + }
>>
>> link = hvm_pci_intx_link(device, intx);
>> isa_irq = hvm_irq->pci_link.route[link];
>> diff -r bf533533046c xen/arch/x86/irq.c
>> --- a/xen/arch/x86/irq.c Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/irq.c Mon Oct 03 16:54:51 2011 +0000
>> @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
>> !test_and_set_bool(pirq->masked) )
>> action->in_flight++;
>> if ( !hvm_do_IRQ_dpci(d, pirq) )
>> - send_guest_pirq(d, pirq);
>> + {
>> + if ( send_guest_pirq(d, pirq) &&
>> + action->ack_type == ACKTYPE_EOI )
>> + pirq->lost++;
>> + }
>> }
>>
>> if ( action->ack_type != ACKTYPE_NONE )
>> diff -r bf533533046c xen/arch/x86/physdev.c
>> --- a/xen/arch/x86/physdev.c Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/physdev.c Mon Oct 03 16:54:51 2011 +0000
>> @@ -11,6 +11,7 @@
>> #include <asm/current.h>
>> #include <asm/io_apic.h>
>> #include <asm/msi.h>
>> +#include <asm/hvm/irq.h>
>> #include <asm/hypercall.h>
>> #include <public/xen.h>
>> #include <public/physdev.h>
>> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>> if ( !is_hvm_domain(v->domain) ||
>> domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>> pirq_guest_eoi(pirq);
>> + if ( pirq->lost > 0) {
>> + if ( !send_guest_pirq(v->domain, pirq) )
>> + pirq->lost--;
>> + }
>> spin_unlock(&v->domain->event_lock);
>> ret = 0;
>> break;
>> @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>> break;
>> irq_status_query.flags = 0;
>> if ( is_hvm_domain(v->domain) &&
>> - domain_pirq_to_irq(v->domain, irq) <= 0 )
>> + domain_pirq_to_irq(v->domain, irq) <= 0 &&
>> + domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
>> {
>> - ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>> + ret = -EINVAL;
>> break;
>> }
>>
>> diff -r bf533533046c xen/include/xen/irq.h
>> --- a/xen/include/xen/irq.h Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/include/xen/irq.h Mon Oct 03 16:54:51 2011 +0000
>> @@ -146,6 +146,7 @@ struct pirq {
>> int pirq;
>> u16 evtchn;
>> bool_t masked;
>> + u32 lost;
>> struct rcu_head rcu_head;
>> struct arch_pirq arch;
>> };
>>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
|