
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics



On 03/10/11 19:13, Stefano Stabellini wrote:
> CC'ing Jan, who probably has an opinion on this.
>
> Let me add a bit of background: Stefan found out that PV on HVM guests
> could lose level interrupts coming from emulated devices. Looking
> through the code I realized that we need to add some logic to inject a
> pirq in the guest if a level interrupt has been raised while the guest
> is servicing the first one.
> While this is all very specific to interrupt remapping and emulated
> devices, I realized that something similar could happen even with dom0
> or other PV guests with PCI passthrough:
>
> 1) the device raises a level interrupt and xen injects it into the
> guest;
>
> 2) the guest is temporarily stuck: it does not ack it or EOI it;
>
> 3) the Xen timer kicks in and EOIs the interrupt;
>
> 4) the device thinks it is all fine and sends a second interrupt;
>
> 5) Xen fails to inject the second interrupt into the guest because the
> guest has still the event channel pending bit set;
>
> at this point the guest loses the second interrupt notification, which
> is not supposed to happen with level interrupts, and I think it might
> cause problems with some devices.
>
> Jan, do you think we should try to handle this case, or is it too
> unlikely?
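
To make the failure mode described above concrete, here is a minimal
sketch (plain C, not Xen source; the names are illustrative) of why a
single per-event-channel pending bit cannot represent a second
level-triggered assert:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative model of an event channel: delivery is a single bit. */
    struct evtchn {
        bool pending;
    };

    /*
     * Returns true if the notification was lost, i.e. the bit was
     * already set because the guest has not cleared the first one yet.
     */
    bool set_pending(struct evtchn *ch)
    {
        if (ch->pending)
            return true;
        ch->pending = true;
        return false;
    }

    int main(void)
    {
        struct evtchn ch = { .pending = false };

        set_pending(&ch);       /* 1) first interrupt injected           */
        /* 2)-4) guest stalls, Xen's timer EOIs, device asserts again    */
        if (set_pending(&ch))   /* 5) second injection finds the bit set */
            printf("second notification lost\n");
        return 0;
    }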

I am not certain whether this is relevant, but the ICH10 IO-APIC
documentation indicates that EOI'ing a line level interrupt early
should not have this effect.  Specifically, it states that EOI'ing a
line level interrupt whose line is still asserted will cause the
interrupt to be "re-raised" by the IO-APIC.  It uses this to argue
that it is fine to use multiple IO-APIC entries with the same vector,
with a broadcast of the vector number alone EOI'ing all of them.

In this case, while Xen sees two interrupts, from the device's point of
view only one has happened.

In the case where the device has dropped its line level interrupt of its
own accord, I would agree that the current Xen behavior is wrong.  I
can't offhand think of a good reason why that would occur, though.

I know it is not helpful in this case, but as a rule of thumb, line
level interrupts should not be used with Xen.  The average response time
on an unloaded system is ~30ms, ranging from 5ms to 150ms (measured on a
Dell R710 running Xen 4.1.0 with a 2.6.32 dom0, over two weeks of
debugging another line level interrupt bug).

~Andrew

> In any case we need to handle the PV on HVM remapping bug, which, because
> of the way interrupts are emulated, is much more likely to happen...
>
>
> On Mon, 3 Oct 2011, Stefano Stabellini wrote:
>> On Fri, 30 Sep 2011, Stefan Bader wrote:
>>> Also I did not completely remove the section that would return the status
>>> without setting needsEOI. I just changed the if condition to be <0 instead
>>> of <=0 (I knew from the tests that the mapping was always 0, and maybe the
>>> <0 check could be useful for something).
>>>
>>>         irq_status_query.flags = 0;
>>>         if ( is_hvm_domain(v->domain) &&
>>>              domain_pirq_to_irq(v->domain, irq) < 0 )
>>>         {
>>>             ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>>>             break;
>>>         }
>>>
>> You need to remove the entire test because we want to receive
>> notifications in all cases.
>>
>>
>>> With that, a quick test shows both the re-sends being done sometimes and
>>> the domU doing EOIs, and there is no stall apparent. I did the same quick
>>> test with the e1000 emulated NIC and that still seems OK. Those were not
>>> very thorough tests, but at least I would have observed a stall pretty
>>> quickly otherwise.
>> I am glad it fixes the problem for you.
>>
>> I am going to send a different patch upstream for Xen 4.2, because I
>> would also like it to cover the very unlikely scenario in which a PV
>> guest (like dom0 or a PV guest with PCI passthrough) loses level
>> interrupts because, when Xen tries to set the corresponding event channel
>> pending bit, the bit is already set. The codebase is different enough that
>> making the same change on 4.1 is non-trivial. I am appending the new
>> patch to this email; it would be great if you could test it. You just
>> need a 4.2 hypervisor, not the entire system. You should be able to
>> perform the test by updating only xen.gz.
>> If you have trouble with xen-unstable.hg tip, try changeset 23843.
>>
>> ---
>>
>>
>> diff -r bf533533046c xen/arch/x86/hvm/irq.c
>> --- a/xen/arch/x86/hvm/irq.c Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/hvm/irq.c Mon Oct 03 16:54:51 2011 +0000
>> @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
>>  
>>      if ( hvm_domain_use_pirq(d, pirq) )
>>      {
>> -        send_guest_pirq(d, pirq);
>> +        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
>> +            pirq->lost++;
>>          return;
>>      }
>>      vioapic_irq_positive_edge(d, ioapic_gsi);
>> @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
>>  {
>>      struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
>>      unsigned int gsi, link, isa_irq;
>> +    struct pirq *pirq;
>>  
>>      ASSERT((device <= 31) && (intx <= 3));
>>  
>> @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
>>      gsi = hvm_pci_intx_gsi(device, intx);
>>      if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
>>          assert_gsi(d, gsi);
>> +    else {
>> +        pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
>> +        if ( hvm_domain_use_pirq(d, pirq) )
>> +            pirq->lost++;
>> +    }
>>  
>>      link    = hvm_pci_intx_link(device, intx);
>>      isa_irq = hvm_irq->pci_link.route[link];
>> diff -r bf533533046c xen/arch/x86/irq.c
>> --- a/xen/arch/x86/irq.c     Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/irq.c     Mon Oct 03 16:54:51 2011 +0000
>> @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
>>               !test_and_set_bool(pirq->masked) )
>>              action->in_flight++;
>>          if ( !hvm_do_IRQ_dpci(d, pirq) )
>> -            send_guest_pirq(d, pirq);
>> +        {
>> +            if ( send_guest_pirq(d, pirq) &&
>> +                    action->ack_type == ACKTYPE_EOI )
>> +                pirq->lost++;
>> +        }
>>      }
>>  
>>      if ( action->ack_type != ACKTYPE_NONE )
>> diff -r bf533533046c xen/arch/x86/physdev.c
>> --- a/xen/arch/x86/physdev.c Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/physdev.c Mon Oct 03 16:54:51 2011 +0000
>> @@ -11,6 +11,7 @@
>>  #include <asm/current.h>
>>  #include <asm/io_apic.h>
>>  #include <asm/msi.h>
>> +#include <asm/hvm/irq.h>
>>  #include <asm/hypercall.h>
>>  #include <public/xen.h>
>>  #include <public/physdev.h>
>> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>>          if ( !is_hvm_domain(v->domain) ||
>>               domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>>              pirq_guest_eoi(pirq);
>> +        if ( pirq->lost > 0) {
>> +            if ( !send_guest_pirq(v->domain, pirq) )
>> +                pirq->lost--;
>> +        }
>>          spin_unlock(&v->domain->event_lock);
>>          ret = 0;
>>          break;
>> @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>>              break;
>>          irq_status_query.flags = 0;
>>          if ( is_hvm_domain(v->domain) &&
>> -             domain_pirq_to_irq(v->domain, irq) <= 0 )
>> +             domain_pirq_to_irq(v->domain, irq) <= 0 &&
>> +             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
>>          {
>> -            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
>> +            ret = -EINVAL;
>>              break;
>>          }
>>  
>> diff -r bf533533046c xen/include/xen/irq.h
>> --- a/xen/include/xen/irq.h  Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/include/xen/irq.h  Mon Oct 03 16:54:51 2011 +0000
>> @@ -146,6 +146,7 @@ struct pirq {
>>      int pirq;
>>      u16 evtchn;
>>      bool_t masked;
>> +    u32 lost;
>>      struct rcu_head rcu_head;
>>      struct arch_pirq arch;
>>  };
>>
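
For what it's worth, the mechanism in the patch can be restated as a
short sketch (names and signatures simplified, not the exact Xen code):
a failed injection is counted in pirq->lost, and the next guest EOI
retries it.

    /* Simplified restatement of the patch above, not the real code. */
    struct pirq_sketch {
        unsigned int lost;  /* level interrupts that could not be delivered */
    };

    /* Injection path: send_ok is 0 when the pending bit was already set. */
    void inject_level_irq(struct pirq_sketch *p, int send_ok)
    {
        if (!send_ok)
            p->lost++;
    }

    /* PHYSDEVOP_eoi path: once the guest acknowledges, re-inject one. */
    void guest_eoi(struct pirq_sketch *p, int resend_ok)
    {
        if (p->lost > 0 && resend_ok)
            p->lost--;
    }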

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

