[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics



CC'ing Jan, that probably is going to have an opinion on this.

Let me add a bit of background: Stefan found out that PV on HVM guests
could loose level interrupts coming from emulated devices. Looking
through the code I realized that we need to add some logic to inject a
pirq in the guest if a level interrupt has been raised while the guest
is servicing the first one.
While this is all very specific to interrupt remapping and emulated
devices, I realized that something similar could happen even with dom0
or other PV guests with PCI passthrough:

1) the device raises a level interrupt and xen injects it into the
guest;

2) the guest is temporarely stuck: it does not ack it or eoi it;

3) the xen timer kicks in and eois the interrupt;

4) the device thinks it is all fine and sends a second interrupt;

5) Xen fails to inject the second interrupt into the guest because the
guest has still the event channel pending bit set;

at this point the guest looses the second interrupt notification, that
is not supposed to happen with level interrupts and I think it might
cause problems with some devices.

Jan, do you think we should try to handle this case, or is it too
unlikely?

In any case we need to handle the PV on HVM remapping bug, that because
of the way interrupts are emulated is much more likely to happen...


On Mon, 3 Oct 2011, Stefano Stabellini wrote:
> On Fri, 30 Sep 2011, Stefan Bader wrote:
> > Also I did not completely remove the section that would return the status
> > without setting needsEOI. I just changed the if condition to be <0 instead 
> > of
> > <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 
> > check
> > could be useful for something.
> > 
> >         irq_status_query.flags = 0;
> >         if ( is_hvm_domain(v->domain) &&
> >              domain_pirq_to_irq(v->domain, irq) < 0 )
> >         {
> >             ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
> >             break;
> >         }
> > 
> 
> You need to remove the entire test because we want to receive
> notifications in all cases.
> 
> 
> > With that a quick test shows both the re-sends done sometimes and the domU 
> > doing
> > EOIs. And there is no stall apparent. Did the same quick test with the e1000
> > emulated NIC and that still seems ok. Those were not very thorough tests 
> > but at
> > least I would have observed a stall pretty quick otherwise.
> 
> I am glad it fixes the problem for you.
> 
> I am going to send a different patch upstream for Xen 4.2, because I
> would also like it to cover the very unlikely scenario in which a PV
> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
> interrupts because when Xen tries to set the corresponding event channel
> pending the bit is alreay set. The codebase is different enough that
> making the same change on 4.1 is non-trivial. I am appending the new
> patch to this email, it would be great if you could test it. You just
> need a 4.2 hypervisor, not the entire system. You should be able to
> perform the test updating only xen.gz.
> If you have trouble if xen-unstable.hg tip, try changeset 23843.
> 
> ---
> 
> 
> diff -r bf533533046c xen/arch/x86/hvm/irq.c
> --- a/xen/arch/x86/hvm/irq.c  Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/hvm/irq.c  Mon Oct 03 16:54:51 2011 +0000
> @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
>  
>      if ( hvm_domain_use_pirq(d, pirq) )
>      {
> -        send_guest_pirq(d, pirq);
> +        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
> +            pirq->lost++;
>          return;
>      }
>      vioapic_irq_positive_edge(d, ioapic_gsi);
> @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
>  {
>      struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
>      unsigned int gsi, link, isa_irq;
> +    struct pirq *pirq;
>  
>      ASSERT((device <= 31) && (intx <= 3));
>  
> @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
>      gsi = hvm_pci_intx_gsi(device, intx);
>      if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
>          assert_gsi(d, gsi);
> +    else {
> +        pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
> +        if ( hvm_domain_use_pirq(d, pirq) )
> +            pirq->lost++;
> +    }
>  
>      link    = hvm_pci_intx_link(device, intx);
>      isa_irq = hvm_irq->pci_link.route[link];
> diff -r bf533533046c xen/arch/x86/irq.c
> --- a/xen/arch/x86/irq.c      Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/irq.c      Mon Oct 03 16:54:51 2011 +0000
> @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
>               !test_and_set_bool(pirq->masked) )
>              action->in_flight++;
>          if ( !hvm_do_IRQ_dpci(d, pirq) )
> -            send_guest_pirq(d, pirq);
> +        {
> +            if ( send_guest_pirq(d, pirq) &&
> +                    action->ack_type == ACKTYPE_EOI )
> +                pirq->lost++;
> +        }
>      }
>  
>      if ( action->ack_type != ACKTYPE_NONE )
> diff -r bf533533046c xen/arch/x86/physdev.c
> --- a/xen/arch/x86/physdev.c  Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/physdev.c  Mon Oct 03 16:54:51 2011 +0000
> @@ -11,6 +11,7 @@
>  #include <asm/current.h>
>  #include <asm/io_apic.h>
>  #include <asm/msi.h>
> +#include <asm/hvm/irq.h>
>  #include <asm/hypercall.h>
>  #include <public/xen.h>
>  #include <public/physdev.h>
> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>          if ( !is_hvm_domain(v->domain) ||
>               domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>              pirq_guest_eoi(pirq);
> +        if ( pirq->lost > 0) {
> +            if ( !send_guest_pirq(v->domain, pirq) )
> +                pirq->lost--;
> +        }
>          spin_unlock(&v->domain->event_lock);
>          ret = 0;
>          break;
> @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>              break;
>          irq_status_query.flags = 0;
>          if ( is_hvm_domain(v->domain) &&
> -             domain_pirq_to_irq(v->domain, irq) <= 0 )
> +             domain_pirq_to_irq(v->domain, irq) <= 0 &&
> +             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
>          {
> -            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
> +            ret = -EINVAL;
>              break;
>          }
>  
> diff -r bf533533046c xen/include/xen/irq.h
> --- a/xen/include/xen/irq.h   Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/include/xen/irq.h   Mon Oct 03 16:54:51 2011 +0000
> @@ -146,6 +146,7 @@ struct pirq {
>      int pirq;
>      u16 evtchn;
>      bool_t masked;
> +    u32 lost;
>      struct rcu_head rcu_head;
>      struct arch_pirq arch;
>  };
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.