
Re: [Xen-devel] Xen optimization



On Wed, 17 Oct 2018, Milan Boberic wrote:
> Hi,
> >
> > The device tree with everything seems to be system.dts, that was enough
> > :-)  I don't need the dtsi files you used to build the final dts, I only
> > need the one you use in uboot and for your guest.
> 
>  I wasn't sure, so I sent everything, sorry for bombarding you with
> all those files. :-)
> 
> > It looks like you set xen,passthrough correctly in system.dts for
> > timer@ff110000, serial@ff010000, and gpio@ff0a0000.
> 
> Thank you for taking a look, now we are sure that passthrough works
> correctly because there is no error during guest creation and there
> are no prints of "DEBUG irq slow path".

Great!


> > If you are not getting any errors anymore when creating your baremetal
> > guest, then yes, passthrough should be working. I would double-check
> > that everything is working as expected using the DEBUG patch for Xen I
> > suggested to you in the other email. You might even want to remove the
> > "if" check and always print something for every interrupt of your guest
> > just to get an idea of what's going on. See the attached patch.
> 
> When I apply this patch it prints forever:
> (XEN) DEBUG virq=68 local=1
> which is a good thing I guess because interrupts are being generated non-stop.

Yes, local=1 means that the interrupt is injected to the local vcpu,
which is exactly what we want.
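
(For reference, since the attached patch is not inlined in this message:
the hook it adds in vgic_inject_irq() is essentially the same one as in
the filtered variant further down, just without the virq check, and
"local" is simply "v == current". A minimal sketch, assuming the attached
patch looks like this:)

    /* xen/arch/arm/vgic.c, vgic_inject_irq() -- sketch of the DEBUG patch */
    /* the irq is enabled */
    if ( test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
    {
        gic_raise_guest_irq(v, virq, priority);
        /* print every interrupt injected into a guest (not dom0) */
        if ( d->domain_id != 0 )
            printk("DEBUG virq=%d local=%d\n", virq, v == current);
    }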


> > Once everything is as expected I would change the frequency of the
> > timer, because 1us is way too frequent. I think it should be at least
> > 3us, more like 5us.
> 
> Okay, about this... I double checked my bare-metal application and it
> looks like interrupts weren't generated every 1 us. The highest
> interrupt rate I can generate is one interrupt every 8 us. I checked
> the interrupt frequency with an oscilloscope
> just to be sure (toggling LED on/off when interrupts occur). So, when
> I set:
> - interrupts to be generated every 8 us I get jitter of 6 us
> - interrupts to be generated every 10 us I get jitter of 3 us (after
> 2-3 minutes it jumps to 6 us)
> - interrupts to be generated every 15 us jitter is the same as when
> only bare-metal application runs on board (without Xen or any OS)

These are very interesting numbers! Thanks again for running these
experiments. I don't want to jump to conclusions but they seem to verify
the theory that if the interrupt frequency is too high, we end up
spending too much time handling interrupts, the system cannot cope, and
jitter increases.

However, I would have thought that the threshold should be lower than
15us, given that it takes 2.5us to inject an interrupt. I have a couple
of experiment suggestions below.


> I want to remind you that the bare-metal application that only blinks an
> LED at high speed gives 1 us of jitter; somehow, introducing frequent
> interrupts causes this jitter, which is why I was unsure about this
> timer passthrough. Taking into consideration that you measured a Xen
> overhead of 1 us, I have a feeling that I'm missing something. Is there
> anything else I could do to get better results besides sched=null,
> vwfi=native, hard vCPU pinning (1 vCPU on 1 pCPU) and passthrough (not
> sure if it affects the jitter)?
> I'm forcing frequent interrupts because I'm testing to see if this
> board with Xen on it could be used for real-time simulations,
> real-time signal processing, etc. If I could get results like yours (1
> us Xen overhead) or even better, that would be great! BTW how did you
> measure Xen's overhead?

When I said overhead, I meant compared to Linux. The overall IRQ latency
with Xen on the Xilinx Zynq MPSoC is 2.5us. When I say "overall", I mean
from the moment the interrupt is generated to the point the interrupt
service routine runs in the baremetal guest. I measure the overhead
using TBM (https://github.com/sstabellini/tbm phys-timer) and a modified
version of Xen that injects the generic physical timer interrupts to the
guest. I think you should be able to reproduce the same number using
the TTC timer like you are doing.
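
For reference, a rough way to get a comparable number from within your
baremetal guest using the TTC itself (a sketch only, not the TBM method):
in interval mode the counter wraps to 0 when the interrupt is generated,
so the counter value read at ISR entry is approximately "clock cycles
since the interrupt fired". The register offsets below are assumptions
based on the Cadence TTC register map, so double check them against the
ZynqMP TRM:

    #include <stdint.h>

    #define TTC0_BASE      0xFF110000UL  /* timer@ff110000 from system.dts */
    #define TTC_CNT_VAL_1  0x18          /* Counter Value 1 (assumed offset) */
    #define TTC_ISR_1      0x54          /* Interrupt Status 1 (assumed offset) */

    static inline uint32_t ttc_read(uint32_t off)
    {
        return *(volatile uint32_t *)(TTC0_BASE + off);
    }

    static volatile uint32_t max_latency_cycles;

    void ttc_isr(void *unused)
    {
        /* cycles elapsed since the counter wrapped and raised the IRQ */
        uint32_t cycles = ttc_read(TTC_CNT_VAL_1);

        (void)ttc_read(TTC_ISR_1);       /* status register is clear-on-read */

        if ( cycles > max_latency_cycles )
            max_latency_cycles = cycles; /* scale by the TTC input clock */
    }

Dividing max_latency_cycles by the TTC input clock frequency gives a
worst-case latency you can compare against the 2.5us number above.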

In addition to sched=null and vwfi=native, I also passed
serrors=panic. This last option further reduces context switch times and
should be safe on your board. You might want to add it, and run the
numbers again.
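
For example (assuming the Xen command line is passed via the
xen,xen-bootargs property in the /chosen node, as is usual on ZynqMP; the
console and dom0 arguments below are just placeholders for whatever you
already use):

    console=dtuart dtuart=serial0 dom0_mem=1G dom0_max_vcpus=1 sched=null vwfi=native serrors=panic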


> > Keep in mind that jitter is about having
> > deterministic IRQ latency, not about having extremely frequent
> > interrupts.
> 
> Yes, but I want to see exactly where I will lose deterministic IRQ
> latency, which is extremely important in real-time signal processing.
> So, what causes this jitter: is it a Xen limit, an ARM limit, etc.? It
> would be nice to know; I'll share all the results I get.
> 
> > I would also double check that you are not using any other devices or
> > virtual interfaces in your baremetal app because that could negatively
> > affect the numbers.
> 
> I checked the bare-metal app and I think there are no other devices
> that the bm app is using.

This should also be confirmed by the fact that you are only getting
"DEBUG virq=68 local=1" messages and nothing else. If other interrupts
were being injected, you would see other lines, such as

  DEBUG virq=27 local=1

I have an idea to verify this; see below.


> > Linux by default uses the virtual
> > timer interface ("arm,armv8-timer"); I would double check that the
> > baremetal app is not doing the same -- you don't want to be using two
> > timers when doing your measurements.
> 
> Hmm, I'm not sure how to check that. I could send the bare-metal app if
> that helps; it's created in Xilinx SDK 2017.4.
> Also, should I move to Xilinx SDK 2018.2, because I'm using PetaLinux 2018.2?
> I'm also using a hardware description file for the SDK that was created
> in Vivado 2017.4.
> Could all this be a version mismatch problem (I don't think so, because
> the bm app works)?
> 
> Meng mentioned in some of his earlier posts:
> 
> > Even though the app. is the only one running on the CPU, the CPU may
> > be used to handle other interrupts and its context (such as TLB and
> > cache) might be flushed by other components. When these happen, the
> > interrupt handling latency can vary a lot.
> 
> What do you think about this? I don't know how I would check this.

I think we want to fully understand how many other interrupts the
baremetal guest is receiving. To do that, we can modify my previous
patch to suppress any debug messages for virq=68. That way, we should
only see the other interrupts. Ideally there would be none.

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 5a4f082..b7a8e17 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -577,7 +577,11 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
 
     /* the irq is enabled */
     if ( test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
+    {
         gic_raise_guest_irq(v, virq, priority);
+        if ( d->domain_id != 0 && virq != 68 )
+            printk("DEBUG virq=%d local=%d\n",virq,v == current);
+    }
 
     list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
     {


Next step would be to verify that there are no other physical interrupts
interrupting the vcpu execution other than irq=68. We should be able to
check that with the following debug patch:


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e524ad5..b34c3e4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -381,6 +381,13 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
         /* Reading IRQ will ACK it */
         irq = gic_hw_ops->read_irq();
 
+        if (current->domain->domain_id > 0 && irq != 68)
+        {
+            local_irq_enable();
+            printk("DEBUG irq=%d\n",irq);
+            local_irq_disable();
+        }
+
         if ( likely(irq >= 16 && irq < 1020) )
         {
             local_irq_enable();


 

