
Re: [Xen-devel] Interesting observation with network event notification and batching



On Fri, Jun 14, 2013 at 02:53:03PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 12, 2013 at 11:14:51AM +0100, Wei Liu wrote:
> > Hi all
> > 
> > I'm hacking on a netback trying to identify whether TLB flushes causes
> > heavy performance penalty on Tx path. The hack is quite nasty (you would
> > not want to know, trust me).
> > 
> > Basically what is doesn't is, 1) alter network protocol to pass along
> 
> You probably meant: "what it does" ?
> 

Oh yes! Muscle memory got me!

> > mfns instead of grant references, 2) when the backend sees a new mfn,
> > map it RO and cache it in its own address space.
> > 
> > With this hack we now have some sort of zero-copy Tx path: the backend
> > doesn't need to issue any grant copy / map operations any more. When it
> > sees a new packet in the ring, it just picks up the pages in its own
> > address space, assembles a packet with those pages and passes it on to
> > the network stack.
> 
> Uh, so not sure I understand the RO part. If dom0 is mapping it won't
> that trigger a PTE update? And doesn't somebody (either the guest or
> initial domain) do a grant mapping to let the hypervisor know it is
> OK to map a grant?
> 

It is very easy to issue HYPERVISOR_mmu_update to alter Dom0's mappings,
because Dom0 is privileged.

> Or is dom0 actually permitted to map the MFN of any guest without using
> the grants? In which case you are then using the _PAGE_IOMAP
> somewhere and setting up vmap entries with the MFN's that point to the
> foreign domain - I think?
> 

Sort of, but I didn't use vmap. I used alloc_page to get actual pages,
then modified the underlying PTEs to point to the MFNs from netfront.
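
Roughly along these lines (a minimal sketch rather than the actual patch;
the helper name is invented, error handling and p2m bookkeeping are left
out, only the Xen interfaces themselves are real):

    #include <linux/gfp.h>
    #include <linux/mm.h>
    #include <xen/interface/xen.h>
    #include <asm/xen/hypercall.h>
    #include <asm/xen/page.h>

    /* Sketch: map a frontend MFN read-only behind a page allocated in
     * Dom0, by rewriting that page's PTE with HYPERVISOR_mmu_update. */
    static struct page *map_frontend_mfn_ro(unsigned long mfn, domid_t otherend)
    {
            struct page *page = alloc_page(GFP_KERNEL);
            unsigned long vaddr;
            unsigned int level;
            pte_t *ptep;
            struct mmu_update u;

            if (!page)
                    return NULL;

            vaddr = (unsigned long)page_address(page);
            ptep = lookup_address(vaddr, &level);   /* PTE backing vaddr */

            /* Point that PTE at the foreign MFN, read-only. */
            u.ptr = arbitrary_virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
            u.val = pte_val_ma(mfn_pte(mfn, PAGE_KERNEL_RO));

            if (HYPERVISOR_mmu_update(&u, 1, NULL, otherend)) {
                    __free_page(page);
                    return NULL;
            }

            return page;
    }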

> > 
> > In theory this should boost performance, but in practice it is the
> > other way around. This hack makes Xen networking more than 50% slower
> > than before (OMG). Further investigation shows that with this hack the
> > batching ability is gone. Before this hack, netback batched around 64 slots per
> 
> That is quite interesting.
> 
> > interrupt event; after this hack it only batches 3 slots per interrupt
> > event, which is effectively no batching at all, since a single packet
> > can be expected to occupy 3 slots.
> 
> Right.
> > 
> > Time to have some figures (iperf from DomU to Dom0).
> > 
> > Before the hack (grant copy): throughput 7.9 Gb/s, average slots per
> > batch 64.
> > 
> > After the hack, with N extra HYPERVISOR_xen_version hypercalls added in
> > the Tx path (that hypercall does nothing except a context switch into
> > the hypervisor and back):
> > 
> >      N    throughput    avg slots per batch
> >      0    2.5 Gb/s       3
> >     64    3.2 Gb/s       6
> >    256    5.2 Gb/s      26
> >    512    7.9 Gb/s      26
> >    768    5.6 Gb/s      25
> >   1024    4.4 Gb/s      25
> > 
> 
> How do you get it to do more HYPERVISOR_xen_version calls? Did you just add
> a (for i = 1024; i>0;i--) hypervisor_yield();

 for (i = 0; i < X; i++) (void)HYPERVISOR_xen_version(0, NULL);
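
For completeness, a minimal sketch of how that can be wrapped (the helper
name is invented; XENVER_version is command 0, and the call does nothing
except bounce into the hypervisor and back):

    #include <xen/interface/version.h>
    #include <asm/xen/hypercall.h>

    /* Sketch: issue "count" dummy hypercalls to delay the Tx path. */
    static void tx_delay_hypercalls(unsigned int count)
    {
            unsigned int i;

            for (i = 0; i < count; i++)
                    (void)HYPERVISOR_xen_version(XENVER_version, NULL);
    }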

> 
> in netback?
> > Average slots per batch is calculated as follows:
> >  1. count total_slots, the number of slots processed from start of day
> >  2. count tx_count, the number of times the tx_action function gets
> >     invoked
> >  3. avg_slots_per_tx = total_slots / tx_count
> > 
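(A minimal sketch of that bookkeeping, with invented counter names:)

    /* Sketch: counters bumped from the Tx processing path. */
    static unsigned long total_slots;  /* slots consumed since start of day */
    static unsigned long tx_count;     /* times the tx_action path has run */

    static void account_tx_batch(unsigned int slots)
    {
            total_slots += slots;
            tx_count++;
    }

    static unsigned long avg_slots_per_tx(void)
    {
            return tx_count ? total_slots / tx_count : 0;
    }
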
> > These counter-intuitive figures imply that there is something wrong with
> > the current batching mechanism. We probably need to fine-tune the
> > batching behaviour for network and play with the event pointers in the
> > ring (I'm actually looking into this now). It would be good to have some
> > input on this.
> 
> I am still unsure I understand how your changes would incur more
> of the yields.

It's not yielding. At least that's not the purpose of that hypercall.
HYPERVISOR_xen_version(0, NULL) only does a guest -> hypervisor -> guest
context switch. Its original purpose is to force the guest to check for
pending events.

Since you mentioned yielding, I will also try yielding and post figures.

> > 
> > Konrad, IIRC you once mentioned you discovered something with event
> > notification, what's that?
> 
> They were bizarre. I naively expected the number of physical NIC
> interrupts to be around the same as the VIF's, or less. And I figured
> that the number of interrupts would be constant regardless of the
> size of the packets. In other words #packets == #interrupts.
> 

It could be that the frontend notifies the backend for every packet it
sends. This is not desirable and I don't expect the ring to behave that
way.
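
For reference, this is the standard notification check a netfront-style
sender makes after queueing Tx requests (sketch only; "queue" is an
invented container for the Tx front ring and its irq). How often notify
fires is controlled by where the backend leaves sring->req_event, which is
exactly the knob in question:

    #include <xen/events.h>
    #include <xen/interface/io/netif.h>

    struct tx_queue {                       /* hypothetical */
            struct xen_netif_tx_front_ring tx;
            unsigned int tx_irq;
    };

    static void push_and_maybe_kick(struct tx_queue *queue)
    {
            int notify;

            RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&queue->tx, notify);
            if (notify)
                    notify_remote_via_irq(queue->tx_irq);
    }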

> In reality the number of interrupts the VIF had was about the same while
> for the NIC it would fluctuate. (I can't remember the details).
> 

I'm not sure I understand you here. But for the NIC, if you see the
number of interrupts go from high to low, that's expected: when the NIC
has a very high interrupt rate it switches to polling mode.

> But it was odd and I didn't dig deeper to figure out what
> was happening. And also to figure out whether for the VIF we could
> arrange for #packets != #interrupts, and hopefully some
> mechanism to adjust things so that there would be fewer interrupts
> per packet (hand waving here).

I'm trying to do this now.
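
The rough direction is to push the Tx ring's event pointer further out on
the backend side, so the frontend only notifies once a whole batch has
been queued. A sketch of the idea ("batch" is an invented tunable;
RING_FINAL_CHECK_FOR_REQUESTS normally re-arms req_event at req_cons + 1):

    #include <asm/barrier.h>
    #include <xen/interface/io/netif.h>

    /* Sketch only: like RING_FINAL_CHECK_FOR_REQUESTS, but asks the
     * frontend to notify only after "batch" more requests are posted
     * instead of after the very next one. */
    static int check_for_requests_batched(struct xen_netif_tx_back_ring *ring,
                                          RING_IDX batch)
    {
            if (RING_HAS_UNCONSUMED_REQUESTS(ring))
                    return 1;

            /* Re-arm the event pointer "batch" slots ahead of req_cons. */
            ring->sring->req_event = ring->req_cons + batch;

            /* Re-check to close the race with requests posted meanwhile. */
            mb();
            return RING_HAS_UNCONSUMED_REQUESTS(ring);
    }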


Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel