Xen project Mailing List

Re: [Xen-devel] [Hackathon minutes] PV network improvements

To: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>

From: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>

Date: Mon, 20 May 2013 15:49:32 +0100

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Mon, 20 May 2013 14:50:01 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Mon, May 20, 2013 at 3:08 PM, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> Hi all,
> these are Konrad's and my notes (mostly Konrad's) on possible
> improvements of the PV network protocol, taken at the Hackathon.
>
>
> A) Network bandwidth: multipage rings
> The max outstanding amount of data the ring can have is 896KB (64K of
> data use 18 slots, out of 256. 256 / 18 = 14, 14 * 64KB). This can be
> expanded by using a multi-page ring. This would benefit NFS
> and bulk data transfer (such as netperf data).
>
>
> B) Producer and consumer indexes are on the same cache line
> In present hardware that means the reader and writer will compete for
> the same cacheline, causing a ping-pong between sockets.
> This can be solved by having a feature-split-indexes (or better name)
> where req_prod and req_event, as a tuple, are kept separate from
> rsp_prod and rsp_event. This would entail using 128 bytes of the ring at
> the start - one cacheline for each tuple.
>
>
> C) Cache alignment of requests
> The fix is to make the request structures more cache-aligned. For
> networking that means making each request 16 bytes, and for block 64 bytes.
> Since it does not shrink the structure but just expands it, it could be
> called feature-align-slot.
>
>
> E) Multiqueue (request-feature-multiqueue)
> It means creating many TX and RX rings for each vif.
>
>
> F) don't gnt_copy all of the requests
> Instead don't touch them and let the Xen IOMMU create appropriate
> entries. This would require the DMA API in dom0 to be aware whether the
> grant has been done and if not (so FOREIGN, aka no m2p_override), then
> do the hypercall to tell the hypervisor that this grant is going to be
> used by a specific PCI device. This would create the IOMMU entry in Xen.
>
>
> G) On TX side, do persistent grant mapping
> This would only be done on the frontend -> backend path. That means that
> we could exhaust the initial domain's memory.
>
>
> H) Affinity of the frontend and backend being on the same NUMA node
> This touches upon the discussion about NUMA and having PV guests be
> aware of memory layout. It also means that each backend kthread needs to
> be on a different NUMA node.
>
>
> I) separate request and response rings for TX and RX
>
>
> J) Map the whole physical memory of the machine in dom0
> If mapping/unmapping or copying slows us down, could we just keep the
> whole physical memory of the machine mapped in dom0 (with corresponding
> IOMMU entries)?
> At that point the frontend could just pass mfn numbers to the backend,
> and the backend would already have them mapped.
> From a security perspective it doesn't change anything when running
> the backend in dom0, because dom0 is already capable of mapping random
> pages of any guest. QEMU instances do that all the time.
> But it would take away one of the benefits of deploying driver domains:
> we wouldn't be able to run the backends at a lower privilege level.
> However it might still be worth considering as an option? The backend is
> still trusted and protected from the frontend, but the frontend wouldn't
> be protected from the backend.

What's missing from this was my side of the discussion: I was saying
that if TLB flushes from grant-unmap are indeed the problem, then maybe
we could have the *front-end* in charge of requesting a TLB flush for
its pages. The strict TLB flushing is to protect a front-end from rogue
back-ends reading sensitive data; if the front-end were willing to just
not use the pages for a short amount of time, and issue a flush say
every second or so, that would reduce the TLB flushes greatly while
maintaining the safety advantages of driver domains.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
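
For reference, the ring-capacity arithmetic in item A (256 slots, 18 slots
per 64KB packet) can be sketched as below. This is only an illustration,
not code from the thread: the constant and function names are made up, and
it assumes the slot count scales linearly with the number of ring pages,
which a real multi-page ring layout may not do exactly.

```python
# Illustrative sketch of the item A arithmetic; none of these names are
# Xen identifiers.
SLOTS_PER_PAGE = 256     # slots in a single-page ring (figure from the notes)
SLOTS_PER_PACKET = 18    # slots used by one 64KB packet (figure from the notes)
PACKET_KB = 64           # size of one maximal packet, in KB


def max_outstanding_kb(ring_pages: int) -> int:
    """Return the most data (in KB) that can be outstanding on the ring.

    Assumption: the number of slots grows linearly with the number of
    ring pages, and only whole packets count towards the total.
    """
    slots = SLOTS_PER_PAGE * ring_pages
    whole_packets = slots // SLOTS_PER_PACKET  # integer division, as in 256 / 18 = 14
    return whole_packets * PACKET_KB


if __name__ == "__main__":
    for pages in (1, 2, 4, 8):
        print(f"{pages} ring page(s): {max_outstanding_kb(pages)} KB outstanding")
```

Under these assumptions, each doubling of the ring roughly doubles the
amount of data that can be in flight, which is the motivation given for
multi-page rings.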
