
Re: [Xen-devel] [Hackathon minutes] PV network improvements

On Mon, May 20, 2013 at 3:08 PM, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> Hi all,
> these are Konrad's and my notes (mostly Konrad's) on possible
> improvements of the PV network protocol, taken at the Hackathon.
> A) Network bandwidth: multipage rings
> The maximum amount of outstanding data a ring can have is 896KB (64KB of
> data uses 18 slots, out of 256. 256 / 18 = 14, 14 * 64KB = 896KB).  This
> can be expanded by using multi-page rings. This would benefit NFS
> and bulk data transfer (such as netperf data).
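
To make the arithmetic in (A) concrete, here is a rough sketch of the bound. It uses only the figures from the notes (256 slots per ring page, 18 slots per 64KB of data); the assumption that the slot count scales linearly with the number of ring pages is mine, and `max_outstanding_kb` is just an illustrative name.

```python
RING_SLOTS_PER_PAGE = 256    # slots in a single-page ring (figure from the notes)
SLOTS_PER_64K_PACKET = 18    # slots used by 64KB of data (figure from the notes)
PACKET_KB = 64


def max_outstanding_kb(ring_pages: int = 1) -> int:
    """Upper bound on in-flight data for a ring of `ring_pages` pages.

    Assumption: the number of slots grows linearly with the ring size.
    """
    slots = RING_SLOTS_PER_PAGE * ring_pages
    whole_packets = slots // SLOTS_PER_64K_PACKET  # partial packets do not count
    return whole_packets * PACKET_KB


if __name__ == "__main__":
    for pages in (1, 2, 4):
        print(f"{pages} page(s): {max_outstanding_kb(pages)} KB outstanding")
```

Under that assumption a 4-page ring would allow roughly four times the outstanding data of a single-page ring.
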
> B) Producer and consumer index is on the same cache line
> On present hardware that means the reader and writer will compete for
> the same cacheline, causing it to ping-pong between sockets.
> This can be solved by having a feature-split-indexes (or better name)
> where the req_prod and req_event as a tuple are on a different cacheline
> from the rsp_prod and rsp_event. This would entail using 128 bytes of the
> ring at the start - one cacheline for each tuple.
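
A possible layout for (B), sketched with ctypes purely to check the offsets. This is not the existing ring header: the structure name, the padding fields and the 64-byte cache-line size are assumptions for illustration. Only the grouping (req_prod/req_event in one line, rsp_prod/rsp_event in the next, 128 bytes in total) comes from the notes.

```python
import ctypes

CACHELINE = 64  # bytes; assumed cache-line size


class SplitRingIndexes(ctypes.Structure):
    """Hypothetical ring header with one cache line per index tuple."""
    _fields_ = [
        # First cache line: request indexes.
        ("req_prod", ctypes.c_uint32),
        ("req_event", ctypes.c_uint32),
        ("_pad_req", ctypes.c_uint8 * (CACHELINE - 8)),
        # Second cache line: response indexes.
        ("rsp_prod", ctypes.c_uint32),
        ("rsp_event", ctypes.c_uint32),
        ("_pad_rsp", ctypes.c_uint8 * (CACHELINE - 8)),
    ]


def cacheline_of(field_name: str) -> int:
    """Index of the cache line that holds the given field."""
    return getattr(SplitRingIndexes, field_name).offset // CACHELINE


if __name__ == "__main__":
    print("header size:", ctypes.sizeof(SplitRingIndexes))
    for name in ("req_prod", "req_event", "rsp_prod", "rsp_event"):
        print(name, "-> cache line", cacheline_of(name))
```
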
> C)  Cache alignment of requests
> The fix is to make the request structures more cache-aligned. For
> networking that means making a request 16 bytes, and for block 64 bytes.
> Since this does not shrink the structure but only expands it, it could be
> called feature-align-slot.
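
To illustrate why (C) helps, the sketch below counts how many slots cross a cache-line boundary for a given slot size. The 12-byte figure for the current network request, the 64-byte cache line, and the assumption that the first slot starts on a cache-line boundary are mine; the notes only give the 16-byte target.

```python
CACHELINE = 64  # bytes; assumed cache-line size


def straddling_slots(slot_size: int, n_slots: int = 256,
                     cacheline: int = CACHELINE) -> int:
    """Count slots whose first and last byte fall in different cache lines.

    Assumption: slot 0 starts at a cache-line-aligned offset.
    """
    count = 0
    for i in range(n_slots):
        first = i * slot_size
        last = first + slot_size - 1
        if first // cacheline != last // cacheline:
            count += 1
    return count


if __name__ == "__main__":
    for size in (12, 16):
        print(f"{size}-byte slots: {straddling_slots(size)} of 256 straddle a line")
```

With these assumptions, 32 of 256 twelve-byte slots span two cache lines, while no sixteen-byte slot does.
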
> E) Multiqueue (request-feature-multiqueue)
> It means creating many TX and RX rings for each vif.
> F) don't gnt_copy all of the requests
> Instead don't touch them and let the Xen IOMMU create appropriate
> entries. This would require the DMA API in dom0 to be aware whether the
> grant has been done and if not (so FOREIGN, aka no m2p_override), then
> do the hypercall to tell the hypervisor that this grant is going to be
> used by a specific PCI device. This would create the IOMMU entry in Xen.
> G) On TX side, do persistent grant mapping
> This would only be done on the frontend -> backend path.  That means that
> we could exhaust the initial domain's memory.
> H) Affinity of the frontend and backend being on the same NUMA node
> This touches upon the discussion about NUMA and having PV guests be
> aware of memory layout. It also means that each backend kthread needs to
> be on a different NUMA node.
> I) separate request and response rings for TX and RX
> J) Map the whole physical memory of the machine in dom0
> If mapping/unmapping or copying slows us down, could we just keep the
> whole physical memory of the machine mapped in dom0 (with corresponding
> IOMMU entries)?
> At that point the frontend could just pass mfn numbers to the backend,
> and the backend would already have them mapped.
> From a security perspective it doesn't change anything when running
> the backend in dom0, because dom0 is already capable of mapping random
> pages of any guests. QEMU instances do that all the time.
> But it would take away one of the benefits of deploying driver domains:
> we wouldn't be able to run the backends at a lower privilege level.
> However it might still be worth considering as an option? The backend is
> still trusted and protected from the frontend, but the frontend wouldn't
> be protected from the backend.

What's missing from this was my side of the discussion:

I was saying that if TLB flushes from grant-unmap are indeed the
problem, then maybe we could have the *front-end* in charge of
requesting a TLB flush for its pages.  The strict TLB flushing is there to
protect a front-end against rogue back-ends reading sensitive data;
if the front-end were willing to just not use the pages for a short
amount of time, and issue a flush say every second or so, that would
reduce the TLB flushes greatly while maintaining the safety advantages
of driver domains.
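
A toy model of that idea, only to show the bookkeeping: released pages sit in a quarantine list and become reusable after the next periodic flush, so one flush covers many unmaps. The class and method names are made up for this sketch, and the flush itself is just a counter standing in for whatever request the front-end would actually issue.

```python
class DeferredFlushPool:
    """Front-end page pool that holds back released pages until a flush."""

    def __init__(self):
        self.quarantined = []  # pages a back-end might still reach via stale TLB entries
        self.reusable = []     # pages that are safe to hand out again
        self.flushes = 0       # number of (hypothetical) flush requests issued

    def release(self, page):
        """Back-end has unmapped the grant; do not flush yet, just quarantine."""
        self.quarantined.append(page)

    def periodic_flush(self):
        """Run on a timer (say every second): one flush frees all quarantined pages."""
        if self.quarantined:
            self.flushes += 1
            self.reusable.extend(self.quarantined)
            self.quarantined.clear()

    def allocate(self):
        """Return a page that is safe to reuse, or None if none is available."""
        return self.reusable.pop() if self.reusable else None


if __name__ == "__main__":
    pool = DeferredFlushPool()
    for page in range(1000):
        pool.release(page)
    pool.periodic_flush()
    print("flushes issued for 1000 unmaps:", pool.flushes)
```

The cost is that the front-end needs enough spare pages to cover whatever is quarantined between two flushes.
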

