Re: [Xen-devel] [Hackathon minutes] PV network improvements

On Tue, 21 May 2013, Konrad Rzeszutek Wilk wrote:
> On Tue, May 21, 2013 at 11:51:03AM +0100, Stefano Stabellini wrote:
> > On Tue, 21 May 2013, Tim Deegan wrote:
> > > At 19:31 +0100 on 20 May (1369078279), Wei Liu wrote:
> > > > On Mon, May 20, 2013 at 03:08:05PM +0100, Stefano Stabellini wrote:
> > > > > J) Map the whole physical memory of the machine in dom0
> > > > > If mapping/unmapping or copying slows us down, could we just keep the
> > > > > > whole physical memory of the machine mapped in dom0 (with corresponding
> > > > > > IOMMU entries)?
> > > > > At that point the frontend could just pass mfn numbers to the backend,
> > > > > and the backend would already have them mapped.
> > > > > > From a security perspective it doesn't change anything when running
> > > > > the backend in dom0, because dom0 is already capable of mapping random
> > > > > pages of any guests. QEMU instances do that all the time.
> > > > > > But it would take away one of the benefits of deploying driver domains:
> > > > > > we wouldn't be able to run the backends at a lower privilege level.
> > > > > > However it might still be worth considering as an option? The backend is
> > > > > > still trusted and protected from the frontend, but the frontend wouldn't
> > > > > > be protected from the backend.
> > > > > 
> > > > 
> > > > I think Dom0 mapping all machine memory is a good starting point.
> > > 
> > > I _strongly_ disagree.  The opportunity for disaggregation and reduction
> > > of privilege in backends is probably Xen's biggest technical advantage
> > > and we should not be taking any backward steps there.
> > 
> > While I agree with you, as a matter of fact the vast majority of Xen
> > installations today do not use driver domains. That didn't stop them
> > from enjoying Xen so far. Moreover the frontend/backend interface
> > remains narrow and difficult to exploit; it's not a fully emulated
> > interface (AHCI / virtio). The backend is still protected from the
> > frontend. Having the backend running non-privileged is a great bonus
> > and certainly required on a product that allows the user to install
> > third party driver domains. However if the driver domains are "trusted"
> > then I think they can also be trusted with a full memory map. After all
> > it has been the case for all XenServer, OVM and SLES releases so far
> > AFAIK.
> > 
> > A hypothetical future Xen release could offer either increased security
> > (driver domains) or increased IO performance (backends with a full
> > physical memory map) and give the user a choice between the two. I am
> > pretty sure that a non-negligible number of people would make the
> > conscious choice to go for the performance option.
> > Why should we be the ones to force security down their throats?
> > After all it's all about what the users want from the project.
> > 
> > Obviously in an ideal world we would be able to offer both at the same
> > time, and maybe George's proposal is exactly what is going to achieve
> > that. But I was describing the case that requires us to make a choice.
> CC-ing Mukesh here as driver domains have some relevance to PVH work.
> Please also CC Malcolm here (I don't have his email).
> I would say that perhaps a better option is to do both - as in retain
> the security architecture Xen has _and_ also provide increased IO performance.

Of course that is the best option.

However I think that we should know exactly what level of performance
we would get if we had all the memory mapped in the backend domain all
the time. It would be very useful to understand what we need to
optimize.  It might turn out that the difference is not that much, and
we need to optimize something else. Or it might turn out that the
difference is huge even after all the optimizations you listed below.

> Concurrently everybody is also looking at both backend and frontend having a
> persistent pool of grants. This means we set up a "window" from either
> backend -> frontend or vice-versa that persists. Said "window" is bolted
> for the lifetime of the guest. For networking the kernel stack already
> copies the pages from user space into the kernel, and copying
> in the kernel to specific pages is mostly using the CPU cache. We need to
> exploit that and also make sure that the path is not interrupted.
> The grant_mapping on the TX side also looks like a nice path - we just have to
> make sure that the networking API doesn't try to free the page once the TX
> has been done (and this is where Ian's skb destructor would be beneficial).
> For block it is a bit different as aio's are mapped from kernel to
> user-space. But the neat thing there is that there is no need to inspect
> the data when giving it to the DMA device (the exception is DIF/DIX, which
> needs to calculate checksums). That is unless one needs to do the
> xen_biovec_phys_mergeable check (to see if the next page is contiguous and
> if so add new bio's and copy the data in).
> But with PVH and PVHVM driver domains, and also piggybacking on the work
> that Malcolm is doing (Xen IOMMU), we can skip that check. (As the PFNs
> for the guest would look contiguous).
> In essence we can do a lot:
>  1). not copying or mapping grants if we detect that they are going to
>      a DMA device.
>  2). The 1) above + also use the Xen IOMMU to take care of setting the
>      proper EPT entries for the pages that we need. This could be done
>      as part of a grant_copy or grant_light_mapping in the hypervisor. This is
>      a case where we MUST copy those pages into the other domain (say the
>      Ethernet header). This could be a copy or a light mapping
>      (b/c the moment the device does the DMA operation on the granted
>      page we might as well remove the mapping); hence the "light" or
>      maybe "expiring" grant.
>  3). The 2) above + Intel QuickData (a DMA engine that uses the same
>      L3 cache that PCI devices use) to keep the copied pages in the L3.
>      This has the benefit that when the PCI device is instructed to
>      fetch the data, it would do it from the L3 cache and be incredibly quick.
>      This would be using the grant_copy, but instead of the hypervisor
>      doing it, it instructs the Intel QuickData chipset to do it. Would
>      require some form of asynchronous grant_copy mechanism.
>  4). Variants of the above.
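
To make the "persistent pool of grants" idea above concrete, here is a minimal
sketch in Python. It is only a toy model and not Xen or kernel code: the class
PersistentGrantPool, the function fake_map and the 4096-byte page size are all
assumptions made up for illustration. The only point it shows is that a grant
reference is mapped once and the same mapping is reused for later requests.

```python
class PersistentGrantPool:
    """Toy model of a backend-side pool of persistently mapped grants."""

    def __init__(self, map_grant):
        # map_grant is a hypothetical callable standing in for the
        # expensive "map this grant reference" operation.
        self._map_grant = map_grant
        self._mapped = {}    # grant reference -> mapped page
        self.map_calls = 0   # how many times we paid the mapping cost

    def get_page(self, gref):
        """Return the page for gref, mapping it only on first use."""
        page = self._mapped.get(gref)
        if page is None:
            page = self._map_grant(gref)
            self.map_calls += 1
            self._mapped[gref] = page
        return page

    def teardown(self):
        """Drop all mappings, e.g. when the guest goes away."""
        self._mapped.clear()


def fake_map(gref):
    # Assumption: a page is modelled as a 4096-byte buffer.
    return bytearray(4096)


pool = PersistentGrantPool(fake_map)
for gref in [7, 8, 7, 7, 8]:
    page = pool.get_page(gref)
    page[0:4] = b"data"      # the "copy into the window" step

print(pool.map_calls)        # prints 2: one mapping per distinct grant
```

Five requests touch only two distinct grant references, so the mapping cost is
paid twice and every later request is just a copy into an already mapped page.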
