[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Xen 4.14 and future work

  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Xen-devel List <xen-devel@xxxxxxxxxxxxx>
  • From: "Durrant, Paul" <pdurrant@xxxxxxxxxx>
  • Date: Tue, 3 Dec 2019 09:03:10 +0000
  • Accept-language: en-GB, en-US
  • Delivery-date: Tue, 03 Dec 2019 09:03:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHVqUqk5MCo/aOaTki8kF6hi2B7FKeoF5GA
  • Thread-topic: [Xen-devel] Xen 4.14 and future work

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of
> Andrew Cooper
> Sent: 02 December 2019 19:52
> To: Xen-devel List <xen-devel@xxxxxxxxxxxxx>
> Subject: [Xen-devel] Xen 4.14 and future work
> Hello,
> Now that 4.13 is on its way out of the door, it is time to look to
> ongoing work.
> We have a large backlog of speculation-related work.  For one, we still
> don't virtualise MSR_ARCH_CAPS for guests, or use eIBRS ourselves in
> Xen.  Therefore, while Xen does function on Cascade Lake, support is
> distinctly suboptimal.
> Similarly, AMD systems frequently fill /var/log with:
> (XEN) emul-priv-op.c:1113:d0v13 Domain attempted WRMSR c0011020 from
> 0x0006404000000000 to 0x0006404000000400
> which is an interaction between Linux's prctl() interface to disable
> memory disambiguation on a per-process basis, Xen's write/discard
> behaviour for MSRs, and the long-overdue series to properly virtualise
> SSBD support on AMD hardware.  AMD Rome hardware, like Cascade Lake, has
> certain hardware speculative mitigation features which need virtualising
> for guests to make use of.

I assume this would be addressed by the proposed cpuid/msr policy work? I think it 
is quite vital for Xen that we are able to migrate guests across pools of 
heterogeneous h/w and therefore I'd like to see this done in 4.14 if possible.

> Similarly, there is plenty more work to do with core-aware scheduling,
> and from my side of things, sane guest topology.  This will eventually
> unblock one of the factors on the hard 128 vcpu limit for HVM guests.
> Another big area is the stability of toolstack hypercalls.  This is a
> crippling pain point for distros and upgradeability of systems, and
> there is frankly no justifiable reason for the way we currently do
> things.  The real reason is inertia from back in the days when Xen.git
> (BitKeeper as it was back then) contained a fork of every relevant
> piece of software, but this is a long-since obsolete model that is
> still causing us pain.  I will follow up with a proposal in due course,
> but as a one-liner, it will build on the dm_op() API model.

This is also fairly vital for the work on live update of Xen (as discussed at 
the last dev summit). Any instability in the tools ABI will compromise 
hypervisor update, and fixing such issues on an ad-hoc basis as they arise is 
not a desirable prospect.

> Likely included within this is making the domain/vcpu destroy paths
> idempotent so we can fix a load of NULL pointer dereferences in Xen
> caused by XEN_DOMCTL_max_vcpus not being part of XEN_DOMCTL_createdomain.
> Other work in this area involves adding X86_EMUL_{VIRIDIAN,NESTED_VIRT}
> to replace their existing problematic enablement interfaces.

I think this should include deprecation of HVMOP_get/set_param as far as is 
possible (i.e. tools use)...

> A start needs to be made on a total rethink of the HVM ABI.  This has
> come up repeatedly at previous dev summits, and is in desperate need of
> having some work started on it.

...and completely in any new ABI.

I wonder to what extent we can provide a guest-side compat layer here; 
otherwise I think it would be hard to get traction.
There was an interesting talk at KVM Forum (https://sched.co/Tmuy) on dealing 
with emulation inside guest context by essentially re-injecting the VMEXITs 
back into the guest for pseudo-SMM code (loaded as part of the firmware blob) 
to deal with. I could imagine potentially using such a mechanism to have a 
'legacy' hypercall translated to the new ABI, which would allow older guests to 
be supported unmodified (albeit with a performance penalty). Such a mechanism 
may also be useful as an alternative way of dealing with some of the emulation 
dealt with directly in Xen at the moment, to reduce the hypervisor attack 
surface e.g. stdvga caching, hpet, rtc... perhaps.


Xen-devel mailing list
