
RE: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, ...



> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: 03 December 2020 15:57
> To: paul@xxxxxxx
> Cc: 'Paul Durrant' <pdurrant@xxxxxxxxxx>; 'Eslam Elnikety' 
> <elnikety@xxxxxxxxxx>; 'Ian Jackson'
> <iwj@xxxxxxxxxxxxxx>; 'Wei Liu' <wl@xxxxxxx>; 'Anthony PERARD' 
> <anthony.perard@xxxxxxxxxx>; 'Andrew
> Cooper' <andrew.cooper3@xxxxxxxxxx>; 'George Dunlap' 
> <george.dunlap@xxxxxxxxxx>; 'Julien Grall'
> <julien@xxxxxxx>; 'Stefano Stabellini' <sstabellini@xxxxxxxxxx>; 'Christian 
> Lindig'
> <christian.lindig@xxxxxxxxxx>; 'David Scott' <dave@xxxxxxxxxx>; 'Volodymyr 
> Babchuk'
> <Volodymyr_Babchuk@xxxxxxxx>; 'Roger Pau Monné' <roger.pau@xxxxxxxxxx>; 
> xen-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, 
> XEN_DOMCTL_CDF_evtchn_fifo,
> ...
> 
> On 03.12.2020 16:45, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@xxxxxxxx>
> >> Sent: 03 December 2020 15:23
> >>
> >> On 03.12.2020 13:41, Paul Durrant wrote:
> >>> From: Paul Durrant <pdurrant@xxxxxxxxxx>
> >>>
> >>> ...to control the visibility of the FIFO event channel operations
> >>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and
> >>> EVTCHNOP_set_priority) to the guest.
> >>>
> >>> These operations were added to the public header in commit d2d50c2f308f
> >>> ("evtchn: add FIFO-based event channel ABI") and the first implementation
> >>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
> >>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
> >>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior
> >>> to that, a guest issuing those operations would receive a return value
> >>> of -ENOSYS (not implemented) from Xen. Guests aware of the FIFO
> >>> operations but running on an older (pre-4.4) Xen would fall back to
> >>> using the 2-level event channel interface upon seeing this return value.
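
[ For illustration, roughly what that guest-side probe/fallback looks
  like. This is a hedged sketch: hypercall_event_channel_op() stands in
  for whatever hypercall wrapper the guest kernel actually uses, and
  error handling is trimmed. ]

    #include <stdbool.h>
    #include <stdint.h>
    #include <errno.h>
    #include <xen/interface/event_channel.h> /* struct evtchn_init_control */

    /* Stand-in for the guest kernel's hypercall wrapper. */
    extern int hypercall_event_channel_op(int cmd, void *arg);

    static bool evtchn_try_fifo(uint64_t control_gfn, uint32_t vcpu)
    {
        struct evtchn_init_control init = {
            .control_gfn = control_gfn,
            .offset      = 0,
            .vcpu        = vcpu,
        };
        int rc = hypercall_event_channel_op(EVTCHNOP_init_control, &init);

        if ( rc == -ENOSYS )
            return false; /* pre-4.4 Xen (or FIFO hidden): use 2-level */

        return rc == 0;
    }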
> >>>
> >>> Unfortunately the uncontrollable appearance of these new operations in
> >>> Xen 4.4 onwards has implications for hibernation of some Linux guests.
> >>> During resume from hibernation, there are two kernels involved: the
> >>> "boot" kernel and the "resume" kernel. The guest boot kernel may default
> >>> to using FIFO operations and instruct Xen via EVTCHNOP_init_control to
> >>> switch from 2-level to FIFO. On the other hand, the resume kernel keeps
> >>> assuming 2-level, because it was hibernated on a version of Xen that did
> >>> not support the FIFO operations.
> >>>
> >>> To maintain compatibility it is necessary to make Xen behave as it did
> >>> before the new operations were added and hence the code in this patch
> >>> ensures that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event
> >>> channel operations will again result in -ENOSYS being returned to the
> >>> guest.
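
[ The hunk that actually implements the guard isn't quoted below (only
  the flag-plumbing hunks are), so as a hedged sketch of the shape of
  it: evtchn_fifo_enabled() is my name, but d->options really does hold
  the XEN_DOMCTL_CDF_* flags the domain was created with. ]

    static bool evtchn_fifo_enabled(const struct domain *d)
    {
        return d->options & XEN_DOMCTL_CDF_evtchn_fifo;
    }

    /* ... and in do_event_channel_op(), ahead of dispatch: */
    switch ( cmd )
    {
    case EVTCHNOP_init_control:
    case EVTCHNOP_expand_array:
    case EVTCHNOP_set_priority:
        if ( !evtchn_fifo_enabled(current->domain) )
            return -ENOSYS; /* behave as pre-4.4 Xen did */
        break;
    }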
> >>
> >> I have to admit I'm now even more concerned about such control going
> >> into Xen, all the more with the now 2nd use in the subsequent patch.
> >> The implication of this would seem to be that whenever we add new
> >> hypercalls or sub-ops, a domain creation control would also need
> >> adding, determining whether that new sub-op is actually okay to use by
> >> a guest. Or else I'd be keen to see, up front, criteria at least
> >> roughly outlined by which it could be established whether such an
> >> override control is needed.
> >>
> >
> > Ultimately I think any new hypercall (or related set of hypercalls)
> > added to the ABI needs to be opt-in on a per-domain basis, so that we
> > know that from when a domain is first created it will not see a change
> > in its environment unless the VM administrator wants that to happen.
> 
> A new hypercall appearing is a change to the guest's environment, yes,
> but a backwards compatible one. I don't see how this would harm a guest.

Say we have a guest which is aware of the newer Xen and is coded to use the new 
hypercall *but* we start it on an older Xen where the new hypercall is not 
implemented. *Then* we migrate it to the newer Xen... the new hypercall 
suddenly becomes available and changes the guest behaviour. In the worst case, 
the guest is sufficiently confused that it crashes.

> This and ...
> 
> >> I'm also not convinced such controls really want to be opt-in rather
> >> than opt-out.
> >
> > They really need to be opt-in I think. From a cloud provider PoV it is
> > important that nothing in a customer's environment changes unless we
> > want it to. Otherwise we have no way to deploy an updated hypervisor
> > version without risking crashing their VMs.
> 
> ... this sound to me more like workarounds for buggy guests than
> functionality the hypervisor _needs_ to have. (I can appreciate
> the specific case here for the specific scenario you provide as
> an exception.)

If we want to have a hypervisor that can be used in a cloud environment then 
Xen absolutely needs this capability.

> 
> >>> --- a/xen/arch/arm/domain.c
> >>> +++ b/xen/arch/arm/domain.c
> >>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>>      unsigned int max_vcpus;
> >>>
> >>>      /* HVM and HAP must be set. IOMMU may or may not be */
> >>> -    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
> >>> +    if ( (config->flags &
> >>> +          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
> >>>           (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
> >>>      {
> >>>          dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
> >>> --- a/xen/arch/arm/domain_build.c
> >>> +++ b/xen/arch/arm/domain_build.c
> >>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void)
> >>>          struct domain *d;
> >>>          struct xen_domctl_createdomain d_cfg = {
> >>>              .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
> >>> -            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>> +            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>> +                     XEN_DOMCTL_CDF_evtchn_fifo,
> >>>              .max_evtchn_port = -1,
> >>>              .max_grant_frames = 64,
> >>>              .max_maptrack_frames = 1024,
> >>> --- a/xen/arch/arm/setup.c
> >>> +++ b/xen/arch/arm/setup.c
> >>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
> >>>      struct bootmodule *xen_bootmodule;
> >>>      struct domain *dom0;
> >>>      struct xen_domctl_createdomain dom0_cfg = {
> >>> -        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>> +        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>> +                 XEN_DOMCTL_CDF_evtchn_fifo,
> >>>          .max_evtchn_port = -1,
> >>>          .max_grant_frames = gnttab_dom0_frames(),
> >>>          .max_maptrack_frames = -1,
> >>> --- a/xen/arch/x86/setup.c
> >>> +++ b/xen/arch/x86/setup.c
> >>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
> >>>                                           const char *loader)
> >>>  {
> >>>      struct xen_domctl_createdomain dom0_cfg = {
> >>> -        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
> >>> +        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
> >>> +                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
> >>>          .max_evtchn_port = -1,
> >>>          .max_grant_frames = -1,
> >>>          .max_maptrack_frames = -1,
> >>> --- a/xen/common/domain.c
> >>> +++ b/xen/common/domain.c
> >>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>>           ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>>             XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
> >>>             XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
> >>> -           XEN_DOMCTL_CDF_nested_virt) )
> >>> +           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
> >>>      {
> >>>          dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
> >>>          return -EINVAL;
> >>
> >> All of the hunks above point out a scalability issue if we were to
> >> follow this route for even just a fair part of new sub-ops, and I
> >> suppose you've noticed this with the next patch presumably touching
> >> all the same places again.
> >
> > Indeed. This solution works for now but is probably not what we want in
> > the long run. Same goes for the current way we control viridian features
> > via an HVM param. It is good enough for now IMO since domctl is not a
> > stable interface. Any ideas about how we might implement a better
> > interface in the longer term?
> 
> While it has other downsides, Jürgen's proposal doesn't have any
> similar scalability issue afaics. Another possible model would
> seem to be to key new hypercalls to hypervisor CPUID leaf bits,
> and derive their availability from a guest's CPUID policy. Of
> course this won't work when needing to retrofit guarding like
> you want to do here.
> 
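
For reference, a minimal sketch of what the CPUID-keyed model might look
like from the guest side. The leaf layout follows the usual Xen
hypervisor-leaf convention (base 0x40000000, HVM feature flags in EAX of
base + 4), but the feature bit XEN_HVM_CPUID_EVTCHN_FIFO is purely
hypothetical:

    #include <stdbool.h>
    #include <stdint.h>

    #define XEN_CPUID_FIRST_LEAF       0x40000000u
    #define XEN_HVM_CPUID_EVTCHN_FIFO  (1u << 7)  /* hypothetical bit */

    static inline void cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                             uint32_t *ecx, uint32_t *edx)
    {
        asm volatile ( "cpuid"
                       : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                       : "a" (leaf) );
    }

    static bool xen_has_evtchn_fifo(void)
    {
        uint32_t eax, ebx, ecx, edx;

        /* Base + 4 is the HVM-specific feature leaf. */
        cpuid(XEN_CPUID_FIRST_LEAF + 4, &eax, &ebx, &ecx, &edx);

        return eax & XEN_HVM_CPUID_EVTCHN_FIFO;
    }

The guest would then only issue the FIFO sub-ops when the bit is set,
and the toolstack could clear it in the domain's CPUID policy.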

Ok, I'll take a look at hypfs as an immediate solution, if that's preferred.

  Paul