
RE: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, ...



> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: 03 December 2020 15:57
> To: paul@xxxxxxx
> Cc: 'Paul Durrant' <pdurrant@xxxxxxxxxx>; 'Eslam Elnikety' 
> <elnikety@xxxxxxxxxx>; 'Ian Jackson'
> <iwj@xxxxxxxxxxxxxx>; 'Wei Liu' <wl@xxxxxxx>; 'Anthony PERARD' 
> <anthony.perard@xxxxxxxxxx>; 'Andrew
> Cooper' <andrew.cooper3@xxxxxxxxxx>; 'George Dunlap' 
> <george.dunlap@xxxxxxxxxx>; 'Julien Grall'
> <julien@xxxxxxx>; 'Stefano Stabellini' <sstabellini@xxxxxxxxxx>; 'Christian 
> Lindig'
> <christian.lindig@xxxxxxxxxx>; 'David Scott' <dave@xxxxxxxxxx>; 'Volodymyr 
> Babchuk'
> <Volodymyr_Babchuk@xxxxxxxx>; 'Roger Pau Monné' <roger.pau@xxxxxxxxxx>; 
> xen-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, 
> XEN_DOMCTL_CDF_evtchn_fifo,
> ...
> 
> On 03.12.2020 16:45, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@xxxxxxxx>
> >> Sent: 03 December 2020 15:23
> >>
> >> On 03.12.2020 13:41, Paul Durrant wrote:
> >>> From: Paul Durrant <pdurrant@xxxxxxxxxx>
> >>>
> >>> ...to control the visibility of the FIFO event channel operations
> >>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and
> >>> EVTCHNOP_set_priority) to the guest.
> >>>
> >>> These operations were added to the public header in commit d2d50c2f308f
> >>> ("evtchn: add FIFO-based event channel ABI") and the first implementation
> >>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
> >>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
> >>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior
> >>> to that, a guest issuing those operations would receive a return value
> >>> of -ENOSYS (not implemented) from Xen. Guests aware of the FIFO
> >>> operations but running on an older (pre-4.4) Xen would fall back to
> >>> using the 2-level event channel interface upon seeing this return value.
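
[ For illustration, roughly what that guest-side probe/fallback looks
  like. This is a hedged sketch: hypercall_event_channel_op() stands in
  for whatever hypercall wrapper the guest kernel actually uses, and
  error handling is trimmed. ]

    #include <stdbool.h>
    #include <stdint.h>
    #include <errno.h>
    #include <xen/interface/event_channel.h> /* struct evtchn_init_control */

    /* Stand-in for the guest kernel's hypercall wrapper. */
    extern int hypercall_event_channel_op(int cmd, void *arg);

    static bool evtchn_try_fifo(uint64_t control_gfn, uint32_t vcpu)
    {
        struct evtchn_init_control init = {
            .control_gfn = control_gfn,
            .offset      = 0,
            .vcpu        = vcpu,
        };
        int rc = hypercall_event_channel_op(EVTCHNOP_init_control, &init);

        if ( rc == -ENOSYS )
            return false; /* pre-4.4 Xen (or FIFO hidden): use 2-level */

        return rc == 0;
    }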
> >>>
> >>> Unfortunately the uncontrollable appearance of these new operations in
> >>> Xen 4.4 onwards has implications for hibernation of some Linux guests.
> >>> During resume from hibernation, there are two kernels involved: the
> >>> "boot" kernel and the "resume" kernel. The guest boot kernel may default
> >>> to using FIFO operations and instruct Xen via EVTCHNOP_init_control to
> >>> switch from 2-level to FIFO. On the other hand, the resume kernel keeps
> >>> assuming 2-level, because it was hibernated on a version of Xen that did
> >>> not support the FIFO operations.
> >>>
> >>> To maintain compatibility it is necessary to make Xen behave as it did
> >>> before the new operations were added and hence the code in this patch
> >>> ensures that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event
> >>> channel operations will again result in -ENOSYS being returned to the
> >>> guest.
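
[ The hunk that actually implements the guard isn't quoted below (only
  the flag-plumbing hunks are), so as a hedged sketch of the shape of
  it: evtchn_fifo_enabled() is my name, but d->options really does hold
  the XEN_DOMCTL_CDF_* flags the domain was created with. ]

    static bool evtchn_fifo_enabled(const struct domain *d)
    {
        return d->options & XEN_DOMCTL_CDF_evtchn_fifo;
    }

    /* ... and in do_event_channel_op(), ahead of dispatch: */
    switch ( cmd )
    {
    case EVTCHNOP_init_control:
    case EVTCHNOP_expand_array:
    case EVTCHNOP_set_priority:
        if ( !evtchn_fifo_enabled(current->domain) )
            return -ENOSYS; /* behave as pre-4.4 Xen did */
        break;
    }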
> >>
> >> I have to admit I'm now even more concerned about such control going
> >> into Xen, all the more with the now 2nd use in the subsequent patch.
> >> The implication of this would seem to be that whenever we add new
> >> hypercalls or sub-ops, a domain creation control would also need
> >> adding, determining whether that new sub-op is actually okay to use by
> >> a guest. Or else I'd be keen to see, up front, criteria at least
> >> roughly outlined by which it could be established whether such an
> >> override control is needed.
> >>
> >
> > Ultimately I think any new hypercall (or related set of hypercalls)
> > added to the ABI needs to be opt-in on a per-domain basis, so that we
> > know that from when a domain is first created it will not see a change
> > in its environment unless the VM administrator wants that to happen.
> 
> A new hypercall appearing is a change to the guest's environment, yes,
> but a backwards compatible one. I don't see how this would harm a guest.

Say we have a guest which is aware of the newer Xen and is coded to use the new 
hypercall *but* we start it on an older Xen where the new hypercall is not 
implemented. *Then* we migrate it to the newer Xen... the new hypercall 
suddenly becomes available and changes the guest behaviour. In the worst case, 
the guest is sufficiently confused that it crashes.

> This and ...
> 
> >> I'm also not convinced such controls really want to be opt-in rather
> >> than opt-out.
> >
> > They really need to be opt-in I think. From a cloud provider PoV it is
> > important that nothing in a customer's environment changes unless we
> > want it to. Otherwise we have no way to deploy an updated hypervisor
> > version without risking crashing their VMs.
> 
> ... this sound to me more like workarounds for buggy guests than
> functionality the hypervisor _needs_ to have. (I can appreciate
> the specific case here for the specific scenario you provide as
> an exception.)

If we want to have a hypervisor that can be used in a cloud environment then 
Xen absolutely needs this capability.

> 
> >>> --- a/xen/arch/arm/domain.c
> >>> +++ b/xen/arch/arm/domain.c
> >>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>>      unsigned int max_vcpus;
> >>>
> >>>      /* HVM and HAP must be set. IOMMU may or may not be */
> >>> -    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
> >>> +    if ( (config->flags &
> >>> +          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
> >>>           (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
> >>>      {
> >>>          dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
> >>> --- a/xen/arch/arm/domain_build.c
> >>> +++ b/xen/arch/arm/domain_build.c
> >>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void)
> >>>          struct domain *d;
> >>>          struct xen_domctl_createdomain d_cfg = {
> >>>              .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
> >>> -            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>> +            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>> +                     XEN_DOMCTL_CDF_evtchn_fifo,
> >>>              .max_evtchn_port = -1,
> >>>              .max_grant_frames = 64,
> >>>              .max_maptrack_frames = 1024,
> >>> --- a/xen/arch/arm/setup.c
> >>> +++ b/xen/arch/arm/setup.c
> >>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
> >>>      struct bootmodule *xen_bootmodule;
> >>>      struct domain *dom0;
> >>>      struct xen_domctl_createdomain dom0_cfg = {
> >>> -        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>> +        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>> +                 XEN_DOMCTL_CDF_evtchn_fifo,
> >>>          .max_evtchn_port = -1,
> >>>          .max_grant_frames = gnttab_dom0_frames(),
> >>>          .max_maptrack_frames = -1,
> >>> --- a/xen/arch/x86/setup.c
> >>> +++ b/xen/arch/x86/setup.c
> >>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
> >>>                                           const char *loader)
> >>>  {
> >>>      struct xen_domctl_createdomain dom0_cfg = {
> >>> -        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
> >>> +        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
> >>> +                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
> >>>          .max_evtchn_port = -1,
> >>>          .max_grant_frames = -1,
> >>>          .max_maptrack_frames = -1,
> >>> --- a/xen/common/domain.c
> >>> +++ b/xen/common/domain.c
> >>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>>           ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>>             XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
> >>>             XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
> >>> -           XEN_DOMCTL_CDF_nested_virt) )
> >>> +           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
> >>>      {
> >>>          dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
> >>>          return -EINVAL;
> >>
> >> All of the hunks above point out a scalability issue if we were to
> >> follow this route for even just a fair part of new sub-ops, and I
> >> suppose you've noticed this with the next patch presumably touching
> >> all the same places again.
> >
> > Indeed. This solution works for now but is probably not what we want in
> > the long run. Same goes for the current way we control viridian features
> > via an HVM param. It is good enough for now IMO since domctl is not a
> > stable interface. Any ideas about how we might implement a better
> > interface in the longer term?
> 
> While it has other downsides, Jürgen's proposal doesn't have any
> similar scalability issue afaics. Another possible model would
> seem to be to key new hypercalls to hypervisor CPUID leaf bits,
> and derive their availability from a guest's CPUID policy. Of
> course this won't work when needing to retrofit guarding like
> you want to do here.
> 
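
For reference, a minimal sketch of what the CPUID-keyed model might look
like from the guest side. The leaf layout follows the usual Xen
hypervisor-leaf convention (base 0x40000000, HVM feature flags in EAX of
base + 4), but the feature bit XEN_HVM_CPUID_EVTCHN_FIFO is purely
hypothetical:

    #include <stdbool.h>
    #include <stdint.h>

    #define XEN_CPUID_FIRST_LEAF       0x40000000u
    #define XEN_HVM_CPUID_EVTCHN_FIFO  (1u << 7)  /* hypothetical bit */

    static inline void cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                             uint32_t *ecx, uint32_t *edx)
    {
        asm volatile ( "cpuid"
                       : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                       : "a" (leaf) );
    }

    static bool xen_has_evtchn_fifo(void)
    {
        uint32_t eax, ebx, ecx, edx;

        /* Base + 4 is the HVM-specific feature leaf. */
        cpuid(XEN_CPUID_FIRST_LEAF + 4, &eax, &ebx, &ecx, &edx);

        return eax & XEN_HVM_CPUID_EVTCHN_FIFO;
    }

The guest would then only issue the FIFO sub-ops when the bit is set,
and the toolstack could clear it in the domain's CPUID policy.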

Ok, I'll take a look at hypfs as an immediate solution, if that's preferred.

  Paul