
Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, ...



On 03.12.20 18:07, Paul Durrant wrote:
-----Original Message-----
From: Jan Beulich <jbeulich@xxxxxxxx>
Sent: 03 December 2020 15:57
To: paul@xxxxxxx
Cc: 'Paul Durrant' <pdurrant@xxxxxxxxxx>; 'Eslam Elnikety' <elnikety@xxxxxxxxxx>;
    'Ian Jackson' <iwj@xxxxxxxxxxxxxx>; 'Wei Liu' <wl@xxxxxxx>;
    'Anthony PERARD' <anthony.perard@xxxxxxxxxx>; 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>;
    'George Dunlap' <george.dunlap@xxxxxxxxxx>; 'Julien Grall' <julien@xxxxxxx>;
    'Stefano Stabellini' <sstabellini@xxxxxxxxxx>; 'Christian Lindig' <christian.lindig@xxxxxxxxxx>;
    'David Scott' <dave@xxxxxxxxxx>; 'Volodymyr Babchuk' <Volodymyr_Babchuk@xxxxxxxx>;
    'Roger Pau Monné' <roger.pau@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, ...

On 03.12.2020 16:45, Paul Durrant wrote:
From: Jan Beulich <jbeulich@xxxxxxxx>
Sent: 03 December 2020 15:23

On 03.12.2020 13:41, Paul Durrant wrote:
From: Paul Durrant <pdurrant@xxxxxxxxxx>

...to control the visibility of the FIFO event channel operations
(EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
the guest.

These operations were added to the public header in commit d2d50c2f308f
("evtchn: add FIFO-based event channel ABI") and the first implementation
appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
that, a guest issuing those operations would receive a return value of
-ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
running on an older (pre-4.4) Xen would fall back to using the 2-level event
channel interface upon seeing this return value.
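
As a concrete illustration of that fallback (a sketch only; the hypercall
wrapper name below is a stand-in, not code from any particular guest kernel):

    /* Sketch: probe for the FIFO ABI and fall back to 2-level on -ENOSYS.
     * hypercall_event_channel_op() stands in for whatever wrapper the
     * guest kernel provides around HYPERVISOR_event_channel_op. */
    static bool evtchn_try_fifo(uint64_t control_block_gfn)
    {
        struct evtchn_init_control init_control = {
            .control_gfn = control_block_gfn, /* zeroed, page-aligned page */
            .offset      = 0,
            .vcpu        = 0,
        };
        int rc = hypercall_event_channel_op(EVTCHNOP_init_control,
                                            &init_control);

        if ( rc == -ENOSYS )
            return false;   /* FIFO ABI not offered: stay on 2-level */

        return rc == 0;     /* FIFO ABI now selected for this domain */
    }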

Unfortunately the uncontrollable appearance of these new operations in Xen 4.4
onwards has implications for hibernation of some Linux guests. During resume
from hibernation, there are two kernels involved: the "boot" kernel and the
"resume" kernel. The guest boot kernel may default to use FIFO operations and
instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
other hand, the resume kernel keeps assuming 2-level, because it was hibernated
on a version of Xen that did not support the FIFO operations.

To maintain compatibility it is necessary to make Xen behave as it did
before the new operations were added and hence the code in this patch ensures
that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel
operations will again result in -ENOSYS being returned to the guest.
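
(To make the intended behaviour concrete, the guard amounts to something like
the sketch below; this is an illustration of the idea, not a hunk from this
patch. d->options is the copy of the XEN_DOMCTL_CDF_* creation flags that Xen
keeps in struct domain.)

    /* Sketch: hide the FIFO sub-ops from domains created without
     * XEN_DOMCTL_CDF_evtchn_fifo, restoring the pre-4.4 -ENOSYS
     * behaviour described above. */
    static bool evtchn_fifo_visible(const struct domain *d)
    {
        return d->options & XEN_DOMCTL_CDF_evtchn_fifo;
    }

    /* ...and at the top of the EVTCHNOP_init_control,
     * EVTCHNOP_expand_array and EVTCHNOP_set_priority handlers: */
    if ( !evtchn_fifo_visible(d) )
        return -ENOSYS;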

I have to admit I'm now even more concerned about a control like this
going into Xen, all the more with the second use in the subsequent patch.
The implication would seem to be that whenever we add new hypercalls or
sub-ops, a domain creation control would also need adding to determine
whether that new sub-op is actually okay for a guest to use. Or else I'd
be keen to see criteria outlined up front, at least roughly, by which it
could be established whether such an override control is needed.


Ultimately I think any new hypercall (or related set of hypercalls) added to
the ABI needs to be opt-in on a per-domain basis, so that we know that from
when a domain is first created it will not see a change in its environment
unless the VM administrator wants that to happen.

A new hypercall appearing is a change to the guest's environment, yes,
but a backwards compatible one. I don't see how this would harm a guest.

Say we have a guest which is aware of the newer Xen and is coded to use the new 
hypercall *but* we start it on an older Xen where the new hypercall is not 
implemented. *Then* we migrate it to the newer Xen... the new hypercall 
suddenly becomes available and changes the guest behaviour. In the worst case, 
the guest is sufficiently confused that it crashes.

This and ...

I'm also not convinced such controls really want to be opt-in rather
than opt-out.

They really need to be opt-in, I think. From a cloud provider's PoV it is
important that nothing in a customer's environment changes unless we want it
to. Otherwise we have no way to deploy an updated hypervisor version without
risking crashing their VMs.

... this sound to me more like workarounds for buggy guests than
functionality the hypervisor _needs_ to have. (I can appreciate
the specific case here for the specific scenario you provide as
an exception.)

If we want to have a hypervisor that can be used in a cloud environment, then
Xen absolutely needs this capability.


--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
      unsigned int max_vcpus;

      /* HVM and HAP must be set. IOMMU may or may not be */
-    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
+    if ( (config->flags &
+          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
           (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
      {
          dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2478,7 +2478,8 @@ void __init create_domUs(void)
          struct domain *d;
          struct xen_domctl_createdomain d_cfg = {
              .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
-            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
+            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
+                     XEN_DOMCTL_CDF_evtchn_fifo,
              .max_evtchn_port = -1,
              .max_grant_frames = 64,
              .max_maptrack_frames = 1024,
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
      struct bootmodule *xen_bootmodule;
      struct domain *dom0;
      struct xen_domctl_createdomain dom0_cfg = {
-        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
+        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
+                 XEN_DOMCTL_CDF_evtchn_fifo,
          .max_evtchn_port = -1,
          .max_grant_frames = gnttab_dom0_frames(),
          .max_maptrack_frames = -1,
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
                                           const char *loader)
  {
      struct xen_domctl_createdomain dom0_cfg = {
-        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
+        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
+                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
          .max_evtchn_port = -1,
          .max_grant_frames = -1,
          .max_maptrack_frames = -1,
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
           ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
             XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
             XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
-           XEN_DOMCTL_CDF_nested_virt) )
+           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
      {
          dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
          return -EINVAL;
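
(For context on how the flag reaches the checks above from outside the
hypervisor, a rough toolstack-side sketch using libxenctrl; the helper name
and the field values other than the flags are made up for illustration.)

    /* Sketch: a toolstack requesting the FIFO ABI for a new domain. */
    #include <xenctrl.h>

    static int create_domain_with_fifo(xc_interface *xch, uint32_t *domid)
    {
        struct xen_domctl_createdomain cfg = {
            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
                     XEN_DOMCTL_CDF_evtchn_fifo,
            .max_vcpus = 1,
            .max_evtchn_port = -1,
            .max_grant_frames = -1,
            .max_maptrack_frames = -1,
        };

        return xc_domain_create(xch, domid, &cfg);
    }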

All of the hunks above point out a scalability issue if we were to
follow this route for even just a fair part of new sub-ops, and I
suppose you've noticed this with the next patch presumably touching
all the same places again.

Indeed. This solution works for now but is probably not what we want in the
long run. Same goes for the current way we control viridian features via an
HVM param. It is good enough for now IMO since domctl is not a stable
interface. Any ideas about how we might implement a better interface in the
longer term?

While it has other downsides, Jürgen's proposal doesn't have any
similar scalability issue afaics. Another possible model would
seem to be to key new hypercalls to hypervisor CPUID leaf bits,
and derive their availability from a guest's CPUID policy. Of
course this won't work when needing to retrofit guarding like
you want to do here.
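
(For what the CPUID-policy model could look like from the guest side, a rough
sketch; the feature leaf offset and bit used here are placeholders, since no
such bit is defined today.)

    /* Sketch: gate use of a new hypercall on a Xen hypervisor CPUID bit.
     * Leaf 0x40000000 and the "XenVMMXenVMM" signature are real; the
     * feature leaf (base + 4) and bit 0 below are placeholders. */
    #include <cpuid.h>
    #include <stdbool.h>
    #include <string.h>

    static bool xen_advertises_new_op(void)
    {
        unsigned int eax, ebx, ecx, edx;
        char sig[13];

        __cpuid(0x40000000, eax, ebx, ecx, edx);  /* hypervisor base leaf */
        memcpy(sig + 0, &ebx, 4);
        memcpy(sig + 4, &ecx, 4);
        memcpy(sig + 8, &edx, 4);
        sig[12] = '\0';

        if ( strcmp(sig, "XenVMMXenVMM") != 0 || eax < 0x40000004 )
            return false;

        __cpuid(0x40000004, eax, ebx, ecx, edx);  /* placeholder leaf */
        return eax & 1;                           /* placeholder bit */
    }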


Ok, I'll take a look at hypfs as an immediate solution, if that's preferred.

Paul, if you'd like me to add a few patches to my series doing the basic
framework (per-domain nodes, interfaces for setting struct domain fields
via hypfs), I can do that. You could then concentrate on the tools side.


Juergen
