
Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, ...



Hi,

I haven't looked at the series yet. Just adding some thoughts on why one would want such an option.

On 04/12/2020 09:43, Jan Beulich wrote:
On 04.12.2020 09:22, Paul Durrant wrote:
From: Jan Beulich <jbeulich@xxxxxxxx>
Sent: 04 December 2020 07:53

On 03.12.2020 18:07, Paul Durrant wrote:
From: Jan Beulich <jbeulich@xxxxxxxx>
Sent: 03 December 2020 15:57

... this sounds to me more like workarounds for buggy guests than
functionality the hypervisor _needs_ to have. (I can appreciate
the specific case here for the specific scenario you provide as
an exception.)

If we want to have a hypervisor that can be used in a cloud environment
then Xen absolutely needs this capability.

As per above you can conclude that I'm still struggling to see the
"why" part here.


Imagine you are a customer. You boot your OS and everything is just fine... you 
run your workload and all is good. You then shut down your VM and re-start it. 
Now it starts to crash. Who are you going to blame? You did nothing to your OS 
or application s/w, so you are going to blame the cloud provider of course.

That's a situation OSes are in all the time. Buggy applications may
stop working on newer OS versions. It's still the application that's
in need of updating then. I guess OSes may choose to work around
some very common applications' bugs, but I'd then wonder on what
basis "very common" gets established. I dislike the underlying
asymmetry / inconsistency (if not unfairness) of such a model,
despite seeing that there may be business reasons leading people to
think they want something like this.

The discussion seems to be geared towards buggy guests so far. However, this is not the only reason that one may want to avoid exposing some features:

1) Given the recent security issues (such as XSA-343), a knob to disable FIFO would be quite beneficial for vendors that don't need the feature.

2) Fleet management purposes. You may have a fleet with multiple versions of Xen. You don't want your customers to start relying on features that may not be available on all the hosts, otherwise it complicates guest placement.

FAOD, I am sure there might be other features that need to be disabled. But we have to start somewhere :).


Now imagine you are the cloud provider, running Xen. What you did was start to 
upgrade your hosts from an older version of Xen to a newer version of Xen, to 
pick up various bug fixes and make sure you are running a version that is 
within the security support envelope. You identify that your customer's problem 
is a bug in their OS that was latent on the old version of the hypervisor but 
is now manifesting on the new one because it has buggy support for a hypercall 
that was added between the two versions. How are you going to fix this issue, 
and get your customer up and running again? Of course you'd like your customer 
to upgrade their OS, but they can't even boot it to do that. You really need a 
solution that can restore the old VM environment, at least temporarily, for 
that customer.

Boot the guest on a not-yet-upgraded host again, to update its kernel?

You are making the assumption that the customer would have the choice to target a specific version of Xen. This may be undesirable for a cloud provider, as suddenly your customer may want to stick with the old version of Xen.

--
Julien Grall



 

