
Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, ...



On 04.12.2020 12:45, Julien Grall wrote:
> Hi,
> 
> I haven't looked at the series yet. Just adding some thoughts on why one 
> would want such an option.
> 
> On 04/12/2020 09:43, Jan Beulich wrote:
>> On 04.12.2020 09:22, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Sent: 04 December 2020 07:53
>>>>
>>>> On 03.12.2020 18:07, Paul Durrant wrote:
>>>>>> From: Jan Beulich <jbeulich@xxxxxxxx>
>>>>>> Sent: 03 December 2020 15:57
>>>>>>
>>>>>> ... this sounds to me more like workarounds for buggy guests than
>>>>>> functionality the hypervisor _needs_ to have. (I can appreciate
>>>>>> the specific case here for the specific scenario you provide as
>>>>>> an exception.)
>>>>>
>>>>> If we want to have a hypervisor that can be used in a cloud environment
>>>>> then Xen absolutely needs this capability.
>>>>
>>>> As per above you can conclude that I'm still struggling to see the
>>>> "why" part here.
>>>>
>>>
>>> Imagine you are a customer. You boot your OS and everything is just fine... 
>>> you run your workload and all is good. You then shut down your VM and 
>>> re-start it. Now it starts to crash. Who are you going to blame? You did 
>>> nothing to your OS or application s/w, so you are going to blame the cloud 
>>> provider of course.
>>
>> That's a situation OSes are in all the time. Buggy applications may
>> stop working on newer OS versions. It's still the application that's
>> in need of updating then. I guess OSes may choose to work around
>> some very common applications' bugs, but I'd then wonder on what
>> basis "very common" gets established. I dislike the underlying
>> asymmetry / inconsistency (if not unfairness) of such a model,
>> despite seeing that there may be business reasons leading people to
>> think they want something like this.
> 
> The discussion seems to be geared towards buggy guest so far. However, 
> this is not the only reason that one may want to avoid exposing some 
> features:
> 
>     1) From the recent security issues (such as XSA-343), a knob to 
> disable FIFO would be quite beneficial for vendors that don't need the 
> feature.

Except that this wouldn't have been suitable as a during-embargo
mitigation, for its observability by guests.

>     2) Fleet management purposes. You may have a fleet with multiple 
> versions of Xen. You don't want your customer to start relying on 
> features that may not be available on all the hosts, otherwise it 
> complicates the guest placement.

Guests incapable of running on older Xen are a problem in this regard
anyway, aren't they? And if they are capable, I don't see what
you're referring to.

> FAOD, I am sure there might be other features that need to be disabled. 
> But we have to start somewhere :).

If there is such a need, then yes, sure. But shouldn't we at least
gain rough agreement on what the future is going to look like with
this? IOW have in hand some at least roughly agreed criteria by
which we could decide which new ABI additions will need some form
of override going forward (also allowing to judge which prior
additions may want to gain overrides in a retroactive fashion, or
in fact should have had ones from the beginning)?

>>> Now imagine you are the cloud provider, running Xen. What you did was start 
>>> to upgrade your hosts from an older version of Xen to a newer version of 
>>> Xen, to pick up various bug fixes and make sure you are running a version 
>>> that is within the security support envelope. You identify that your 
>>> customer's problem is a bug in their OS that was latent on the old version 
>>> of the hypervisor but is now manifesting on the new one because it has 
>>> buggy support for a hypercall that was added between the two versions. How 
>>> are you going to fix this issue, and get your customer up and running 
>>> again? Of course you'd like your customer to upgrade their OS, but they 
>>> can't even boot it to do that. You really need a solution that can restore 
>>> the old VM environment, at least temporarily, for that customer.
>>
>> Boot the guest on a not-yet-upgraded host again, to update its kernel?
> 
> You are making the assumption that the customer would have the choice to 
> target a specific version of Xen. This may be undesirable for a cloud 
> provider as suddenly your customer may want to stick on the old version 
> of Xen.

I've gone from you saying "You really need a solution that can restore
the old VM environment, at least temporarily, for that customer." The
"temporarily" to me implies that it is at least an option to tie a
certain guest to a certain Xen version for in-guest upgrading purposes.
If the deal with the customer doesn't include running on a certain Xen
version, I don't see how this could have non-temporary consequences to
the cloud provider.

Jan



 

