
Re: XSA-351 causing Solaris-11 systems to panic during boot.



On 18.12.2020 21:43, boris.ostrovsky@xxxxxxxxxx wrote:
> On 12/17/20 12:49 PM, boris.ostrovsky@xxxxxxxxxx wrote:
>> On 12/17/20 11:46 AM, Andrew Cooper wrote:
>>> On 17/12/2020 16:25, boris.ostrovsky@xxxxxxxxxx wrote:
>>>> On 12/17/20 2:40 AM, Jan Beulich wrote:
>>>>> On 17.12.2020 02:51, boris.ostrovsky@xxxxxxxxxx wrote:
>>>>> I think this is acceptable as a workaround, albeit we may want to
>>>>> consider further restricting this (at least on staging), like e.g.
>>>>> requiring a guest config setting to enable the workaround. 
>>>> Maybe, but then someone migrating from a stable release to 4.15 will have 
>>>> to modify guest configuration.
>>>>
>>>>
>>>>> But
>>>>> maybe this will need to be part of the MSR policy for the domain
>>>>> instead, down the road. We'll definitely want Andrew's view here.
>>>>>
>>>>> Speaking of staging - before applying anything to the stable
>>>>> branches, I think we want to have this addressed on the main
>>>>> branch. I can't see how Solaris would work there.
>>>> Indeed it won't. I'll need to do that as well (I misinterpreted the 
>>>> statement in the XSA about only 4.14- being vulnerable)
>>> It's hopefully obvious now why we suddenly finished the "let's turn all
>>> unknown MSRs to #GP" work at the point that we did (after dithering on
>>> the point for several years).
>>>
>>> To put it bluntly, default MSR readability was not a clever decision at all.
>>>
>>> There is a large risk that there is a similar vulnerability elsewhere,
>>> given how poorly documented the MSRs are (and one contemporary CPU I've
>>> got the manual open for has more than 6000 *documented* MSRs).  We did
>>> debate for a while whether the readability of the PPIN MSRs was a
>>> vulnerability or not, before eventually deciding not.
> 
> 
> Can we do something like KVM's ignore_msrs (but probably return 0 on reads to 
> avoid leaking host values)? It would allow dealing with cases where a guest 
> is suddenly unable to boot after a hypervisor update (especially from 
> pre-4.14). It won't help in all cases, since some MSRs may be expected to be 
> non-zero, but I think it will cover a large number of them. (And it will 
> certainly do what Jan is asking above, but will not be specific to this 
> particular breakage.)

This would re-introduce the problem with detection (by guests) of certain
features lacking suitable CPUID bits. Guests would no longer observe the
expected #GP(0), and hence be at risk of misbehaving. Hence at the very
least such an option would need to be per-domain rather than (like for
KVM) global, and use of it should then imo be explicitly unsupported. And
along the lines of what KVM has, this may want to be a tristate so the
ignoring can be both silent and verbose.
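A rough sketch, in C, of what such a per-domain tristate might look like for reads of otherwise unhandled MSRs. All names here (enum unknown_msr_policy, struct domain_cfg, handle_unknown_rdmsr) are hypothetical and do not correspond to existing Xen code or config options; the "log" case prints to stderr purely as a stand-in for hypervisor logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-domain setting, not an existing Xen option. */
enum unknown_msr_policy {
    UNKNOWN_MSR_GP,         /* default: guest sees #GP(0), so feature probing works */
    UNKNOWN_MSR_IGNORE,     /* unsupported: read as 0, silently */
    UNKNOWN_MSR_IGNORE_LOG, /* unsupported: read as 0, and log the access */
};

/* Hypothetical minimal per-domain state for this sketch. */
struct domain_cfg {
    unsigned int domid;
    enum unknown_msr_policy msr_policy;
};

/*
 * Called for an RDMSR the hypervisor has no handler for.
 * Returns true if #GP(0) should be injected into the guest.
 * Otherwise stores 0 in *val (never a host value, to avoid leaks)
 * and returns false.
 */
static bool handle_unknown_rdmsr(const struct domain_cfg *d, uint32_t msr,
                                 uint64_t *val)
{
    assert(d != NULL && val != NULL);

    switch (d->msr_policy) {
    case UNKNOWN_MSR_IGNORE_LOG:
        fprintf(stderr, "d%u: ignoring rdmsr 0x%08x\n",
                d->domid, (unsigned int)msr);
        /* fall through: logging is the only difference from IGNORE */
    case UNKNOWN_MSR_IGNORE:
        *val = 0;
        return false;
    case UNKNOWN_MSR_GP:
    default:
        return true; /* *val left untouched */
    }
}
```

Keeping #GP(0) as the default matters for the reason given above: a guest probing for a feature that has no CPUID bit relies on the fault, and reading 0 instead could make it wrongly conclude the feature is present in some disabled state.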

Jan
