[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 04/20] xen/domctl: Drop vcpu_alloc_lock



>>> On 21.03.18 at 18:57, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 21/03/18 05:46, Juergen Gross wrote:
>> On 20/03/18 18:22, Andrew Cooper wrote:
>>> On 20/03/18 16:58, Jan Beulich wrote:
>>>>>>> On 19.03.18 at 20:13, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>> It is not entirely clear why this interlock was introduced in c/s 
>>>>> 8cbb5278e
>>>>> "x86/AMD: Add support for AMD's OSVW feature in guests".
>>>>>
>>>>> At the time, svm_handle_osvw() could have seen an unexpected change in 
>>>>> OSVW
>>>>> (not the case now, due to the new CPUID Policy infrastructure), but even 
> then,
>>>>> it would have caused spurious changes in behaviour when handling
>>>>> OSVW_{ID_LENGTH,STATUS} read requests on behalf of an already-running 
>>>>> guest.
>>>>>
>>>>> There are plenty of other aspects of domain creation which depend on 
> hardware
>>>>> details which may change across a microcode load, but where not protected 
>>>>> by
>>>>> this interlock.
>>>> Are there? We don't re-read CPUID (yet), for example. But of
>>>> course it is also not really specified which aspects may change
>>>> across microcode updates.
>>>>
>>>>> A host administrator choosing to perform late microcode loading has 
>>>>> plenty 
> of
>>>>> other problems to worry about, and is it not unreasonable to expect them 
>>>>> to
>>>>> temporarily cease domain construction activities while the microcode 
>>>>> loading
>>>>> is in progress.
>>>> But it is also not unreasonable to expect the hypervisor to guard
>>>> against inconsistencies here. On the whole I'm not really
>>>> convinced; I think I'd like to hear others' opinions.
>>> The underlying problem is that this lock cannot say when merging
>>> max_cpus into createdomain, because we cannot continue the hypercall
>>> midway through.
>>>
>>> As it doesn't currently protect createdomain, which amongst other things
>>> contains init_domain_cpuid_policy() and init_domain_msr_policy() (the
>>> most likely structures to be affected by microcode updates), I don't see
>>> any purpose in keeping it for the minute area it does cover.
>> What about failing domain creation e.g. via -EAGAIN in case a
>> microcode update happened in between? This would be easy by adding a
>> microcode generation count which would have to be the same for start
>> and end of the create domain hypercall.
> 
> Failing the hypercall is very complicated once domain_create() has
> completed successfully.  See patch 11
> 
> (Although in writing this reply, I see that patch 11 is buggy).
> 
> Amongst other things, failing that late burns a domid, because there is
> a period during which the domain has to live in the domain list.
> 
> I don't see the point of trying to retain a broken mechanism, especially
> at the expense of error path complexity.

Well, I can certainly understand your dislike of the current situation.
Nevertheless, the common route is to improve things, not make them
worse (even if perhaps just slightly). That's not to say that I intend
to outright NAK the patch, but I continue to be of two minds.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.