
Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition



On 11.01.2021 10:09, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@xxxxxxxx>
>> Sent: 11 January 2021 09:00
>> To: paul@xxxxxxx
>> Cc: wl@xxxxxxx; iwj@xxxxxxxxxxxxxx; anthony.perard@xxxxxxxxxx; 
>> andrew.cooper3@xxxxxxxxxx;
>> george.dunlap@xxxxxxxxxx; julien@xxxxxxx; sstabellini@xxxxxxxxxx; 
>> roger.pau@xxxxxxxxxx; xen-
>> devel@xxxxxxxxxxxxxxxxxxxx; 'Igor Druzhinin' <igor.druzhinin@xxxxxxxxxx>
>> Subject: Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per 
>> partition
>>
>> On 11.01.2021 09:45, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>>>> Sent: 09 January 2021 00:56
>>>> To: paul@xxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx
>>>> Cc: wl@xxxxxxx; iwj@xxxxxxxxxxxxxx; anthony.perard@xxxxxxxxxx; 
>>>> andrew.cooper3@xxxxxxxxxx;
>>>> george.dunlap@xxxxxxxxxx; jbeulich@xxxxxxxx; julien@xxxxxxx; 
>>>> sstabellini@xxxxxxxxxx;
>>>> roger.pau@xxxxxxxxxx
>>>> Subject: Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per 
>>>> partition
>>>>
>>>> On 08/01/2021 08:32, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>>>>>> Sent: 08 January 2021 00:47
>>>>>> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
>>>>>> Cc: paul@xxxxxxx; wl@xxxxxxx; iwj@xxxxxxxxxxxxxx; 
>>>>>> anthony.perard@xxxxxxxxxx;
>>>>>> andrew.cooper3@xxxxxxxxxx; george.dunlap@xxxxxxxxxx; jbeulich@xxxxxxxx; 
>>>>>> julien@xxxxxxx;
>>>>>> sstabellini@xxxxxxxxxx; roger.pau@xxxxxxxxxx; Igor Druzhinin 
>>>>>> <igor.druzhinin@xxxxxxxxxx>
>>>>>> Subject: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per 
>>>>>> partition
>>>>>>
>>>>>> TLFS 7.8.1 stipulates that "a virtual processor index must be less than
>>>>>> the maximum number of virtual processors per partition" that "can be
>>>>>> obtained through CPUID leaf 0x40000005". Furthermore, "Requirements for
>>>>>> Implementing the Microsoft Hypervisor Interface" states that starting
>>>>>> with Windows Server 2012, which allowed more than 64 CPUs to be brought
>>>>>> up, this leaf may now contain the value -1, meaning the hypervisor
>>>>>> imposes no restriction, while 0 (which we currently expose) means the
>>>>>> default restriction is still present.
>>>>>>
>>>>>> Along with the previous changes exposing ExProcessorMasks, this allows
>>>>>> a recent Windows VM with the Viridian extension enabled to have more
>>>>>> than 64 vCPUs without going into an immediate BSOD.
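[As a minimal sketch of the leaf semantics described in the commit message:
the leaf number and the -1/0 convention come from TLFS 7.8.1, but the types,
function and flag names below are hypothetical and are not the actual Xen
viridian code.]

  #include <stdbool.h>
  #include <stdint.h>

  #define HV_CPUID_IMPLEMENTATION_LIMITS 0x40000005u  /* assumed constant name */

  struct cpuid_leaf {
      uint32_t a, b, c, d;
  };

  /*
   * Fill CPUID leaf 0x40000005.  Reporting -1 tells the guest there is no
   * restriction on the number of virtual processors; leaving 0 means the
   * corresponding information is not exposed, which Windows treats as the
   * default 64-VP limit.
   */
  static void fill_implementation_limits(struct cpuid_leaf *leaf,
                                         bool no_vp_limit)
  {
      leaf->a = no_vp_limit ? ~0u : 0u;   /* maximum virtual processors */
      leaf->b = no_vp_limit ? ~0u : 0u;   /* maximum logical processors */
      leaf->c = 0;                        /* interrupt-vector info not exposed */
      leaf->d = 0;                        /* reserved */
  }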
>>>>>>
>>>>>
>>>>> This is very odd as I was happily testing with a 128 vCPU VM once
>>>>> ExProcessorMasks was implemented... no need for the extra leaf.
>>>>> The documentation for 0x40000005 states "Describes the scale limits
>>>>> supported in the current hypervisor implementation. If any value is
>>>>> zero, the hypervisor does not expose the corresponding information". It
>>>>> does not say (in section 7.8.1 or elsewhere AFAICT) what implications
>>>>> that has for VP_INDEX.
>>>>>
>>>>> In "Requirements for Implementing the Microsoft Hypervisor Interface" I
>>>>> don't see anything saying what the semantics of not implementing leaf
>>>>> 0x40000005 are, only that if implementing it then -1 must be used to
>>>>> break the 64 VP limit. It also says that reporting -1 in 0x40000005
>>>>> means that 40000004.EAX bits 1 and 2 *must* be clear, which is clearly
>>>>> not true if ExProcessorMasks is enabled. That document is dated June
>>>>> 13th 2012 so I think it should be taken with a pinch of salt.
>>>>>
>>>>> Have you actually observed a BSOD with >64 vCPUs when ExProcessorMasks
>>>>> is enabled? If so, which version of Windows? I'd like to get a repro
>>>>> myself.
>>>>
>>>> I return with more testing that confirms both my and your results.
>>>>
>>>> 1) with ExProcessorMasks and 66 vCPUs assigned, both the current WS19
>>>> build and pre-release 20270 of vNext boot correctly with no changes
>>>>
>>>> and that would be fine, but the existing documentation for
>>>> ex_processor_masks states that:
>>>> "Hence this enlightenment must be specified for guests with more
>>>> than 64 vCPUs if B<hcall_remote_tlb_flush> and/or B<hcall_ipi> are also
>>>> specified."
>>>>
>>>> You would then expect a 64+ vCPU VM to boot without ex_processor_masks,
>>>> hcall_remote_tlb_flush and hcall_ipi.
>>>
>>> Indeed.
>>>
>>>>
>>>> So,
>>>> 2) without ExProcessorMasks and 66 vCPUs assigned, and with
>>>> hcall_remote_tlb_flush and hcall_ipi disabled: WS19 BSODs and vNext
>>>> fails to initialize secondary CPUs
>>>>
>>>
>>> This is not what I recall from testing but I can confirm I see the same now.
>>>
>>>> After applying my change,
>>>> 3) without ExProcessorMasks and 66 vCPUs assigned, and with
>>>> hcall_remote_tlb_flush and hcall_ipi disabled: WS19 and vNext boot
>>>> correctly
>>>>
>>>> So we either need to change the documentation and require
>>>> ExProcessorMasks for all VMs with 64+ vCPUs, or go with my change.
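[For concreteness, the tested configurations above map onto xl.cfg fragments
roughly like the following; the 66-vCPU count is taken from the tests, while
the exact enlightenment lists are illustrative rather than the precise
configurations used.]

  # (1) ExProcessorMasks enabled -- boots on WS19 and vNext as-is
  vcpus = 66
  viridian = [ "base", "hcall_remote_tlb_flush", "hcall_ipi", "ex_processor_masks" ]

  # (2)/(3) no ExProcessorMasks, TLB-flush/IPI hypercalls left out --
  # BSODs on WS19 without the patch, boots with it
  vcpus = 66
  viridian = [ "base" ]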
>>>>
>>>
>>> I think we go with your change. I guess we can conclude then that the
>>> separate viridian flag as part of the base set is the right way to go (to
>>> stop the leaf magically appearing on migrate of existing guests) and
>>> leave ex_processor_masks (and the documentation) as-is.
>>>
>>> You can add my R-b to the patch.
>>
>> That's the unchanged patch then, including the libxl change that
>> I had asked about, and for which I have to admit I don't fully
>> follow Igor's responses? I'm hesitant to give an ack for that
>> aspect of the change, yet I suppose the libxl maintainers will
>> defer to the x86 ones there. Alternatively Andrew or Roger could
>> of course ack this ...
>>
> 
> I don't think we really need a specific control in xl.cfg as this is a fix
> for some poorly documented semantics in the spec. The flag simply prevents
> the leaf from magically appearing on migrate, and I think that's enough.

Well, okay then. I can throw in this patch unchanged; it is my
understanding that there'll be a v2 for patch 2.

Jan
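
[A rough sketch of the approach agreed above: the new behaviour is gated on a
per-domain viridian feature bit that the toolstack includes by default for
newly created guests, so a guest migrated in with its existing feature mask
keeps seeing the old leaf value. The bit names, positions and helpers below
are assumptions, not the actual Xen/libxl identifiers.]

  #include <stdbool.h>
  #include <stdint.h>

  /* Assumed feature bits; the real definitions live in the Xen public headers. */
  #define HVMPV_base         (1ULL << 0)    /* assumed name and position */
  #define HVMPV_no_vp_limit  (1ULL << 11)   /* bit position is a guess */

  /* Toolstack side: new guests get the bit by default, with no xl.cfg knob. */
  static uint64_t default_viridian_mask(bool viridian_enabled)
  {
      return viridian_enabled ? (HVMPV_base | HVMPV_no_vp_limit) : 0;
  }

  /* Hypervisor side: only report -1 in leaf 0x40000005 when the bit is set,
   * so the leaf does not change under a guest migrated with an older mask. */
  static bool no_vp_limit(uint64_t viridian_feature_mask)
  {
      return viridian_feature_mask & HVMPV_no_vp_limit;
  }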



 

