
Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition



On 08/01/2021 08:32, Paul Durrant wrote:
>> -----Original Message-----
>> From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>> Sent: 08 January 2021 00:47
>> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
>> Cc: paul@xxxxxxx; wl@xxxxxxx; iwj@xxxxxxxxxxxxxx; anthony.perard@xxxxxxxxxx;
>> andrew.cooper3@xxxxxxxxxx; george.dunlap@xxxxxxxxxx; jbeulich@xxxxxxxx; 
>> julien@xxxxxxx;
>> sstabellini@xxxxxxxxxx; roger.pau@xxxxxxxxxx; Igor Druzhinin 
>> <igor.druzhinin@xxxxxxxxxx>
>> Subject: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
>>
>> TLFS 7.8.1 stipulates that "a virtual processor index must be less than
>> the maximum number of virtual processors per partition", which "can be
>> obtained through CPUID leaf 0x40000005". Furthermore, "Requirements for
>> Implementing the Microsoft Hypervisor Interface" states that, starting
>> from Windows Server 2012 (which allowed more than 64 CPUs to be brought
>> up), this leaf may now contain the value -1, meaning the hypervisor
>> imposes no restriction, while 0 (which we currently expose) means the
>> default restriction is still present.
>>
>> Along with the previous changes exposing ExProcessorMasks, this allows a
>> recent Windows VM with the Viridian extension enabled to have more than
>> 64 vCPUs without an immediate BSOD.
>>
> 
> This is very odd as I was happily testing with a 128 vCPU VM once 
> ExProcessorMasks was implemented... no need for the extra leaf.
> The documentation for 0x40000005 states "Describes the scale limits
> supported in the current hypervisor implementation. If any value is zero,
> the hypervisor does not expose the corresponding information". It does not
> say (in section 7.8.1 or elsewhere AFAICT) what implications that has for
> VP_INDEX.
> 
> In " Requirements for Implementing the Microsoft Hypervisor Interface" I 
> don't see anything saying what the semantics of not
> implementing leaf 0x40000005 are, only that if implementing it then -1 must 
> be used to break the 64 VP limit. It also says that
> reporting -1 in 0x40000005 means that 40000004.EAX bits 1 and 2 *must* be 
> clear, which is clearly not true if ExProcessorMasks is
> enabled. That document is dated June 13th 2012 so I think it should be taken 
> with a pinch of salt.
> 
> Have you actually observed a BSOD with >64 vCPUs when ExProcessorMasks is 
> enabled? If so, which version of Windows? I'd like to get
> a repro myself.

I return with more testing that confirms both my and your results.

1) with ExProcessorMasks and 66 vCPUs assigned, both the current WS19 build
and pre-release build 20270 of vNext boot correctly with no changes

That would be fine, but the existing documentation for ex_processor_masks
states:
"Hence this enlightenment must be specified for guests with more
than 64 vCPUs if B<hcall_remote_tlb_flush> and/or B<hcall_ipi> are also
specified."

You would then expect a 64+ vCPU VM to boot without ex_processor_masks,
hcall_remote_tlb_flush and hcall_ipi.

So,
2) without ExProcessorMasks and 66 vCPUs assigned, and with
hcall_remote_tlb_flush and hcall_ipi disabled: WS19 BSODs and vNext fails
to initialize secondary CPUs (the xl.cfg settings for these cases are
sketched after case 3 below)

After applying my change,
3) without ExProcessorMasks and 66 vCPUs assigned, and with
hcall_remote_tlb_flush and hcall_ipi disabled: WS19 and vNext boot correctly
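
For concreteness, the xl.cfg settings behind these cases look roughly as
follows. This is a sketch: the group names are as per the viridian option
in xl.cfg(5), and I'm assuming case 1 also had the hcall groups enabled
(that detail wasn't stated above). Cases 2 and 3 use the same guest config;
the difference is only the hypervisor-side change.

    # case 1: ex_processor_masks enabled (hcall groups assumed enabled too)
    vcpus = 66
    viridian = [ "defaults", "hcall_remote_tlb_flush", "hcall_ipi",
                 "ex_processor_masks" ]

    # cases 2 and 3: hcall_remote_tlb_flush, hcall_ipi and
    # ex_processor_masks all left unspecified, i.e. disabled
    vcpus = 66
    viridian = [ "defaults" ]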

So we either need to change the documentation and require ExProcessorMasks
for all VMs with more than 64 vCPUs, or go with my change.
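
For reference, the change itself boils down to reporting -1 instead of 0 in
leaf 0x40000005. A minimal sketch of where that lands in Xen's Viridian
CPUID handling (modelled on cpuid_viridian_leaves() in
xen/arch/x86/hvm/viridian/viridian.c; illustrative, not the literal patch):

    /* cpuid_viridian_leaves(): leaf numbers are relative to 0x40000000 */
    case 5:
        /*
         * "Requirements for Implementing the Microsoft Hypervisor
         * Interface": from Windows Server 2012 onwards, a value of -1
         * here means the hypervisor imposes no VP limit, whereas 0
         * (what we expose today) leaves the implicit 64 VP limit in
         * place.
         */
        res->a = -1;
        break;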

Igor


