
[Xen-devel] RE: Live migration fails due to c/s 20627



Dan Magenheimer wrote:
>>> .  And, as I've said before,
>>> the node/cpu info provided by Linux in TSC_AUX is
>>> wrong anyway (except in very constrained environments
>>> such as where the admin has pinned vcpus to pcpus).
>> 
>> I don't agree with you on this point. For guest NUMA support,
>> pinning a virtual node's vcpus to its related physical node
>> should be a must, and cross-node vcpu migration should be
>> disallowed by default; otherwise guest NUMA support is
>> meaningless, right?
> 
> It's not a must.  A system administrator should always
> have the option of choosing flexibility vs performance.
> I agree that when performance is higher priority, pinning
> is a must, but pinning may even have issues when the
> guest's nvcpus exceeds the number of cores in a node. 

Could you elaborate on the issues you see? Normally, a virtual node's number
of vcpus should be less than one physical node's number of cpus. But even if
the number of vcpus exceeds the number of physical cpus in a node, why would
that introduce issues?

> So I am saying there are many cases where TSC_AUX,
> when set by a guest OS, will be incorrect.  

Could you point out the cases where it is incorrect?

>And yes I
> agree there are cases (with pinning) where it will
> be correct.  But how does an app or OS know whether to
> trust TSC_AUX or not?  

If the hypervisor exposes this instruction to the guest, it should be trusted
and safe to use, because the hypervisor is responsible for fully virtualizing
this instruction and letting the guest think it is running on a native machine.
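To make "fully virtualizing" concrete, here is a minimal C model of a per-vCPU virtual TSC_AUX. Everything in it is hypothetical: `struct vcpu`, `struct pcpu`, `vm_enter()` and `vm_exit()` are illustrative names, not Xen's actual code, and the physical MSR is simulated by a plain variable. It assumes the Linux convention of encoding TSC_AUX as `(node << 12) | cpu`. What it shows is that whichever pCPU a vCPU lands on, RDTSCP inside the guest returns the guest's own value, and the host's value is restored on exit.

```c
#include <stdint.h>

/* Assumed Linux-style encoding: low 12 bits = cpu, upper bits = node. */
#define AUX_CPU_BITS 12u
#define AUX_CPU_MASK ((1u << AUX_CPU_BITS) - 1u)

static uint32_t aux_encode(uint32_t node, uint32_t cpu)
{
    return (node << AUX_CPU_BITS) | (cpu & AUX_CPU_MASK);
}
static uint32_t aux_cpu(uint32_t aux)  { return aux & AUX_CPU_MASK; }
static uint32_t aux_node(uint32_t aux) { return aux >> AUX_CPU_BITS; }

/* Hypothetical per-vCPU state: the guest's saved ("virtual") TSC_AUX. */
struct vcpu {
    uint32_t guest_tsc_aux;
};

/* Hypothetical per-pCPU state: one physical register plus the host's value. */
struct pcpu {
    uint32_t tsc_aux_msr;   /* simulated physical IA32_TSC_AUX */
    uint32_t host_tsc_aux;  /* host's own setting (physical node/cpu) */
};

static void pcpu_init(struct pcpu *p, uint32_t host_aux)
{
    p->host_tsc_aux = host_aux;
    p->tsc_aux_msr = host_aux;
}

/* Entering non-root mode: load the incoming vCPU's virtual value. */
static void vm_enter(struct pcpu *p, const struct vcpu *v)
{
    p->tsc_aux_msr = v->guest_tsc_aux;
}

/* Guest WRMSR to TSC_AUX while running (modelled as a direct write). */
static void guest_wrmsr_tsc_aux(struct pcpu *p, uint32_t val)
{
    p->tsc_aux_msr = val;
}

/* The ECX value RDTSCP would return to the running guest. */
static uint32_t guest_rdtscp_aux(const struct pcpu *p)
{
    return p->tsc_aux_msr;
}

/* Leaving non-root mode: save the guest's value, restore the host's. */
static void vm_exit(struct pcpu *p, struct vcpu *v)
{
    v->guest_tsc_aux = p->tsc_aux_msr;
    p->tsc_aux_msr = p->host_tsc_aux;
}
```

The model only guarantees that the guest sees its own value; it says nothing about whether that value matches physical placement, which is what pinning decides.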

>Better to have some other
> method to get pcpu/pnode information that is known
> to be always correct (via some method of querying the hypervisor
> directly

I don't think the guest should know the host's NUMA info in any way. Basically,
the guest only needs to be aware of its own NUMA info. For example, the host
may have 2 nodes, each configured with 16G memory and 16 LPs, while the guest's
virtual NUMA layout may be 2 nodes, each with 2G memory and 2 vcpus. In this
case, the guest only needs the virtual NUMA info, not the host's, when it
enables NUMA support. At the same time, the hypervisor is responsible for how
to allocate 2G of memory from the physical node's 16G, and how to schedule the
virtual node's vcpus onto physical cpus (according to performance vs
flexibility, as you said).

>> If a vcpu's migration only happens within its physical node, I
>> can't see why you think the info provided in the MSR is
>> wrong. Actually, each vcpu should have a virtual
>> TSC_AUX_MSR (the guest should set it before using it), and this
>> virtual MSR is saved/restored from/to the physical TSC_AUX_MSR
>> on context switch, so in vmx non-root mode the value in the
>> physical TSC_AUX_MSR follows the guest's setting rather
>> than the host's setting. It therefore reflects the guest's info
>> about its virtual node/virtual cpu, and it is still
>> the expected value for the guest's applications. In addition, we
>> have to recognize that the host's TSC_AUX_MSR and the guest's
>> TSC_AUX_MSR are two totally different things, except that they
>> are held in one physical register during the cpu's different
>> execution phases, so we shouldn't mix them together.
> 
> My argument is simply that if TSC_AUX cannot ALWAYS
> be trusted by an application, apps will NEVER trust it.
> And if apps NEVER trust it, why expose it at all?

This instruction is safe to use and has been fully virtualized in vmx non-root
mode via Dongxiao's patch, so why not trust it? I can't think of a single
reason. :-)
Xiantao


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

