
Re: [Xen-devel] [PATCH v3 1/7] xen: vNUMA support for PV guests



On Tue, Nov 19, 2013 at 11:43 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 19.11.13 at 17:36, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
>> On mar, 2013-11-19 at 15:54 +0000, Jan Beulich wrote:
>>> >>> On 19.11.13 at 16:42, Dario Faggioli <dario.faggioli@xxxxxxxxxx> wrote:
>>> > So, what would the best option be? Another hypercall (or a special way
>>> > of calling this one) "just" to retrieve the number of vnodes?
>>>
>>> Iirc there's a padding field in the interface structure, which could
>>> be leveraged. But then again you need two counts, and hence it
>>> might be better to simply add two respective fields. Then make
>>> it/them IN/OUT, and rather than filling the arrays when they're
>>> too small just send back the necessary values. (And of course
>>> you'll want to also send back the actual values in case the passed
>>> in ones turned out too large, so the guest would know how many
>>> of the array elements actually have valid data).
>>>
>>> But in the end the fundamental question stands - how was a PV
>>> guest in your so far proposed model supposed to know its number
>>> of vNodes? While for HVM guests you can make this available via
>>> ACPI, that's not an option for PV.
>>>
>> Wait... I'm no longer so sure I'm getting what you say. I'd be inclined
>> to say "by the XENMEM_get_vnuma_info hcall implemented here", but then
>> again, maybe I'm missing something.
>>
>> The hypercall does provide a means for the guest to retrieve _all_ the
>> virtual topology information, such as:
>>  - number of virtual nodes
>>  - virtual node memory ranges
>>  - virtual cpu to virtual node mapping
>>  - virtual node to physical node mapping, for use in (future) in-guest
>>    vNUMA aware subsystems (e.g., ballooning)
>>
>> So, if your point is (as I thought) that for properly allocating the
>> buffers for this hypercall to work we need an information only provided
>> by this hypercall itself, then I agree, and that's why I asked what
>> alternative way would be best to retrieve that bit of information.
>>
>> If it's something else, then I don't know. :-)
>
> No, it is what you're naming above. I was merely curious how you
> had supposed the guest would know the vNode count prior to this
> change request of mine. I didn't look at the Linux patches yet (due
> to lack of time), hence I don't know how you derived(?) the node
> count without it coming back from the hypercall here.
>
> Jan
>

Hello Jan and Dario.


I have looked at what Jan asked and wanted to see whether it can be resolved.

Jan is right: if the guest kernel is booted with maxcpus smaller than
the number of vcpus in the VM config, there is a problem.
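For example (made-up numbers):

    # VM config file
    vcpus = 8

    # guest kernel command line
    maxcpus=4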

At the boot stage where xen_numa_init() is called, the number of
possible CPUs still equals the number of vcpus in the config; it is
only reduced to maxcpus (from the kernel boot args) later, in
xen_smp_prepare_cpus(), after xen_numa_init() has run.

In xen_numa_init() I have all the values I need to decide whether to
initialize vNUMA, skip it, or modify the topology.

These are the numbers I have in xen_numa_init():
 - num_possible_cpus() = the number of vcpus provided by the hypervisor;
 - setup_max_cpus = the maxcpus kernel boot parameter.

When setup_max_cpus > num_possible_cpus(), only num_possible_cpus()
CPUs will be brought up.

So I can detect the setup_max_cpus < num_possible_cpus() case, not
initialize vNUMA at all, and just set up a single fake node (see the
sketch below). I can also make sure the hypervisor is aware of it (by
calling the same subop with NULL, let's suppose).
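
Roughly, something like this in xen_numa_init() (just a sketch; the
fake-node helper name is a placeholder for whatever we end up using):

    /*
     * Sketch of the check in xen_numa_init().  setup_max_cpus is the
     * maxcpus= boot parameter, num_possible_cpus() the vcpu count
     * the hypervisor gave us.
     */
    if (setup_max_cpus < num_possible_cpus()) {
        /*
         * The kernel will bring up fewer cpus than the VM config has
         * vcpus: skip vNUMA, fall back to one fake node, and (let's
         * suppose) tell the hypervisor by issuing the same subop
         * with a NULL argument.
         */
        HYPERVISOR_memory_op(XENMEM_get_vnuma_info, NULL);
        return fake_node_init();  /* placeholder for the fake-node path */
    }

    /* otherwise fetch the topology via XENMEM_get_vnuma_info and
     * register the virtual nodes as usual */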

The hypervisor then has to make some decision regarding the vNUMA
topology for this domain. In effect this is the same as before, when
the guest was not aware of the underlying NUMA: the hypervisor will
have to fix up the vcpu_to_vnode mask, and possibly adjust the pinning
of vcpus to physical cpus. The memory, if already allocated on
different nodes, will remain as it is.
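
On the hypervisor side I imagine that fixup being roughly this (again
only a sketch; the d->vnuma fields stand for whatever the final
structure ends up looking like):

    /*
     * Collapse the domain's vNUMA view to a single node: every vcpu
     * maps to vnode 0.  Guest memory already allocated on different
     * physical nodes is left where it is.
     */
    for ( i = 0; i < d->max_vcpus; i++ )
        d->vnuma.vcpu_to_vnode[i] = 0;
    d->vnuma.nr_vnodes = 1;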

Does that sound like a sensible solution? Or maybe you have some other ideas?
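
As for the buffer-size question itself, I was thinking the interface
with the IN/OUT counts Jan suggests could end up looking roughly like
the sketch below (all structure and field names are placeholders, and
GUEST_HANDLE stands for whatever guest handle type we use; this is
not a concrete proposal):

    /*
     * Hypothetical argument of XENMEM_get_vnuma_info.  nr_vnodes and
     * nr_vcpus are IN/OUT: the guest passes the sizes of the arrays
     * it allocated; if they are too small, the hypervisor writes
     * back the required counts and leaves the arrays alone, so the
     * guest can reallocate and retry.  If they are larger than
     * needed, the hypervisor writes back how many entries actually
     * hold valid data.
     */
    struct vnuma_topology_info {
        domid_t domid;
        unsigned int nr_vnodes;                 /* IN/OUT */
        unsigned int nr_vcpus;                  /* IN/OUT */
        GUEST_HANDLE(vmemrange) vmemranges;     /* nr_vnodes entries */
        GUEST_HANDLE(uint)      vcpu_to_vnode;  /* nr_vcpus entries  */
        GUEST_HANDLE(uint)      vnode_to_pnode; /* nr_vnodes entries */
    };

    /* Guest side, two calls: the first one only learns the counts. */
    struct vnuma_topology_info info = { .domid = DOMID_SELF };
    int rc = HYPERVISOR_memory_op(XENMEM_get_vnuma_info, &info);
    /* if rc says "arrays too small": allocate info.nr_vnodes /
     * info.nr_vcpus sized arrays, set the handles and call again */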

-- 
Elena

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

