
Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest



On 07/20/2015 10:09 AM, Dario Faggioli wrote:
On Fri, 2015-07-17 at 14:17 -0400, Boris Ostrovsky wrote:
On 07/17/2015 03:27 AM, Dario Faggioli wrote:
In the meantime, what should we do? Document this? How? "don't use
vNUMA with PV guests on SMT-enabled systems" seems a bit harsh... Is
there a workaround we can put in place/suggest?
I haven't been able to reproduce this on my Intel box, because I think I
have a different core enumeration.

Yes, most likely; that's highly topology dependent. :-(

Can you try adding
    cpuid=['0x1:ebx=xxxxxxxx00000001xxxxxxxxxxxxxxxx']
to your config file?

Done (sorry for the delay, the testbox was busy doing other stuff).

Still no joy (.101 is the IP address of the guest, domain id 3):

root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# xl vcpu-list 3
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
test                                 3     0    4   r--      23.6  all / 0-7
test                                 3     1    9   r--      19.8  all / 0-7
test                                 3     2    8   -b-       0.4  all / 8-15
test                                 3     3    4   -b-       0.2  all / 8-15

*HOWEVER*, it seems to have an effect. In fact, the topology as it is
now shown in /sys/... is different:

root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0
(it was 0-1)
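
Which makes sense, if I'm reading xl's cpuid= syntax right: each character
of that string is one bit, leftmost being bit 31, so the "00000001" run
covers CPUID leaf 1 EBX bits 23:16 (the "logical processors per package"
count) and forces it to 1, i.e. no SMT siblings as far as the guest can
tell. A throwaway sketch to double check the bit positions (my own
illustration, nothing taken from xl itself):

    # Decode which EBX bits the override string forces, assuming one
    # character per bit with the leftmost character being bit 31.
    mask = 'xxxxxxxx00000001xxxxxxxxxxxxxxxx'
    forced = {31 - i: c for i, c in enumerate(mask) if c in '01'}
    lo, hi = min(forced), max(forced)
    value = int(''.join(forced[b] for b in range(hi, lo - 1, -1)), 2)
    print('forces EBX bits %d:%d to %d' % (hi, lo, value))
    # -> forces EBX bits 23:16 to 1, which is CPUID.1's "maximum number
    #    of addressable IDs for logical processors in this package".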

This, OTOH, is still the same:
root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
0-3

Also, I now see this:

[    0.150560] ------------[ cut here ]------------
[    0.150560] WARNING: CPU: 2 PID: 0 at ../arch/x86/kernel/smpboot.c:317 topology_sane.isra.2+0x74/0x88()
[    0.150560] sched: CPU #2's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.150560] Modules linked in:
[    0.150560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0+ #1
[    0.150560]  0000000000000009 ffff88001ee2fdd0 ffffffff81657c7b ffffffff810bbd2c
[    0.150560]  ffff88001ee2fe20 ffff88001ee2fe10 ffffffff81081510 ffff88001ee2fea0
[    0.150560]  ffffffff8103aa02 ffff88003ea0a001 0000000000000000 ffff88001f20a040
[    0.150560] Call Trace:
[    0.150560]  [<ffffffff81657c7b>] dump_stack+0x4f/0x7b
[    0.150560]  [<ffffffff810bbd2c>] ? up+0x39/0x3e
[    0.150560]  [<ffffffff81081510>] warn_slowpath_common+0xa1/0xbb
[    0.150560]  [<ffffffff8103aa02>] ? topology_sane.isra.2+0x74/0x88
[    0.150560]  [<ffffffff81081570>] warn_slowpath_fmt+0x46/0x48
[    0.150560]  [<ffffffff8101eeb1>] ? __cpuid.constprop.0+0x15/0x19
[    0.150560]  [<ffffffff8103aa02>] topology_sane.isra.2+0x74/0x88
[    0.150560]  [<ffffffff8103acd0>] set_cpu_sibling_map+0x27a/0x444
[    0.150560]  [<ffffffff81056ac3>] ? numa_add_cpu+0x98/0x9f
[    0.150560]  [<ffffffff8100b8f2>] cpu_bringup+0x63/0xa8
[    0.150560]  [<ffffffff8100b945>] cpu_bringup_and_idle+0xe/0x1a
[    0.150560] ---[ end trace 63d204896cce9f68 ]---

Notice that it now says 'llc-sibling', whereas before it said
'smt-sibling'.

Exactly. You are now passing the first topology test, which checks that threads are on the same node. And since each processor now has only one thread (as evidenced by thread_siblings_list), we are good.

The second test checks that cores (i.e., things that share the last-level cache) are on the same node. And they are not.
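
In case it's useful, here is a little userspace sketch of (roughly) those
two checks, driven only by sysfs rather than by the kernel's internal
maps; it treats core_siblings_list as a stand-in for the LLC map, which
is an approximation, but a close one wherever the L3 spans the package:

    #!/usr/bin/env python
    # Rough userspace rerun of the two sanity checks, via sysfs only:
    # every thread sibling and every core sibling of a CPU should live
    # on that CPU's own NUMA node.
    import glob, os, re

    def cpu_list(path):
        # Parse a cpulist string such as "0-3,8" into a set of ints.
        cpus = set()
        for part in open(path).read().strip().split(','):
            lo, _, hi = part.partition('-')
            cpus.update(range(int(lo), int(hi or lo) + 1))
        return cpus

    def node_of(cpu):
        # cpuN contains a nodeM symlink on NUMA-aware kernels.
        for entry in os.listdir('/sys/devices/system/cpu/cpu%d' % cpu):
            if re.match(r'node\d+$', entry):
                return int(entry[4:])
        return 0

    for d in sorted(glob.glob('/sys/devices/system/cpu/cpu[0-9]*')):
        cpu = int(re.search(r'\d+$', d).group())
        node = node_of(cpu)
        for kind in ('thread', 'core'):
            for sib in cpu_list('%s/topology/%s_siblings_list' % (d, kind)):
                if node_of(sib) != node:
                    print('CPU %d: %s-sibling %d is on node %d, not %d'
                          % (cpu, kind, sib, node_of(sib), node))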



On AMD, BTW, we fail a different test, so some other bits probably need
to be tweaked. You may fail it too (the LLC sanity check).

Yep, that's the one I guess. Should I try something more/else?


I'll need to see how LLC IDs are calculated, probably also from some CPUID bits. The question, though, will be what we do about how cache sizes (and TLB sizes, for that matter) are presented to the guests. Do we scale them down per thread?
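
As far as the LLC IDs go, my guess (from the SDM rather than from
re-reading the Linux code just now) is that on Intel the share count
comes out of leaf 4: EAX[25:14] + 1 is the maximum number of logical
CPUs sharing that cache level, and the LLC ID is basically the APIC ID
with the low ceil(log2(count)) bits dropped. A sketch of that
arithmetic, with made-up register values:

    # Sketch of the usual Intel derivation of a shared-LLC ID from
    # CPUID leaf 4 (Deterministic Cache Parameters).  The EAX value
    # and APIC IDs below are made up for illustration.
    def llc_id(apicid, leaf4_eax):
        sharing = ((leaf4_eax >> 14) & 0xfff) + 1   # max threads sharing
        shift = (sharing - 1).bit_length()          # ceil(log2(sharing))
        return apicid >> shift

    eax_l3 = 15 << 14          # pretend 16 logical CPUs share the L3
    for apicid in (0, 1, 15, 16, 17):
        print('%d -> %d' % (apicid, llc_id(apicid, eax_l3)))
    # APIC IDs 0..15 collapse onto LLC ID 0, 16..31 onto LLC ID 1, etc.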

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

