
Re: [Xen-devel] Problems with merlot* AMD Opteron 6376 systems (Was Re: stable trees (was: [xen-4.2-testing test] 58584: regressions))



On Wed, 2015-06-24 at 10:38 +0100, Ian Campbell wrote:
> Adding Boris+Suravee+Aravind (AMD/SVM maintainers), Dario (NUMA) and Jim
> +Anthony (libvirt) to the CC.

> Supposing that the NUMA oddities might be what is exposing this issue, I
> tried an ad hoc run on the merlot machines where I specified
> "dom0_max_vcpus=8 dom0_nodes=0" on the hypervisor command line:
> http://logs.test-lab.xenproject.org/osstest/logs/58853/
> 
> Again, I messed up the config for the -xsm case, so ignore.
> 
> The interesting thing is that the extra NUMA settings were
> apparently _not_ helpful. From
> http://logs.test-lab.xenproject.org/osstest/logs/58853/test-amd64-amd64-libvirt/serial-merlot0.log
>  I can see they were applied:
> Jun 23 15:50:34.205057 (XEN) Command line: placeholder conswitch=x watchdog 
> com1=115200,8n1 console=com1,vga gdb=com1 dom0_mem=512M,max:512M ucode=scan 
> dom0_max_vcpus=8 dom0_nodes=0
> [...]
> Jun 23 15:50:38.309057 (XEN) Dom0 has maximum 8 VCPUs
> 
IIRC, you can drop dom0_max_vcpus=8: Xen would figure out the number of
vCPUs automatically, as a consequence of dom0_nodes=0. In any case, it
doesn't hurt.
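
For reference, this is the shape of the full hypervisor command line in
that flight (same options as in the serial log above, minus osstest's
"placeholder" first argument; where exactly it goes, e.g. appended to
the xen.gz line in grub.cfg, depends on the boot setup):

    # Xen command line from flight 58853; dom0_nodes=0 confines dom0's
    # memory allocations and vCPU affinity to NUMA node 0:
    conswitch=x watchdog com1=115200,8n1 console=com1,vga gdb=com1 \
        dom0_mem=512M,max:512M ucode=scan dom0_max_vcpus=8 dom0_nodes=0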

Maybe we can try running this again with dom0_nodes=2 (the other node
with memory attached). I wouldn't know what to expect, though, so, yes,
it's a shot in the dark, but since we're out of plausible
theories... :-/
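
If we do try that, it's probably worth double checking, from dom0, which
nodes really have memory attached. Something like the following should
do, although the exact layout of the output varies a bit across Xen
versions:

    # Dump Xen's view of the host NUMA topology; the numa_info section
    # lists memsize/memfree per node, so nodes without memory show up
    # with memsize 0:
    xl info -n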

> The memory info
> Jun 23 15:56:27.749008 (XEN) Memory location of each domain:
> Jun 23 15:56:27.756965 (XEN) Domain 0 (total: 131072):
> Jun 23 15:56:27.756983 (XEN)     Node 0: 126905
> Jun 23 15:56:27.756998 (XEN)     Node 1: 0
> Jun 23 15:56:27.764952 (XEN)     Node 2: 4167
> Jun 23 15:56:27.764969 (XEN)     Node 3: 0
> suggests at least a small amount of cross-node memory allocation (16M
> out of dom0's 512M total). That's probably small enough to be OK.
> 
Yeah, that is in line with what you usually get with dom0_nodes. Most of
the memory, as you noted, comes from the proper node. We're just not
(yet?) at the point where _all_ of it can come from there.
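
Just to spell the numbers out, the per-node counts in that dump are
4KiB pages, which is where the 16M estimate comes from:

    # Node 2 got 4167 of dom0's 131072 (i.e., 512M worth of) pages:
    $ echo "scale=1; 4167 * 4 / 1024" | bc
    16.2

i.e., only ~3% of dom0's memory ends up off-node.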

> And it seems as if the 8 dom0 vcpus are correctly pinned to the first 8
> cpus (the ones in node 0):
> Jun 23 15:56:43.797055 (XEN) VCPU information and callbacks for domain 0:
> Jun 23 15:56:43.797110 (XEN)     VCPU0: CPU4 [has=F] poll=0 upcall_pend=00 
> upcall_mask=00 dirty_cpus={4}
> Jun 23 15:56:43.805078 (XEN)     cpu_hard_affinity={0-7} 
> cpu_soft_affinity={0-7}
> Jun 23 15:56:43.813121 (XEN)     pause_count=0 pause_flags=1
> Jun 23 15:56:43.813157 (XEN)     No periodic timer
> Jun 23 15:56:43.821050 (XEN)     VCPU1: CPU3 [has=F] poll=0 upcall_pend=00 
> upcall_mask=00 dirty_cpus={3}
> Jun 23 15:56:43.829044 (XEN)     cpu_hard_affinity={0-7} 
> cpu_soft_affinity={0-7}
> Jun 23 15:56:43.829082 (XEN)     pause_count=0 pause_flags=1
> Jun 23 15:56:43.837051 (XEN)     No periodic timer
> Jun 23 15:56:43.837084 (XEN)     VCPU2: CPU5 [has=F] poll=0 upcall_pend=00 
> upcall_mask=00 dirty_cpus={5}
> Jun 23 15:56:43.845102 (XEN)     cpu_hard_affinity={0-7} 
> cpu_soft_affinity={0-7}
> Jun 23 15:56:43.853035 (XEN)     pause_count=0 pause_flags=1
> Jun 23 15:56:43.853071 (XEN)     No periodic timer
> Jun 23 15:56:43.853099 (XEN)     VCPU3: CPU7 [has=F] poll=0 upcall_pend=00 
> upcall_mask=00 dirty_cpus={7}
> Jun 23 15:56:43.861102 (XEN)     cpu_hard_affinity={0-7} 
> cpu_soft_affinity={0-7}
> Jun 23 15:56:43.869110 (XEN)     pause_count=0 pause_flags=1
> Jun 23 15:56:43.869145 (XEN)     No periodic timer
> Jun 23 15:56:43.877014 (XEN)     VCPU4: CPU0 [has=F] poll=0 upcall_pend=00 
> upcall_mask=00 dirty_cpus={}
> Jun 23 15:56:43.877038 (XEN)     cpu_hard_affinity={0-7} 
> cpu_soft_affinity={0-7}
> Jun 23 15:56:43.885053 (XEN)     pause_count=0 pause_flags=1
> Jun 23 15:56:43.885088 (XEN)     No periodic timer
> Jun 23 15:56:43.893085 (XEN)     VCPU5: CPU0 [has=F] poll=0 upcall_pend=00 
> upcall_mask=00 dirty_cpus={}
> Jun 23 15:56:43.901075 (XEN)     cpu_hard_affinity={0-7} 
> cpu_soft_affinity={0-7}
> Jun 23 15:56:43.901134 (XEN)     pause_count=0 pause_flags=1
> Jun 23 15:56:43.909010 (XEN)     No periodic timer
> Jun 23 15:56:43.909048 (XEN)     VCPU6: CPU2 [has=F] poll=0 upcall_pend=00 
> upcall_mask=00 dirty_cpus={2}
> Jun 23 15:56:43.917065 (XEN)     cpu_hard_affinity={0-7} 
> cpu_soft_affinity={0-7}
> Jun 23 15:56:43.925055 (XEN)     pause_count=0 pause_flags=1
> Jun 23 15:56:43.925074 (XEN)     No periodic timer
> Jun 23 15:56:43.925095 (XEN)     VCPU7: CPU6 [has=F] poll=0 upcall_pend=00 
> upcall_mask=00 dirty_cpus={6}
> Jun 23 15:56:43.933119 (XEN)     cpu_hard_affinity={0-7} 
> cpu_soft_affinity={0-7}
> Jun 23 15:56:43.941080 (XEN)     pause_count=0 pause_flags=1
> Jun 23 15:56:43.941129 (XEN)     No periodic timer
> 
> So whatever the issue is, it doesn't seem to be particularly related to
> the strange NUMA layout.
> 
Exactly.
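
BTW, for whoever wants to check the pinning on a live system, rather
than digging it out of the debug-keys dump, this is what I'd normally
use (the vcpu-pin line is just an example of forcing a vCPU onto node
0's pCPUs by hand, not something that was needed here):

    # One line per dom0 vCPU, with the current pCPU and the affinity:
    xl vcpu-list 0
    # E.g., restrict dom0's vCPU 0 to pCPUs 0-7 (node 0):
    xl vcpu-pin 0 0 0-7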

Inspecting the logs, and looking at the dumps of scheduler, pCPU and
vCPU info, everything appears to be completely idle at the time the
debug keys are sent to the box.

There are no vCPUs active or waiting in any runqueue, all the host pCPUs
are in idle_loop(), and all Dom0 vCPUs are at ffffffff810013aa, which
should be xen_hypercall_sched_op... So, if there was something keeping
the system busy enough to make QEMU miss the 10 second timeout (a dead
or live lock, either in Xen or in Dom0), it is gone by the time we
realize things have gone bad and go inspecting the system. As a further,
although of course not conclusive, proof of that, we do manage to see
the output of `xl info', `xl list', etc., performed during
ts-capture-logs, so the system is indeed able to respond.
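
For reference, the dumps I'm talking about are the ones ts-capture-logs
collects via debug keys; the same information can be triggered by hand
from dom0 like this:

    # 'q' dumps domain and vCPU state, 'r' the scheduler run queues:
    xl debug-keys q
    xl debug-keys r
    # the output goes to the hypervisor console ring:
    xl dmesg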

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

