
Re: [Xen-devel] Regression, host crash with 4.5rc1



(Please forgive my lack of Xen-fu knowledge in advance)

If this issue were to happen on Linux/bare-metal, this is how I'd debug it.
Hopefully some of this will translate to Xen in one way or another.

dmesg | grep idle
will tell us which idle driver is running (on the Dom0 kernel).
If it is intel_idle, it will also tell us the supported MWAIT sub-states
(the CPUID.MWAIT.EDX value).

grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
will tell us what states the OS is requesting.
It will expand on the "FFH" bit here:

> > (XEN)     C1:   type[C1] latency[003] usage[12219860] method[  FFH]
> > duration[1190961948551]
> > (XEN)     C2:   type[C1] latency[010] usage[10205554] method[  FFH]
> > duration[2015393965907]
> > (XEN)     C3:   type[C2] latency[020] usage[50926286] method[  FFH]
> > duration[30527997858148]

I'm hopeful that this information comes from the hardware's BIOS
and not some hypervisor tricking out Dom0 with a fake BIOS, yes?

If Xen doesn't have cpuidle, or its sysfs, then acpidump for the platform
should be able to tell us what the platform is exporting.
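
For the cpuidle sysfs check above, here is a minimal sketch that prints one
line per state instead of the raw grep output. It assumes the Dom0 kernel
exposes the usual per-state files (name, latency, usage); the function name
show_cpuidle_states is just made up for illustration.

```shell
#!/bin/sh
# Print one line per cpuidle state: state directory, name, exit latency (us),
# and usage count. $1 is the cpuidle directory; it defaults to cpu0's.
show_cpuidle_states() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for state in "$dir"/state*; do
        # If the glob matched nothing (no cpuidle sysfs), skip quietly.
        [ -d "$state" ] || continue
        printf '%s name=%s latency_us=%s usage=%s\n' \
            "$(basename "$state")" \
            "$(cat "$state/name")" \
            "$(cat "$state/latency")" \
            "$(cat "$state/usage")"
    done
}
```

Running show_cpuidle_states with no argument reads cpu0. If it prints nothing,
cpuidle sysfs is not there and acpidump is the fallback.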

Next, hopefully the attached turbostat utility can be invoked on Dom0
and it can read the MSRs on at least 1 processor via the /dev/cpu interface.

This will tell you what the hardware supports, and what HW states are actually
being invoked (which may be different from what the OS asks for...).

It may tell us just the same thing I think we learned here:

> > (XEN) PC2[0] PC3[8589642315848] PC6[0] PC7[0]
> > (XEN) CC3[28794734145697] CC6[0] CC7[0]

which I'm assuming is a dump of the MSR residency counters.
If so, it appears that this platform is not invoking CC6 or PC6 at all,
and that the deepest states being used are actually CC3 and PC3.
I don't know if that is because you've booted the kernel with max_cstate=N
of some kind, or if this is default.
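
To make the reading of those counters mechanical, here is a small sketch that
takes the Xen dump on stdin and marks each PCn/CCn counter as used or never
entered. It assumes the bracketed numbers are residency counters (as guessed
above) and only checks zero versus non-zero; the function name
residency_summary is hypothetical.

```shell
#!/bin/sh
# Read Xen C-state dump lines on stdin and report, for every PCn[...] or
# CCn[...] token, whether its counter is non-zero.
residency_summary() {
    tr ' ' '\n' |
    sed -n 's/^\([PC]C[0-9][0-9]*\)\[\([0-9][0-9]*\)\]$/\1 \2/p' |
    while read -r name value; do
        # Compare as text so very large counters cannot overflow shell math.
        case "$value" in
            *[1-9]*) echo "$name used residency=$value" ;;
            *)       echo "$name never-entered" ;;
        esac
    done
}
```

Feeding it the two quoted (XEN) lines should flag only PC3 and CC3 as used,
which matches the reading above.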

Attached is turbostat, source and binary. Run it this way
and send the ts.out file:

# ./turbostat --debug sleep 5 > ts.out 2>&1

Guessing...
If there are no surprises in the debug output requested above, and
if the Xen debug output above was captured with C6 explicitly disabled...
Note that there are two kinds of C6 -- CC6 (core) and PC6 (package).
If this box supports both, the next thing to try will be to keep CC6
enabled, but to just disable PC6.  This is done by writing an MSR that turbostat
dumps out (MSR_NHM_SNB_PKG_CST_CFG_CTL) with the wrmsr(8) utility.
Though if that MSR is locked by the BIOS, then a BIOS SETUP option
may be the only way to lower the package C-state limit without
also disabling the associated core C-state.
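
Before trying wrmsr, it is worth checking whether the MSR is locked. Below is
a rough sketch that decodes a raw value of that MSR. It assumes bit 15 is the
lock bit and bits 2:0 hold the package C-state limit code; the meaning of each
code is model-specific, so it is printed undecoded. The function name
decode_pkg_cst_cfg is made up for this example.

```shell
#!/bin/sh
# Decode a raw MSR_NHM_SNB_PKG_CST_CFG_CTL value given as a hex string,
# e.g. 0x1e008403.
# Assumptions: bit 15 = lock bit, bits 2:0 = package C-state limit code.
decode_pkg_cst_cfg() {
    val=$(( $1 ))
    limit=$(( val & 0x7 ))
    if [ $(( (val >> 15) & 1 )) -eq 1 ]; then
        lock=locked
    else
        lock=unlocked
    fi
    echo "pkg-cst-limit-code=$limit $lock"
}
```

The value can be taken from the turbostat debug output. With msr-tools
installed, something like decode_pkg_cst_cfg "0x$(rdmsr -p 0 0xe2)" should also
work, assuming the MSR lives at address 0xe2 on this CPU. If it reports
"locked", wrmsr will not help and BIOS SETUP is the remaining option.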

cheers,
-Len


Attachment: turbostat-test.tar.gz
Description: turbostat-test.tar.gz
