
Re: [Xen-devel] vpmu=1 and running 'perf top' within a PVHVM guest eventually hangs dom0 and hypervisor has stuck vCPUS. Romley-EP (model=45, stepping=2)



On Wednesday, 13 March 2013, 08:51:30, Dietmar Hahn wrote:
> On Tuesday, 12 March 2013, 16:54:11, Boris Ostrovsky wrote:
> > On 03/12/2013 04:31 PM, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Mar 12, 2013 at 02:50:59PM -0400, Boris Ostrovsky wrote:
> > >> On 03/12/2013 01:30 PM, Konrad Rzeszutek Wilk wrote:
> > >>> This issue I am encountering seems to only happen on multi-socket
> > >>> machines.
> > >> I believe I was able to reproduce this (once) on my laptop.
> > >>
> > >>> It also does not help that the only multi-socket box I have is
> > >>> a Romley-EP (so two-socket SandyBridge CPUs). The other
> > >>> SandyBridge boxes I have (one socket) are not showing this. Granted
> > >>> they are also a different model (42).
> > >>>
> > >>> The problem is that when I run 'perf top' within an SMP PVHVM
> > >>> guest, after a couple of seconds or minutes the guest hangs.
> > >>> The hypervisor ends up stuck looping too, and then dom0 ends
> > >>> up hanging as well.
> > >>>
> > >>> Dumping the cpu registers (Ctrl-A x3, then 'd')
> > >>> shows that the guest is pretty firmly stuck in vmx_vmexit_handler:
> > >>>
> > >>> (XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174
> > >> And in my case this address is the second instruction after STI, i.e. we
> > >> are right at the point where interrupts got enabled.
> > >>
> > >> So I am wondering whether this has something to do with the counter
> > >> overflow interrupt (which I believe is an NMI).
> > > Interestingly enough, if I run the PVHVM guest with 'nowatchdog'
> > > it runs fine!
> > 
> > I think by default perf top runs off the timer interrupt, so it does not
> > use HW counters. But the watchdog is implemented on top of the counters,
> > so perhaps it fires the interrupt at a bad time, messing something up.
> 
> This looks like a strange behavior we had on Nehalem CPUs, see
> http://lists.xen.org/archives/html/xen-devel/2010-11/msg01157.html
> For this I added a quirk, see check_pmc_quirk() in vpmu_core2.c
> Model 42 is in the quirk list and it seems to work, but Romley-EP is model
> 43, I think, which is not in the list.

Sorry, I think it should be 45.
But that isn't on the list either; currently only 47, 46, 42 and 26 are, the
processors we were able to test.

Dietmar.

> Maybe you should add this model and give it a try.
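
To make the model check discussed above concrete, here is a minimal standalone sketch of a quirk-list lookup. It is not the code from vpmu_core2.c: the names quirk_models and model_needs_pmc_quirk are hypothetical, and the family == 6 test is an assumption based on Nehalem and SandyBridge being family 6 parts. The model numbers are the ones named in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Models named in this thread as being on the quirk list (hypothetical table). */
static const unsigned int quirk_models[] = { 26, 42, 46, 47 };

/*
 * Hypothetical helper: returns true if the given CPU family/model pair
 * is on the list above. The family 6 restriction is an assumption.
 */
static bool model_needs_pmc_quirk(unsigned int family, unsigned int model)
{
    size_t i;

    if (family != 6)
        return false;

    for (i = 0; i < sizeof(quirk_models) / sizeof(quirk_models[0]); i++)
        if (quirk_models[i] == model)
            return true;

    return false;
}
```

With this table, model 42 matches and model 45 does not, which is the gap described above; trying the suggestion would amount to adding 45 to the list and re-running the test.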




