
Re: [Xen-devel] vpmu=1 and running 'perf top' within a PVHVM guest eventually hangs dom0 and hypervisor has stuck vCPUS. Romley-EP (model=45, stepping=2)



----- maillists.shan@xxxxxxxxx wrote:

> We also ran into the issue that Dietmar's workaround fixes. I remember
> the two of us had some email discussion about it at the time.
> 
> The cause of the interrupt loop is as follows:
> It seems that on NHM (at that time), when a PMI arrives at the CPU, the
> counter has a value of zero (instead of some other small value, say 3
> or 5, as seen on Core 2 Duo). In this case, unmasking the PMI via the
> APIC will immediately trigger another PMI.
> This does not cause a problem with a native kernel, since it typically
> reprograms the counter with another value (as needed to set up the next
> sampling point) before unmasking.
> In Xen, the PMI handler cannot reprogram the counter immediately, since
> that has to be done by the guest. It just records a virtual PMI for the
> guest and unmasks the PMI before returning.
> 
> We don't know whether this is the intended HW behavior, but we hope to
> get confirmation from the internal HW team soon.


I will note that this workaround appeared not to be needed on Haswell. I
have run my tests there for a fairly long period of time without any problems.

Of course, this doesn't *prove* that the workaround is not needed, but
I'd usually trigger this hang within 20-30 minutes at most on other
processors. On Haswell I ran for 6 or 7 hours.

-boris



> 
> Shan Haitao
> 
> 2013/3/13 Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>:
> > On Tuesday, 12 March 2013, 16:54:11, Boris Ostrovsky wrote:
> >> On 03/12/2013 04:31 PM, Konrad Rzeszutek Wilk wrote:
> >> > On Tue, Mar 12, 2013 at 02:50:59PM -0400, Boris Ostrovsky wrote:
> >> >> On 03/12/2013 01:30 PM, Konrad Rzeszutek Wilk wrote:
> >> >>> This issue I am encountering seems to only happen on
> >> >>> multi-socket machines.
> >> >> I believe I was able to reproduce this (once) on my laptop.
> >> >>
> >> >>> It also does not help that the only multi-socket box I have is
> >> >>> a Romley-EP (so two-socket SandyBridge CPUs). The other
> >> >>> SandyBridge boxes I have (one socket) are not showing this.
> >> >>> Granted, they are also a different model (42).
> >> >>>
> >> >>> The problem is that when I run 'perf top' within an SMP PVHVM
> >> >>> guest, after a couple of seconds or minutes the guest hangs.
> >> >>> The hypervisor also ends up stuck in a loop, and then dom0 ends
> >> >>> up hanging as well.
> >> >>>
> >> >>> Dumping the CPU registers (Ctrl-A x3, then 'd')
> >> >>> shows that the guest is pretty firmly stuck in vmx_vmexit_handler:
> >> >>>
> >> >>> (XEN)    [<ffff82c4c01d386f>] vmx_vmexit_handler+0x22f/0x174
> >> >> And in my case this address is the second instruction after STI,
> >> >> i.e. we are right at the point where interrupts got enabled.
> >> >>
> >> >> So I am wondering whether this has something to do with the
> >> >> counter overflow interrupt (which I believe is an NMI).
> >> > Interestingly enough, if I run the PVHVM guest with 'nowatchdog'
> >> > it runs fine!
> >>
> >> I think by default perf top runs off the timer interrupt, so it does
> >> not use HW counters. But the watchdog is implemented on top of the
> >> counters, so perhaps it fires the interrupt at a bad time, messing
> >> something up.
> >
> > This looks like the strange behavior we saw on Nehalem CPUs; see
> > http://lists.xen.org/archives/html/xen-devel/2010-11/msg01157.html
> > For this I added a quirk; see check_pmc_quirk() in vpmu_core2.c.
> > Model 42 is in the quirk list and it seems to work, but Romley-EP is,
> > I think, model 43, which is not in the list.
> > Maybe you should add this model and give it a try.
> >
> >
> > Dietmar.
> >
> > --
> > Company details: http://ts.fujitsu.com/imprint.html
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxx
> > http://lists.xen.org/xen-devel
> 


 

