
Re: [Xen-devel] Need help in debugging partially blocked hypervisor


  • To: xen-devel@xxxxxxxxxxxxxxxxxxx
  • From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
  • Date: Tue, 3 Nov 2009 08:52:53 +0100
  • Cc: "Shan, Haitao" <haitao.shan@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>
  • Delivery-date: Mon, 02 Nov 2009 23:53:27 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Please see below.

> See my comments embedded. :)
> 
> Haitao
> 
> 
> Dietmar Hahn wrote:
> > The conclusion is that this seems to be a workaround for the endless
> > NMI loop. PMIs are very rare events, so this should not cause a
> > performance problem.
> I totally agree that this is only a workaround for approach 1.
> 
> > 
> > I didn't try your second approach
> >> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical
> >> PMI* when guest vcpu unmasks virtual PMI.
> > but I have some questions.
> > 
> > - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt and
> >   a watchdog NMI would occur before the domU unmasks it?
> I think the second NMI will be lost.
> 
> > - Is it possible that after handling the NMI (and not unmasking)
> >   another domU got running on this CPU and therefore PMI's got lost?
> The LVTPC entry in the physical local APIC is saved/restored by Xen on VCPU
> switches. So whether one vcpu unmasks the PMI or not should have no impact on
> another vcpu. When developing vPMU, I treated both the PMU MSRs and the LVTPC
> entry in the local APIC as vPMU context. The vPMU context is saved/restored
> on the physical HW when vcpus are scheduled, either actively or lazily
> (depending on the PMU usage at the time of the switch).
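
The per-vcpu handling of LVTPC described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the Xen code: `struct vpmu_ctx`, `struct phys_cpu`, `vpmu_save`, `vpmu_load` and `vpmu_switch` are hypothetical names, the hardware is replaced by a plain struct, and only the eager path is modelled (the lazy variant is left out).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define APIC_LVT_MASKED (UINT32_C(1) << 16) /* mask bit of an LVT entry */
#define APIC_DM_NMI     (UINT32_C(4) << 8)  /* delivery mode field: NMI */
#define NR_COUNTERS     4                   /* assumed number of counters */

/* Hypothetical per-vcpu vPMU context: counter values plus the LVTPC entry. */
struct vpmu_ctx {
    uint64_t counters[NR_COUNTERS];
    uint32_t lvtpc;
    bool     in_use;   /* the guest has programmed the PMU */
};

/* Stand-in for the physical PMU MSRs and the local APIC LVTPC of one CPU. */
struct phys_cpu {
    uint64_t counters[NR_COUNTERS];
    uint32_t lvtpc;
};

/* Copy the hardware state into the context of the outgoing vcpu. */
static void vpmu_save(const struct phys_cpu *cpu, struct vpmu_ctx *ctx)
{
    memcpy(ctx->counters, cpu->counters, sizeof(ctx->counters));
    ctx->lvtpc = cpu->lvtpc;
}

/* Copy the context of the incoming vcpu into the hardware state. */
static void vpmu_load(struct phys_cpu *cpu, const struct vpmu_ctx *ctx)
{
    memcpy(cpu->counters, ctx->counters, sizeof(cpu->counters));
    cpu->lvtpc = ctx->lvtpc;
}

/*
 * Eager switch: save the outgoing vcpu if it uses the PMU, then load the
 * incoming one. If the incoming vcpu does not use the PMU, the physical
 * PMI is left masked (an assumption of this model).
 */
static void vpmu_switch(struct phys_cpu *cpu, struct vpmu_ctx *prev,
                        struct vpmu_ctx *next)
{
    if (prev->in_use)
        vpmu_save(cpu, prev);
    if (next->in_use)
        vpmu_load(cpu, next);
    else
        cpu->lvtpc |= APIC_LVT_MASKED;
}
```

In this model a vcpu that left its PMI masked gets it back masked when it is scheduled again, regardless of what the vcpu running in between did with its own LVTPC.
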
> 
> > 
> > But the real cause of the problem is unknown. As I said, I saw this only
> > on Nehalem. Maybe there is a problem with the hardware? Perhaps
> > your hardware colleagues know something more ;-)
> When I found this problem, I just thought it might be a corner case that only 
> happens on my box (of course, I only see this in NHM, too). 
> I will try to ping the HW guys to see whether there is any explanation,
> since it has now been shown to be a general problem on NHM.
> 
> But before everything is clear, I think approach 2 is a better solution now.

What would be the effect if the guest unmasks the PMI (which leads to unmasking
the 'physical PMI') but doesn't reset the counter to a value != 0? Would the
guest then be able to produce the endless NMI loop?

Dietmar.

> 
> > 
> > Thanks
> > Dietmar
> > 
> >> 
> >>> 
> >>> When I met this problem, I remember that I tried two approaches:
> >>> 1> Setting the counter to non-zero before unmasking PMI in
> >>> vpmu_do_interrupt; 2> Remove unmasking PMI from vpmu_do_interrupt
> >>> and unmask *physical PMI* when guest vcpu unmasks virtual PMI. 
> >>> I remember that approach 2 can fix this issue. But I do not
> >>> remember the result of approach 1, since I met this about one year
> >>> ago.  
> >>> It is my understanding that approach 2 is quite same as approach 1,
> >>> since normally guest will set the counter to some negative value
> >>> (for example, -100000) before unmasking virtual PMI.  
> >>> However, approach 2 looks cleaner and more reasonable.
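
The "negative value" mentioned above (and the "0-val" used further down in the thread) can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a 48-bit counter width; `PMC_WIDTH`, `pmc_start_value` and `pmc_events_until_overflow` are illustrative names, not Xen or Linux functions.

```c
#include <assert.h>
#include <stdint.h>

#define PMC_WIDTH 48                                  /* assumed counter width in bits */
#define PMC_MASK  ((UINT64_C(1) << PMC_WIDTH) - 1)

/* Start value that makes the counter overflow after `period` events. */
static uint64_t pmc_start_value(uint64_t period)
{
    assert(period > 0 && period <= PMC_MASK);
    return (0 - period) & PMC_MASK;    /* "0 - val", truncated to the counter width */
}

/* Number of events until a counter holding `value` wraps around to zero. */
static uint64_t pmc_events_until_overflow(uint64_t value)
{
    return (PMC_MASK - (value & PMC_MASK)) + 1;
}
```

With val = 1100000, as used in the test described below, the start value is 0xFFFFFFEF3720 and the overflow arrives after exactly 1100000 events. A counter that is left at zero would, by the same arithmetic, need a full 2^48 events before it wraps again.
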
> >>> 
> >>> Can you have a try and let me know the result? If both can not
> >>> work, there might be some problems that I have not met before. 
> >>> 
> >>> BTW: Sorry, I did not see your patch to enable NHM vpmu before. So,
> >>> there is no need for me to work on that now. :) 
> >>> 
> >>> Haitao
> >>> 
> >>> 
> >>> Dietmar Hahn wrote:
> >>>> Hi Haitao,
> >>>> 
> >>>>> Can I know how you enabled vPMU on Nehalem? This is not supported
> >>>>> in current Xen.
> >>>> 
> >>>> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
> >>>> 
> >>>>> 
> >>>>> Concerning vpmu support, I totally agree that we can disable this
> >>>>> feature by default. If anyone really wants to use it, he can use
> >>>>> boot options to turn it on.
> >>>> 
> >>>> Yes, that's OK for me.
> >>>> 
> >>>>> I am preparing a patch for that. And I will
> >>>>> send a patch to enable NHM vpmu together.
> >>>>> 
> >>>>> For the problem that Dietmar met, I think I once met this before.
> >>>>> Can you add some code in vpmu_do_interrupt that sets the counter
> >>>>> you are using to a value other than zero? Please let me know if
> >>>>> that can help.
> >>>> 
> >>>> I don't set the counter to zero. I use 0-val to set the counter.
> >>>> Actually I tested on Nehalem with
> >>>> - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and
> >>>> val=1100000 
> >>>> - Fixed counter #1 (0x30a) and val=1100000
> >>>> The thing is that in normal case the overflows of both counters
> >>>> appear nearly at the same time. As described I added some extra
> >>>> tracer for xentrace in core2_vpmu_do_interrupt() so the code looks
> >>>> like: 
> >>>> 
> >>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     -> 1. Step
> >>>>     { uint32_t HAHN_l, HAHN_h;
> >>>>         HAHN_l = (uint32_t) msr_content;
> >>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);      -> 2. Step
> >>>>     }
> >>>>     if ( !msr_content )
> >>>>         return 0;
> >>>>     core2_vpmu_cxt->global_ovf_status |= msr_content;
> >>>>     msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
> >>>>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);   -> 3. Step
> >>>> 
> >>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     -> 4. Step
> >>>>     { uint32_t HAHN_l, HAHN_h;
> >>>>         HAHN_l = (uint32_t) msr_content;
> >>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);    -> 5. Step
> >>>> 
> >>>>         rdmsrl(0xc3, msr_content);                        -> 6. Step General counter #2
> >>>>         HAHN_l = (uint32_t) msr_content;
> >>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
> >>>>         rdmsrl(0x30a, msr_content);                       -> 7. Step Fixed counter #1
> >>>>         HAHN_l = (uint32_t) msr_content;
> >>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
> >>>>     }
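
The value written to MSR_CORE_PERF_GLOBAL_OVF_CTRL in step 3 can be checked in isolation. Below is a small sketch of the same expression as a standalone function; `global_ovf_ctrl_mask` is a hypothetical helper and the counter count is passed in as a parameter instead of being taken from core2_get_pmc_count(). The constant part covers the two top bits and the three fixed-counter bits 32 to 34, the variable part one bit per general counter.

```c
#include <assert.h>
#include <stdint.h>

/* Overflow-clear mask: bits 63:62, fixed-counter bits 34:32, plus one
 * low bit for each of the nr_pmcs general-purpose counters. */
static uint64_t global_ovf_ctrl_mask(unsigned int nr_pmcs)
{
    assert(nr_pmcs < 32);   /* keep the shift well within range */
    return UINT64_C(0xC000000700000000) | ((UINT64_C(1) << nr_pmcs) - 1);
}
```

For four general counters this gives 0xC00000070000000F, for two it gives 0xC000000700000003. In both cases bit 33, which belongs to fixed counter #1, is part of the mask, so the write in step 3 does ask the hardware to clear that overflow bit.
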
> >>>> 
> >>>> With these tracers I got the following output:
> >>>> 
> >>>> Last good NMI:
> >>>> Both counters cause the NMI. Resetting works OK.
> >>>> The counters themselves kept running.
> >>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004   <- rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 5. Step: par1 = 0x0a,  high = 0x0000, low = 0x0000   <- rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x03c4   <- rdmsrl(0xc3)  -> #2 general counter
> >>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da   <- rdmsrl(0x30a) -> #1 fixed counter
> >>>> 
> >>>> NMI where things go wrong:
> >>>> Both counters cause the NMI. Resetting does NOT work correctly, only
> >>>> for the general counter! The general counter (which caused the NMI)
> >>>> seems to be stopped!
> >>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004   <- rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000   <- rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec   <- rdmsrl(0xc3)  -> #2 general counter
> >>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000   <- rdmsrl(0x30a) -> #1 fixed counter
> >>>> 
> >>>> Wrong NMI:
> >>>> Only the fixed counter causes the NMI (it was not reset during the
> >>>> NMI handling above!). Both counters seem to be stopped!
> >>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0000   <- rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000   <- rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec   <- rdmsrl(0xc3)  -> #2 general counter
> >>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000   <- rdmsrl(0x30a) -> #1 fixed counter
> >>>> 
> >>>> And this state remains forever!
> >>>> I hope my explanations are understandable ;-)
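
To read the high/low pairs in the trace output above, it helps to decode them back into overflow bits of MSR_CORE_PERF_GLOBAL_STATUS. The sketch below assumes the usual layout, with general counter n at bit n and fixed counter n at bit 32+n; the helper names are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIXED_CTR_SHIFT 32   /* fixed-counter overflow bits start at bit 32 */

/* Rebuild the 64-bit status value from the two traced 32-bit halves. */
static uint64_t status_from_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* True if general-purpose counter `idx` has its overflow bit set. */
static bool pmc_overflowed(uint64_t status, unsigned int idx)
{
    return (status >> idx) & 1;
}

/* True if fixed counter `idx` has its overflow bit set. */
static bool fixed_ctr_overflowed(uint64_t status, unsigned int idx)
{
    return (status >> (FIXED_CTR_SHIFT + idx)) & 1;
}
```

With high = 0x0002 and low = 0x0004 (step 2 of the good NMI), the set bits are general counter #2 and fixed counter #1, which matches the two counters programmed in the test. With high = 0x0002 and low = 0x0000 (steps 2 and 5 of the wrong NMI), only the overflow bit of fixed counter #1 is set, and it is still set after the write to MSR_CORE_PERF_GLOBAL_OVF_CTRL.
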
> >>>> 
> >>>> Until now I can see this behavior only on a Nehalem processor.
> >>>> 
> >>>> Thanks.
> >>>> Dietmar
> >>>> 
> >>>>> 
> >>>>> Best Regards
> >>>>> Shan Haitao
> >>>>> 
> >>>>> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
> >>>>>> On 30/10/2009 12:20, "Dietmar Hahn"
> >>>>>> <dietmar.hahn@xxxxxxxxxxxxxx> wrote: 
> >>>>>> 
> >>>>>>> I searched the intel processor spec but couldn't find any help.
> >>>>>>> So my question is, what is wrong here?
> >>>>>>> Can anybody with more knowledge point me in the right direction,
> >>>>>>> what can I still do to find the real cause of this?
> >>>>>> 
> >>>>>> You should probably Cc one of the Intel guys who implemented this
> >>>>>> stuff -- I've added Haitao Shan.
> >>>>>> 
> >>>>>> Meanwhile I'd be interested to know whether things work okay for
> >>>>>> you, minus performance counters and the hypervisor hang, if you
> >>>>>> return immediately from vpmu_initialise(). Really at minimum we
> >>>>>> need such a fix, perhaps with a boot parameter to re-enable the
> >>>>>> feature, for 3.4.2 release; allowing guests to hose the
> >>>>>> hypervisor like this is of course not on.
> >>>>>> 
> >>>>>>  -- Keir
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
> 
> 
-- 
Company details: http://ts.fujitsu.com/imprint.html


