Xen project Mailing List

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>

Date: Tue, 3 Nov 2009 10:03:32 +0100

Cc: "Shan, Haitao" <haitao.shan@xxxxxxxxx>, Keir Fraser <keir.fraser@xxxxxxxxxxxxx>

Delivery-date: Tue, 03 Nov 2009 01:04:03 -0800


List-id: Xen developer discussion <xen-devel.lists.xensource.com>

> No problem.
> Can you help to test? I have no test box at hand now, which might cause delay.
>

Sure :-)
Dietmar.

> Haitao
>
>
> Dietmar Hahn wrote:
> >> I suspect the guest will reproduce this PMI loop if guest behaves as
> >> you said in this email. But as far as I know, VTune and oprofile do
> >> not behave like that.
> >> Of course, this approach is still like workaround (unless I get
> >> confirm that HW requires to do so). This approach is preferable
> >> because it does not change the contents of MSRs. Thus, we have no
> >> impact on guest software that does rely on reading the correct value
> >> from HW. Approach 1 existed just because we knew that in event-based
> >> sampling, counter value on receiving PMI was not used by
> >> OProfile/VTune at all and it was safe to set the counter to some
> >> non-zero value.
> >>
> >> Haitao
> >>
> >
> > OK, then will you send a patch?
> > Dietmar.
> >
> >>
> >> Dietmar Hahn wrote:
> >>> Please see below.
> >>>
> >>>> See my comments embedded. :)
> >>>>
> >>>> Haitao
> >>>>
> >>>>
> >>>> Dietmar Hahn wrote:
> >>>>> The conclusion is, that this seems to be a workaround for the
> >>>>> endless NMI loop. PMI's are a very rare event and this should
> >>>>> not raise a performance problem.
> >>>> I totally agree that this is only a workaround for approach 1.
> >>>>
> >>>>>
> >>>>> I didn't try your second approach
> >>>>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask
> >>>>>> *physical PMI* when guest vcpu unmasks virtual PMI.
> >>>>> but I have some questions.
> >>>>>
> >>>>> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt
> >>>>> and a watchdog NMI would occur before the domU unmasks it?
> >>>> I think the second NMI will be lost.
> >>>>
> >>>>> - Is it possible that after handling the NMI (and not unmasking)
> >>>>> another domU got running on this CPU and therefore PMI's got
> >>>>> lost?
> >>>> LVTPC entry in physical local APIC is save/restored by Xen on VCPU
> >>>> switches. So unmasking (or not) of PMI of one vcpu should have no
> >>>> impact on another vcpu. When developing vPMU, I treated as vPMU
> >>>> context both PMU MSRs and LVTPC entry in local APIC. vPMU context
> >>>> is save/restored on physical HW when vcpus is scheduled, either in
> >>>> an active save/restore manner or a lazy one (depending on the PMU
> >>>> usage at the time of switch).
> >>>>
> >>>>>
> >>>>> But the real cause of the problem is unknown. As said I saw this
> >>>>> only on Nehalem. Maybe there is a problem together with the
> >>>>> hardware? Perhaps your hardware colleagues know something more ;-)
> >>>> When I found this problem, I just thought it might be a corner case
> >>>> that only happens on my box (of course, I only see this in NHM,
> >>>> too). I will try to pin HW guy to see if any explanation, since it
> >>>> is proven to be a general problem on NHM.
> >>>>
> >>>> But before everything is clear, I think approach 2 is a better
> >>>> solution now.
> >>>
> >>> What would be the effect if the guest unmasks the PMI (which leads
> >>> to unmasking the 'physical PMI') but doesn't reset the counter to a
> >>> value != 0? Is the guest able to produce the nmi endless loop?
> >>>
> >>> Dietmar.
> >>>
> >>>>
> >>>>>
> >>>>> Thanks
> >>>>> Dietmar
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> When I met this problem, I remember that I tried two approaches:
> >>>>>>> 1> Setting the counter to non-zero before unmasking PMI in
> >>>>>>> vpmu_do_interrupt; 2> Remove unmasking PMI from
> >>>>>>> vpmu_do_interrupt and unmask *physical PMI* when guest vcpu
> >>>>>>> unmasks virtual PMI.
> >>>>>>> I remember that approach 2 can fix this issue. But I do not
> >>>>>>> remember the result of approach 1, since I met this about one
> >>>>>>> year ago. It is my understanding that approach 2 is quite same
> >>>>>>> as approach 1, since normally guest will set the counter to some
> >>>>>>> negative value (for example, -100000) before unmasking virtual
> >>>>>>> PMI. However, approach 2 looks cleaner and more reasonable.
> >>>>>>>
> >>>>>>> Can you have a try and let me know the result? If both can not
> >>>>>>> work, there might be some problems that I have not met before.
> >>>>>>>
> >>>>>>> BTW: Sorry, I did not see your patch to enable NHM vpmu before.
> >>>>>>> So, there is no need for me to work on that now. :)
> >>>>>>>
> >>>>>>> Haitao
> >>>>>>>
> >>>>>>>
> >>>>>>> Dietmar Hahn wrote:
> >>>>>>>> Hi Haitao,
> >>>>>>>>
> >>>>>>>>> Can I know how you enabled vPMU on Nehalem? This is not
> >>>>>>>>> supported in current Xen.
> >>>>>>>>
> >>>>>>>> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Concerning vpmu support, I totally agree that we can disable
> >>>>>>>>> this feature by default. If anyone really wants to use it, he
> >>>>>>>>> can use boot options to turn it on.
> >>>>>>>>
> >>>>>>>> Yes, that's OK for me.
> >>>>>>>>
> >>>>>>>>> I am preparing a patch for that. And I will
> >>>>>>>>> send a patch to enable NHM vpmu together.
> >>>>>>>>>
> >>>>>>>>> For the problem that Dietmar met, I think I once met this
> >>>>>>>>> before. Can you add some code in vpmu_do_interrupt that sets
> >>>>>>>>> the counter you are using to a value other than zero? Please
> >>>>>>>>> let me know if that can help.
> >>>>>>>>
> >>>>>>>> I don't set the counter to zero. I use 0-val to set the
> >>>>>>>> counter. Actually I tested on Nehalem with
> >>>>>>>> - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and
> >>>>>>>>   val=1100000
> >>>>>>>> - Fixed counter #1 (0x30a) and val=1100000
> >>>>>>>> The thing is that in normal case the overflows of both counters
> >>>>>>>> appear nearly at the same time. As described I added some extra
> >>>>>>>> tracer for xentrace in core2_vpmu_do_interrupt() so the code
> >>>>>>>> looks like:
> >>>>>>>>
> >>>>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     -> 1. Step
> >>>>>>>>     {
> >>>>>>>>         uint32_t HAHN_l, HAHN_h;
> >>>>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);      -> 2. Step
> >>>>>>>>     }
> >>>>>>>>     if ( !msr_content )
> >>>>>>>>         return 0;
> >>>>>>>>     core2_vpmu_cxt->global_ovf_status |= msr_content;
> >>>>>>>>     msr_content = 0xC000000700000000 |
> >>>>>>>>                   ((1 << core2_get_pmc_count()) - 1);
> >>>>>>>>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);   -> 3. Step
> >>>>>>>>
> >>>>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     -> 4. Step
> >>>>>>>>     {
> >>>>>>>>         uint32_t HAHN_l, HAHN_h;
> >>>>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);    -> 5. Step
> >>>>>>>>
> >>>>>>>>         rdmsrl(0xc3, msr_content);       -> 6. Step General counter #2
> >>>>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
> >>>>>>>>         rdmsrl(0x30a, msr_content);      -> 7. Step Fixed counter #1
> >>>>>>>>         HAHN_l = (uint32_t) msr_content;
> >>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
> >>>>>>>>     }
> >>>>>>>>
> >>>>>>>> With these tracers I got the following output:
> >>>>>>>>
> >>>>>>>> Last good NMI:
> >>>>>>>> Both counters cause the NMI. Resetting works OK.
> >>>>>>>> The counters themselves were running further.
> >>>>>>>> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0004 ]
> >>>>>>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 5. Step: par1 = 0x0a, high = 0x0000, low = 0x0000 ]
> >>>>>>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x03c4 ]
> >>>>>>>>    rdmsrl(0xc3) -> #2 general counter
> >>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da ]
> >>>>>>>>    rdmsrl(0x30a) -> #1 fixed counter
> >>>>>>>>
> >>>>>>>> NMI from where things go wrong:
> >>>>>>>> Both counters cause the NMI. Resetting works NOT correct, only
> >>>>>>>> for the general counter! The general counter (caused the NMI)
> >>>>>>>> seems to be stopped!
> >>>>>>>> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0004 ]
> >>>>>>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 5. Step: par1 = 0x0a, high = 0x0002, low = 0x0000 ]
> >>>>>>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x00ec ]
> >>>>>>>>    rdmsrl(0xc3) -> #2 general counter
> >>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ]
> >>>>>>>>    rdmsrl(0x30a) -> #1 fixed counter
> >>>>>>>>
> >>>>>>>> Wrong NMI:
> >>>>>>>> Only the fixed counter causes the NMI (which was not reset
> >>>>>>>> during NMI handling above!) Both counters seem to be stopped!
> >>>>>>>> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0000 ]
> >>>>>>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 5. Step: par1 = 0x0a, high = 0x0002, low = 0x0000 ]
> >>>>>>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>>>>>> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x00ec ]
> >>>>>>>>    rdmsrl(0xc3) -> #2 general counter
> >>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ]
> >>>>>>>>    rdmsrl(0x30a) -> #1 fixed counter
> >>>>>>>>
> >>>>>>>> And this state remains forever!
> >>>>>>>> I hope my explanations are understandable ;-)
> >>>>>>>>
> >>>>>>>> Until now I can see this behavior only on a Nehalem processor.
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>> Dietmar
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Best Regards
> >>>>>>>>> Shan Haitao
> >>>>>>>>>
> >>>>>>>>> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
> >>>>>>>>>> On 30/10/2009 12:20, "Dietmar Hahn"
> >>>>>>>>>> <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I searched the intel processor spec but couldn't find any
> >>>>>>>>>>> help. So my question is, what is wrong here?
> >>>>>>>>>>> Can anybody with more knowledge point me in the right
> >>>>>>>>>>> direction, what can I still do to find the real cause of
> >>>>>>>>>>> this?
> >>>>>>>>>>
> >>>>>>>>>> You should probably Cc one of the Intel guys who implemented
> >>>>>>>>>> this stuff -- I've added Haitao Shan.
> >>>>>>>>>>
> >>>>>>>>>> Meanwhile I'd be interested to know whether things work okay
> >>>>>>>>>> for you, minus performance counters and the hypervisor hang,
> >>>>>>>>>> if you return immediately from vpmu_initialise(). Really at
> >>>>>>>>>> minimum we need such a fix, perhaps with a boot parameter to
> >>>>>>>>>> re-enable the feature, for 3.4.2 release; allowing guests to
> >>>>>>>>>> hose the hypervisor like this is of course not on.
> >>>>>>>>>>
> >>>>>>>>>> -- Keir
> >>>> _______________________________________________
> >>>> Xen-devel mailing list
> >>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >>>> http://lists.xensource.com/xen-devel
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxxxxxxxx
> >> http://lists.xensource.com/xen-devel
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>
>

--
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
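
The trace values quoted in the thread are the two 32-bit halves of MSR_CORE_PERF_GLOBAL_STATUS, and the value written at step 3 is the overflow-clear mask. The following is only a minimal sketch for reading those numbers, not code from Xen or from any patch in this thread. All helper names are hypothetical. It assumes the documented Intel layout (general-purpose counter overflow flags from bit 0, fixed counter flags from bit 32), and the count of 4 general-purpose counters used in the note after it is an assumption about Nehalem, standing in for whatever core2_get_pmc_count() returns.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the global status / overflow-control MSRs:
 * general-purpose counter flags start at bit 0, fixed counter flags at bit 32. */
#define FIXED_CTR_SHIFT    32
#define FIXED_CTR_COUNT    3                      /* the 0x7 in the mask from the mail */
#define OVF_CTRL_TOP_BITS  0xC000000000000000ULL  /* bits 63 and 62 */

/* Rebuild the 64-bit MSR value from the high/low halves logged by the tracer. */
static inline uint64_t join_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Same value as the expression in the mail,
 * 0xC000000700000000 | ((1 << n) - 1), built with 64-bit shifts. */
static inline uint64_t ovf_clear_mask(unsigned int num_gp_counters)
{
    /* The general-purpose bits must stay below the fixed counter bits. */
    assert(num_gp_counters < FIXED_CTR_SHIFT);
    return OVF_CTRL_TOP_BITS
         | ((uint64_t)((1u << FIXED_CTR_COUNT) - 1) << FIXED_CTR_SHIFT)
         | ((1ULL << num_gp_counters) - 1);
}

/* Non-zero if general-purpose counter idx has its overflow flag set. */
static inline int gp_overflowed(uint64_t status, unsigned int idx)
{
    return (int)((status >> idx) & 1);
}

/* Non-zero if fixed counter idx has its overflow flag set. */
static inline int fixed_overflowed(uint64_t status, unsigned int idx)
{
    return (int)((status >> (FIXED_CTR_SHIFT + idx)) & 1);
}

/* Start value "0 - val" as described in the mail: the counter wraps to zero
 * after val events. Real hardware keeps only the implemented counter width. */
static inline uint64_t counter_start(uint64_t val)
{
    return 0 - val;
}
```

Read this way, the step 2 value of the last good NMI (high = 0x0002, low = 0x0004) has the flags for general counter #2 and fixed counter #1 set, which matches the two counters programmed in the test. In the failing case the step 5 value (high = 0x0002, low = 0x0000) still has the fixed counter #1 flag set after the clear, even though ovf_clear_mask(4) covers that bit.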
