Re: [Xen-devel] [PATCH] VPMU issue on Nehalem cpus

To:	"Shan, Haitao" <haitao.shan@xxxxxxxxx>
Subject:	Re: [Xen-devel] [PATCH] VPMU issue on Nehalem cpus
From:	Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
Date:	Mon, 22 Nov 2010 07:36:25 +0100
Cc:	"xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxxxx>
Delivery-date:	Sun, 21 Nov 2010 22:37:34 -0800
In-reply-to:	<04F972F38B3C4E4E91C4697DA8BF9F524E1739BEEB@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References:	<201011191129.07773.dietmar.hahn@xxxxxxxxxxxxxx> <4CE668580200007800023533@xxxxxxxxxxxxxxxxxx> <04F972F38B3C4E4E91C4697DA8BF9F524E1739BEEB@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender:	xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
User-agent:	KMail/1.13.5 (Linux/2.6.34.7-0.5-xen; KDE/4.5.3; x86_64; ; )

Hi Haitao,

On 20.11.2010, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:

> Hi, Jan,

> The actual handler core2_vpmu_do_interrupt is never called in NMI handler context. The reason is that I did not want an interrupt delivered on behalf of a guest to have such a high priority. So even if the guest programs its virtual APIC to use NMI, the underlying HW does not. It uses a vector (0xF8? I don't remember clearly).

> The issue this patch solves is likely a HW bug, but I have not found any documented erratum. On some NHM processors, when a PMI is received, you will observe that the counter that triggered the interrupt is zero (which means it has just overflowed). If you unmask the PMI sources (as you might know, the PMI gets masked automatically when it is received), you will immediately get another interrupt. This is how you get an interrupt loop here (not an NMI loop, but a PMI loop).

> The reason why native oprofile works is that oprofile reprograms the counter before unmasking the interrupt. In our VPMU implementation in Xen, the PMI handler does nothing but mark an interrupt pending for the guest; then it unmasks the PMI and returns.

> I found that reprogramming the counter to 1 works around this issue (any number that is *not* zero should work, actually).
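
To make the workaround concrete, here is a minimal sketch of the idea in plain C. It is only a simulation under stated assumptions: the counters are ordinary arrays instead of MSRs, and the function name nudge_zero_counters() and its parameters are made up for illustration. Real code would use rdmsrl()/wrmsrl() on the physical counters, as the patch quoted further down does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simulation only: `gen` and `fix` stand in for the general-purpose and
 * fixed-function counter MSRs. The function name and its parameters are
 * hypothetical and not part of Xen.
 *
 * For every counter whose overflow bit is set in `status` and which
 * still reads zero, write 1 so that unmasking the PMI does not
 * retrigger it at once. Returns the number of counters rewritten.
 */
static int nudge_zero_counters(uint64_t status,
                               uint64_t *gen, int n_gen,
                               uint64_t *fix, int n_fix)
{
    int rewritten = 0;

    /* General counters use the low half, fixed counters start at bit 32. */
    assert(n_gen >= 0 && n_gen <= 32);
    assert(n_fix >= 0 && n_fix <= 32);

    for (int i = 0; i < n_gen; i++) {
        if (((status >> i) & 1) && gen[i] == 0) {
            gen[i] = 1;
            rewritten++;
        }
    }

    for (int i = 0; i < n_fix; i++) {
        if (((status >> (32 + i)) & 1) && fix[i] == 0) {
            fix[i] = 1;
            rewritten++;
        }
    }

    return rewritten;
}
```

Counters that are flagged but already non-zero are left alone, and so are zero counters that are not flagged, which matches the "read all flagged counters and if the value is 0 write 1" description in the patch comment.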

> Another workaround would be to unmask the real PMI only when the guest unmasks its virtual PMI. This workaround is not very promising, since by the time the guest unmasks the virtual PMI, the vcpu might have been migrated to another CPU. You would need complex tracking and an IPI to do the unmask on the correct physical CPU.

> I hope this explains the whole matter clearly.

> BTW: As I observed on other processors, when a PMI is received, the counter is never zero (it holds a value slightly greater than zero).

> Shan Haitao

Very good explanation, many thanks!

Dietmar.

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
> Sent: Friday, November 19, 2010 7:07 PM
> To: Dietmar Hahn
> Cc: Shan, Haitao; xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH] VPMU issue on Nehalem cpus
>
> >>> On 19.11.10 at 11:29, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:

> > +/*
> > + * QUIRK to workaround an issue on Nehalem processors currently seen
> > + * on family 6 cpus E5520 (model 26) and X7542 (model 46).
> > + * The issue leads to endless NMI loops on the processor.
> > + * If a counter triggers an NMI and while the NMI handler is running another
> > + * counter overflows the second counter triggers endless new NMIs.
> > + * A solution is to read all flagged counters and if the value is 0 write
> > + * 1 into it.
> > + */
>
> Two things I don't understand here: One is that I can't see how
> from the NMI handler control would get to
> core2_vpmu_do_interrupt() - afaics, this gets called only in the
> context of the (vectored) smp_pmu_apic_interrupt(). The other
> is that if nested interrupts occur, how would you prevent this
> by writing ones into zero counters? That is, in the best case I
> could see this shrinking the window within which unintended
> nested interrupts would occur. Or is it that the secondary
> interrupts only occur after the first one returned? Is this
> (mis-)behavior documented somewhere?

> > +static int is_nmi_quirk;
>
> bool_t __read_mostly?
>
> > +
> > +static void check_nmi_quirk(void)
> > +{
> > +    u8 family = current_cpu_data.x86;
> > +    u8 cpu_model = current_cpu_data.x86_model;
> > +    is_nmi_quirk = 0;
> > +    if ( family == 6 )
> > +    {
> > +        if ( cpu_model == 46 || cpu_model == 26 )
> > +            is_nmi_quirk = 1;
> > +    }
> > +}
> > +
> > +static int core2_get_pmc_count(void);
> > +static void handle_nmi_quirk(u64 msr_content)
> > +{
> > +    int num_gen_pmc = core2_get_pmc_count();
> > +    int num_fix_pmc = 3;
> > +    int i;
> > +    u64 val;
> > +
> > +    if ( !is_nmi_quirk )
> > +        return;
> > +
> > +    val = msr_content & ((1 << num_gen_pmc) - 1);
>
> What's the point of masking if the subsequent loop looks at the
> bottom so many bits only anyway?
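
Jan's point about the mask can be checked with a small standalone sketch. The two helper functions below are hypothetical and only count flagged bits; they use 1ULL for the shift, which is a choice of this sketch and not what the patch does. A loop that tests bit 0 and shifts right n times only ever sees the low n bits of the original value, so masking beforehand changes nothing.

```c
#include <stdint.h>

/* Hypothetical helper: count flagged counters among the low `n` bits,
 * masking first as the patch does. Requires 0 <= n < 64. */
static int count_flagged_masked(uint64_t status, int n)
{
    uint64_t val = status & ((1ULL << n) - 1);
    int hits = 0;

    for (int i = 0; i < n; i++) {
        hits += (int)(val & 1);
        val >>= 1;
    }
    return hits;
}

/* Same loop without the mask. Bits at position n and above move toward
 * bit 0 but cannot reach it within n iterations, so they are never
 * examined. */
static int count_flagged_unmasked(uint64_t status, int n)
{
    uint64_t val = status;
    int hits = 0;

    for (int i = 0; i < n; i++) {
        hits += (int)(val & 1);
        val >>= 1;
    }
    return hits;
}
```

Both functions return the same count for any status word, including one with high bits set such as 0xC000000700000005.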

> > +    for ( i = 0; i < num_gen_pmc; i++ )
> > +    {
> > +        if ( val & 0x1 )
> > +        {
> > +            u64 cnt;
> > +            rdmsrl(MSR_P6_PERFCTR0 + i, cnt);
> > +            if ( cnt == 0 )
> > +                wrmsrl(MSR_P6_PERFCTR0 + i, 1);
> > +        }
> > +        val >>= 1;
> > +    }
> > +    val = (msr_content >> 32) & ((1 << num_fix_pmc) - 1);
>
> Same here.
>
> > +    for ( i = 0; i < num_fix_pmc; i++ )
> > +    {
> > +        if ( val & 0x1 )
> > +        {
> > +            u64 cnt;
> > +            rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt);
> > +            if ( cnt == 0 )
> > +                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1);
> > +        }
> > +        val >>= 1;
> > +    }
> > +}
> > +
> > +#define CHECK_HANDLE_NMI_QUIRK(msr_content) \
> > +    if ( is_nmi_quirk ) \
> > +        handle_nmi_quirk(msr_content);
> > +
>
> Why do you need a macro here if you use it only once?
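
A small sketch may help with the macro question. Since handle_nmi_quirk() already returns early when the flag is clear, the extra check in the macro repeats that test, and a direct call behaves the same. The names below (is_quirk, handle_quirk, quirk_runs) are stand-ins invented for this example. If a macro were kept anyway, wrapping it in do { } while (0) is the usual way to make it act as a single statement; the version in the patch carries its own trailing semicolon and is invoked without one.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_quirk;    /* stand-in for is_nmi_quirk */
static int quirk_runs;   /* how often the workaround body actually ran */

/* Stand-in for handle_nmi_quirk(): it checks the flag itself. */
static void handle_quirk(uint64_t status)
{
    if (!is_quirk)
        return;
    (void)status;        /* the real code would inspect the counters */
    quirk_runs++;
}

/* Statement-safe form of the macro, should one be wanted at all. The
 * guard here duplicates the check inside handle_quirk(). */
#define CHECK_HANDLE_QUIRK(status) \
    do { if (is_quirk) handle_quirk(status); } while (0)
```

With the flag clear, neither handle_quirk(x) nor CHECK_HANDLE_QUIRK(x) runs the body; with the flag set, both do.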

> >  u32 core2_counters_msr[] = {
> >      MSR_CORE_PERF_FIXED_CTR0,
> >      MSR_CORE_PERF_FIXED_CTR1,
> > @@ -494,6 +558,9 @@
> >      rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
> >      if ( !msr_content )
> >          return 0;
> > +
> > +    CHECK_HANDLE_NMI_QUIRK(msr_content)
> > +
> >      core2_vpmu_cxt->global_ovf_status |= msr_content;
> >      msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
> >      wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
>
> Jan

Dietmar Hahn
TSP ES&S SWE OS
FUJITSU
Fujitsu Technology Solutions
Domagkstraße 28, D-80807 München, Germany
Tel: +49 (89) 3222 2952
Email: dietmar.hahn@xxxxxxxxxxxxxx
Web: http://ts.fujitsu.com
Company details: http://ts.fujitsu.com/imprint.html

