WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 
Xen 
 
Home Products Support Community News
 
   
 

xen-devel

Re: [Xen-devel] [PATCH] VPMU issue on Nehalem cpus

To: "Dietmar Hahn" <dietmar.hahn@xxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH] VPMU issue on Nehalem cpus
From: "Jan Beulich" <JBeulich@xxxxxxxxxx>
Date: Fri, 19 Nov 2010 11:06:48 +0000
Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Haitao Shan <haitao.shan@xxxxxxxxx>
Delivery-date: Fri, 19 Nov 2010 03:07:48 -0800
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <201011191129.07773.dietmar.hahn@xxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <201011191129.07773.dietmar.hahn@xxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>> On 19.11.10 at 11:29, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> +/*
> + * QUIRK to workaround an issue on Nehalem processors currently seen
> + * on family 6 cpus E5520 (model 26) and X7542 (model 46).
> + * The issue leads to endless NMI loops on the processor.
> + * If a counter triggers an NMI and while the NMI handler is running another
> + * counter overflows the second counter triggers endless new NMIs.
> + * A solution is to read all flagged counters and if the value is 0 write
> + * 1 into it.
> + */

Two things I don't understand here: One is that I can't see how
from the NMI handler control would get to
core2_vpmu_do_interrupt() - afaics, this gets called only in the
context of the (vectored) smp_pmu_apic_interrupt(). The other
is that if nested interrupts occur, how would you prevent this
by writing ones into zero counters? That is, in the best case I
could see this shrinking the window within which unintended
nested interrupts would occur. Or is it that the secondary
interrupts only occur after the first one returned? Is this (mis-)
behavior documented somewhere?

> +static int is_nmi_quirk;

bool_t __read_mostly?

> +
> +static void check_nmi_quirk(void)
> +{
> +    u8 family = current_cpu_data.x86;
> +    u8 cpu_model = current_cpu_data.x86_model;
> +    is_nmi_quirk = 0;
> +    if ( family == 6 )
> +    {
> +        if ( cpu_model == 46 || cpu_model == 26 )
> +            is_nmi_quirk = 1;
> +    }
> +}
> +
> +static int core2_get_pmc_count(void);
> +static void handle_nmi_quirk(u64 msr_content)
> +{
> +    int num_gen_pmc = core2_get_pmc_count();
> +    int num_fix_pmc  = 3;
> +    int i;
> +    u64 val;
> +
> +    if ( !is_nmi_quirk )
> +        return;
> +
> +    val = msr_content & ((1 << num_gen_pmc) - 1);

What's the point of masking if the subsequent loop looks at the
bottom so many bits only anyway?

> +    for ( i = 0; i < num_gen_pmc; i++ )
> +    {
> +        if ( val & 0x1 )
> +        {
> +            u64 cnt;
> +            rdmsrl(MSR_P6_PERFCTR0 + i, cnt);
> +            if ( cnt == 0 )
> +                wrmsrl(MSR_P6_PERFCTR0 + i, 1);
> +        }
> +        val >>= 1;
> +    }
> +    val = (msr_content >> 32) & ((1 << num_fix_pmc) - 1);

Same here.

> +    for ( i = 0; i < num_fix_pmc; i++ )
> +    {
> +        if ( val & 0x1 )
> +        {
> +            u64 cnt;
> +            rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt);
> +            if ( cnt == 0 )
> +                wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1);
> +        }
> +        val >>= 1;
> +    }
> +}
> +
> +#define CHECK_HANDLE_NMI_QUIRK(msr_content) \
> +            if ( is_nmi_quirk )            \
> +                handle_nmi_quirk(msr_content);
> +

Why do you need a macro here if you use it only once?

>  u32 core2_counters_msr[] =   {
>      MSR_CORE_PERF_FIXED_CTR0,
>      MSR_CORE_PERF_FIXED_CTR1,
> @@ -494,6 +558,9 @@
>      rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
>      if ( !msr_content )
>          return 0;
> +
> +    CHECK_HANDLE_NMI_QUIRK(msr_content)
> +
>      core2_vpmu_cxt->global_ovf_status |= msr_content;
>      msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
>      wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel