xen-devel

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

To: xen-devel@xxxxxxxxxxxxxxxxxxx, haitao.shan@xxxxxxxxx
Subject: Re: [Xen-devel] Need help in debugging partially blocked hypervisor
From: Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx>
Date: Mon, 2 Nov 2009 10:11:25 +0100
Cc: Keir Fraser <keir.fraser@xxxxxxxxxxxxx>

Hi Haitao,

> Can I know how you enabled vPMU on Nehalem? This is not supported in
> current Xen.

http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html

> 
> Concerning vpmu support, I totally agree that we can disable this
> feature by default. If anyone really wants to use it, he can use boot
> options to turn it on.

Yes, that's OK for me.

> I am preparing a patch for that. And I will
> send a patch to enable NHM vpmu together.
> 
> For the problem that Dietmar met, I think I once met this before. Can
> you add some code in vpmu_do_interrupt that sets the counter you are
> using to a value other than zero? Please let me know if that can help.

I don't set the counter to zero. I load the counter with 0 - val, so that
it overflows after val events.
I tested on Nehalem with
- General perf counter #2 (MSR 0xc3) counting CPU_CLK_UNHALTED, val=1100000
- Fixed counter #1 (MSR 0x30a), val=1100000
In the normal case both counters overflow at nearly the same time.
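
The "0 - val" preload mentioned above can be sketched as follows. This is only an illustration: pmc_preload() is a hypothetical helper, not a Xen function, and the 48-bit counter width used in the usage note is an assumption about the Nehalem counters (the real width is reported by CPUID leaf 0xA).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of Xen): compute the value to load into a
 * performance counter that is `width` bits wide so that it overflows after
 * `period` further events.
 */
static uint64_t pmc_preload(uint64_t period, unsigned int width)
{
    uint64_t mask;

    assert(width >= 1 && width <= 64);
    mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

    /* Two's complement of the period, truncated to the counter width. */
    return (0 - period) & mask;
}
```

Under the 48-bit assumption, pmc_preload(1100000, 48) gives 0xFFFFFFEF3720, that is, the counter starts 1100000 events below its wrap-around point.
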
As described, I added some extra trace points for xentrace in
core2_vpmu_do_interrupt(), so the code looks like:

    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     /* 1. Step */
    {
        uint32_t HAHN_l, HAHN_h;
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);      /* 2. Step */
    }
    if ( !msr_content )
        return 0;
    core2_vpmu_cxt->global_ovf_status |= msr_content;
    msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
    wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);   /* 3. Step */

    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     /* 4. Step */
    {
        uint32_t HAHN_l, HAHN_h;
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);    /* 5. Step */

        rdmsrl(0xc3, msr_content);                        /* 6. Step: general counter #2 */
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
        rdmsrl(0x30a, msr_content);                       /* 7. Step: fixed counter #1 */
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
    }
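
To make the value written in step 3 easier to read, here is a stand-alone sketch of the mask computation. The helper names are hypothetical, and the count of 4 general counters used in the usage note is an assumption for Nehalem (in Xen the number comes from core2_get_pmc_count()). The fixed-counter overflow flags sit at bits 32 to 34 of the global status and overflow-control registers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the mask built before step 3:
 * the two top bits, the three fixed-counter bits (32..34), and one bit
 * per general counter.
 */
static uint64_t ovf_ctrl_mask(unsigned int pmc_count)
{
    assert(pmc_count < 32);
    return 0xC000000700000000ULL | ((1ULL << pmc_count) - 1);
}

/* Returns 1 if `mask` asks to clear the overflow flag of fixed counter idx. */
static int mask_clears_fixed(uint64_t mask, unsigned int idx)
{
    assert(idx < 3);
    return (int)((mask >> (32 + idx)) & 1);
}
```

With 4 general counters, ovf_ctrl_mask(4) is 0xC00000070000000F, and mask_clears_fixed() returns 1 for fixed counter #1 (bit 33). So the write in step 3 does request clearing of the fixed counter's overflow flag.
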

With these tracers I got the following output:

Last good NMI:
Both counters caused the NMI. Resetting works OK.
The counters themselves kept running.
2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0000, low = 0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low = 0x03c4 ]  rdmsrl(0xc3)  -> general counter #2
7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da ]  rdmsrl(0x30a) -> fixed counter #1
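
The high/low pairs traced for MSR_CORE_PERF_GLOBAL_STATUS can be decoded with a few lines of C. This is a minimal sketch with hypothetical helper names, assuming the architectural layout in which bit n flags general counter n and bit 32 + n flags fixed counter n.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the 64-bit status from the two 32-bit halves that were traced. */
static uint64_t join_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Returns 1 if general counter `idx` has its overflow flag set. */
static int pmc_overflowed(uint64_t status, unsigned int idx)
{
    assert(idx < 32);
    return (int)((status >> idx) & 1);
}

/* Returns 1 if fixed counter `idx` has its overflow flag set. */
static int fixed_overflowed(uint64_t status, unsigned int idx)
{
    assert(idx < 3);
    return (int)((status >> (32 + idx)) & 1);
}
```

For the step 2 values high = 0x0002, low = 0x0004, both pmc_overflowed(status, 2) and fixed_overflowed(status, 1) return 1, which matches "both counters caused the NMI".
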

NMI where things go wrong:
Both counters caused the NMI. Resetting does NOT work correctly: it works
only for the general counter!
The general counter (which caused the NMI) seems to be stopped!
2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ]  rdmsrl(0xc3)  -> general counter #2
7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ]  rdmsrl(0x30a) -> fixed counter #1

Wrong NMI:
Only the fixed counter causes the NMI (its overflow was not reset during
the NMI handling above!).
Both counters seem to be stopped!
2. Step: par1 = 0x01,  high = 0x0002, low = 0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ]  rdmsrl(0xc3)  -> general counter #2
7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ]  rdmsrl(0x30a) -> fixed counter #1

And this state remains forever!
I hope my explanations are understandable ;-)

So far I have seen this behavior only on a Nehalem processor.

Thanks.
Dietmar

> 
> Best Regards
> Shan Haitao
> 
> 2009/10/30 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
> > On 30/10/2009 12:20, "Dietmar Hahn" <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
> >
> >> I searched the Intel processor spec but couldn't find any help.
> >> So my question is, what is wrong here?
> >> Can anybody with more knowledge point me in the right direction, what can I
> >> still do to find the real cause of this?
> >
> > You should probably Cc one of the Intel guys who implemented this stuff --
> > I've added Haitao Shan.
> >
> > Meanwhile I'd be interested to know whether things work okay for you, minus
> > performance counters and the hypervisor hang, if you return immediately from
> > vpmu_initialise(). Really at minimum we need such a fix, perhaps with a boot
> > parameter to re-enable the feature, for 3.4.2 release; allowing guests to
> > hose the hypervisor like this is of course not on.
> >
> >  -- Keir
> >

-- 
Company details: http://ts.fujitsu.com/imprint.html
