
Re: [Xen-devel] State of current Xen debugger


  • To: Roger Cruz <roger.cruz@xxxxxxxxxxxxxxxxxxx>, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>, Tim Deegan <Tim.Deegan@xxxxxxxxxx>
  • From: Keir Fraser <keir@xxxxxxx>
  • Date: Tue, 28 Sep 2010 16:30:00 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 28 Sep 2010 08:31:07 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-topic: [Xen-devel] State of current Xen debugger

On 28/09/2010 16:21, "Roger Cruz" <roger.cruz@xxxxxxxxxxxxxxxxxxx> wrote:

> I am still chasing this hard hang in our system with a modified 3.4.2 Xen. I
> have upgraded the BIOS and the problem still exists. The only thing that has
> appeared to work so far is adding max_cstate=0, but now I have a report of a
> hang from one customer who had that option enabled. The rest of them
> have been running successfully for more than a week with this "work-around".
> I have isolated the problem to Lenovo machines with Centrino processors. These
> CPUs stop the TSC when in C3.
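Whether a given CPU keeps its TSC running in deep C-states can be checked from CPUID: leaf 0x80000007 reports an "invariant TSC" in EDX bit 8, which per Intel's documentation means the TSC runs at a constant rate in all C-states. A minimal sketch, assuming a GCC or Clang toolchain that provides `<cpuid.h>` and `__get_cpuid()`; the helper name `tsc_is_invariant()` is hypothetical:

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Invariant-TSC flag: CPUID leaf 0x80000007, EDX bit 8. */
#define INVARIANT_TSC_BIT (1u << 8)

/*
 * Hypothetical helper. Returns 1 if the CPU advertises an invariant TSC
 * (keeps ticking in all C-states), 0 if it does not (the TSC may stop in
 * C3 or deeper), and -1 if this cannot be determined (non-x86 build, or
 * the extended leaf is not supported).
 */
int tsc_is_invariant(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the requested leaf is above the max. */
    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return -1;
    return (edx & INVARIANT_TSC_BIT) ? 1 : 0;
#else
    return -1;  /* not an x86 build */
#endif
}
```

Calling it from a small test program on one of the affected laptops, ideally under a native (non-Xen) kernel so no hypervisor CPUID filtering applies, would show whether the TSC is expected to halt; a result of 0 would be consistent with the behaviour described above.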
>  
> What I really need to understand is why the NMI watchdog in Xen is not
> firing and causing a crash when the CPU hangs. I was under the impression that
> NMIs couldn't be masked at all. Is there any way that Xen could be disabling or
> changing that behavior? I know the NMI is being driven by a timer set in the
> NMI handler. Could there be a case where this timer is disabled? Any ideas
> are welcome!

The NMI counter gets driven by the APIC timer. Perhaps it needs poking
somehow on wakeup from C3? My suggestion for debugging this would be to look
at what native Linux does. The NMI perfctr poking logic was all taken from
(now rather old) upstream Linux.
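A simplified model of the stall check such a watchdog performs may help show why it can stay silent. This sketch assumes, in the style of the Linux-derived design mentioned above, that a periodic timer bumps a per-CPU counter and the NMI handler checks whether that counter has moved since the previous NMI; `struct wd_state` and `wd_nmi_tick()` are hypothetical names, not Xen's actual code. The check only runs when an NMI is actually delivered, so if the NMI source itself stops (for example, a counter that is not re-armed after C3), the hang goes unreported.

```c
#include <stdbool.h>

/* Hypothetical per-CPU watchdog state. */
struct wd_state {
    unsigned int last_ticks;  /* timer-tick count seen at the previous NMI */
    unsigned int alert;       /* consecutive NMIs with no timer progress */
};

/*
 * Called once per watchdog NMI. 'timer_ticks' is the current value of the
 * per-CPU counter incremented by the periodic timer; 'limit' is how many
 * consecutive stalled NMIs count as a hang. Returns true when the CPU
 * should be declared stuck.
 */
bool wd_nmi_tick(struct wd_state *st, unsigned int timer_ticks,
                 unsigned int limit)
{
    if (timer_ticks == st->last_ticks) {
        /* Timer has not run since the last NMI: the CPU may be wedged. */
        return ++st->alert >= limit;
    }
    /* Timer made progress: remember it and clear the alert count. */
    st->last_ticks = timer_ticks;
    st->alert = 0;
    return false;
}
```

With `limit` set to a few seconds' worth of NMIs, the model flags a CPU whose timer counter stops advancing; it cannot detect anything once the NMIs themselves stop arriving, which matches a completely silent hang.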

 -- Keir

> Thanks
> Roger R. Cruz
> 
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Roger Cruz
> Sent: Tuesday, September 14, 2010 11:55 AM
> To: Dan Magenheimer; Tim Deegan
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] State of current Xen debugger
>  
> Hi Dan,
> 
> I am using 3.4.2 where we have made very minor modifications (some backports,
> for example).
> 
> I have not tried your suggestions yet, so I will do that next. Thanks!
> 
> R.
> 
> -----Original Message-----
> From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
> Sent: Tue 9/14/2010 11:19 AM
> To: Roger Cruz; Tim Deegan
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] State of current Xen debugger
> 
> A couple of thoughts:
>
> Have you tried max_cstate=0 (as a Xen boot option)?
>
> Also, you didn't say what version of Xen you are using, but playing around
> with hpet_broadcast (enabling it or force-disabling it as below) might be
> worth a try.
>
> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html
>
> From: Roger Cruz [mailto:roger.cruz@xxxxxxxxxxxxxxxxxxx]
> Sent: Tuesday, September 14, 2010 8:56 AM
> To: Tim Deegan
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] State of current Xen debugger
>
> Hi Tim, good to hear from you again.
> 
> I had a pretty good inkling that one of you hardcore developers would say that
> :-)  Yes, it is pretty well wedged.  I can cause the problem more rapidly by
> dropping to a single CPU.  When the hang happens, the Xen console is
> completely dead.  None of the special keys work.
> 
> I do have hopes that a BIOS upgrade could fix this as a last resort, but I
> want to see if I can at least understand the problem. We have a few different
> machines that are exhibiting similar symptoms, so I have to see if I can find
> a work-around without requiring every user to upgrade their BIOS :-(
> 
> Just in case, what debugger have you been using?  Are there recent
> instructions on how to set it up that you can point me to?
> 
> Thanks
> Roger
> 
> 
> -----Original Message-----
> From: Tim Deegan [mailto:Tim.Deegan@xxxxxxxxxx]
> Sent: Tue 9/14/2010 10:30 AM
> To: Roger Cruz
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] State of current Xen debugger
> 
> Hi,
> 
> At 15:22 +0100 on 14 Sep (1284477779), Roger Cruz wrote:
>> I am trying to debug a problem where the hypervisor is hanging hard.
>> Not even the NMI watchdog is triggering a reboot.  So I wanted to hook
>> up a debugger.
> 
> Sorry to bring a counsel of despair, but if the NMI watchdog isn't
> working, then your chances of getting a working debugger are slim. It's
> likely that at least one CPU is very, very stuck. Does the 'd' debug key
> work on the serial line when the machine is wedged?
> 
> On a more cheerful note, I've twice seen hard hangs like this that
> turned out to be hardware issues, fixable with BIOS upgrades.
> 
> Cheers,
> 
> Tim.
> 
>> What is the state of the current debuggers out there?
>> Any input on how I should set one up (kdb, gdb, etc.) and pointers to a
>> good wiki page would be much appreciated. I did perform a Google search
>> and found some links, but I want to hear from the current developers as
>> to what is most stable and useful for debugging this type of hard
>> hang. I only have a PCI Express serial port card to use, as the laptop
>> has no built-in port.
> 
> --
> Tim Deegan <Tim.Deegan@xxxxxxxxxx>
> Principal Software Engineer, XenServer Engineering
> Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
> 
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 9.0.851 / Virus Database: 271.1.1/3119 - Release Date: 09/14/10
> 02:35:00
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel


