
RE: [Xen-devel] NMI with SMP domain causing machine to reboot



  Keir

  Thanks for your reply.
  I don't think the problem is caused by not properly
  resetting CPU1's perf counter. I can see that the number of
  NMIs being generated is similar for CPU0 and CPU1,
  and both CPUs' perf counters are being programmed in
  exactly the same way.
  (The command "xenpmc -s" lets me see the number of NMIs
  generated.)
  Moreover, when we have multiple non-SMP domains running
  on both CPUs, this problem does not happen. 
  Sharing of MSRs between hyperthreads should not be the problem
  either, since my machine has 2 physical CPUs and hyperthreading is
  disabled in the BIOS (i.e., CPU0 and CPU1 are distinct physical
  CPUs).

  It seems that there is a bug or race condition that shows up
  only with SMP domains. Any idea what is different in Xen
  (maybe interrupt handling) when you have SMP domains?
  
  Any chance you could try reproducing this behavior on one of
  your machines?
  Can you think of any situation that would cause the machine to
  reboot without printing any error message on the serial console?
  Any help is deeply appreciated, since I am losing hope that I will
  be able to nail this down by myself.
  It is always possible that I am doing something wrong,
  but at this point the remaining code is not doing much and I am
  starting to suspect the problem lies somewhere else in Xen.
  In that case I would desperately need someone else's help.
  
  Thanks

  Renato  

>> -----Original Message-----
>> From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx] 
>> Sent: Friday, September 09, 2005 1:57 AM
>> To: Santos, Jose Renato G
>> Cc: Turner, Yoshio; xen-devel@xxxxxxxxxxxxxxxxxxx; G John Janakiraman
>> Subject: Re: [Xen-devel] NMI with SMP domain causing machine to reboot
>> 
>> 
>> 
>> On 8 Sep 2005, at 20:33, Santos, Jose Renato G wrote:
>> 
>> >   I have spent most of the last weeks trying to nail down a nasty bug
>> >   that is preventing me from releasing xenoprof for SMP domains.
>> >   The bug is non-deterministic, and when it happens the machine just
>> >   reboots with no message or warning on the serial console.
>> >   This has made the debugging process painful and slow.
>> 
>> Hard to say from the code, but maybe it's something to do with
>> hyperthreading? The performance counter MSRs are shared in a weird way
>> between hyperthreads. Maybe you're not properly resetting CPU1's perf
>> counter and ending up with an NMI storm?
>> 
>>   -- Keir
>> 
>> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

