
Re: [Xen-devel] [Xen on ARM] Possible unhandled SGI bug.



On 29-04-13 14:27, Julien Grall wrote:
> On 04/29/2013 10:39 AM, Ian Campbell wrote:
> 
>> On Sun, 2013-04-28 at 20:02 +0100, Sander Bogaert wrote:
>>> Hi,
>>> 
>>> all previous information can be found in this thread: 
>>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg02772.html
>>>
>>>
>>> 
>>> I've been trying to reproduce this behaviour for the last 2 days,
>>> crashme has been running on the Arndale board for a total of at
>>> least 20 hours. I restarted the process once in a while with
>>> the seed I saw crashing Xen ( 'crashme +2000.4 666 50 2:00:00
>>> 2' ).
>>> 
>>> The version of crashme is 2.4, the one from the Debian Wheezy 
>>> repository. The last seed logged ( needs a SD card write so I
>>> don't know when the last sync was before the crash ) was 43166
>>> 
>>> I have not been able to reproduce the crash. However I'm quite
>>> sure I wasn't imagining things, I really did see Xen crash with
>>> the "SGI 2 Unhandled" error when I was running crashme from
>>> dom0 userspace.
>> 
>> It could be that running crashme was just incidental, and the
>> crash just happened independently. There really ought to be no
>> way for a guest to directly generate a host level SGI and
>> certainly no way for it to generate one with a number of its
>> choosing.
>> 
>>> This seems like a big deal and not being able to reproduce it
>>> is kind of frustrating. So I was wondering if there were any
>>> ideas on how this could have happened? When it did happen I
>>> just rebooted the board so it was in a 'clean' state.
>>> 
>>> Maybe some speculations on a cause could help me reproduce it?
>>> A small explanation of when exactly it should issue SGIs? I
>>> would really really like to get to the bottom of this :-)
>> 
>> The xen.git hypervisor uses two SGIs, GIC_SGI_EVENT_CHECK (==0)
>> and GIC_SGI_DUMP_STATE (==1). Both are issued only via calls to
>> one of send_SGI_{mask,self,allbutself} (or their various
>> wrappers). In practice this means smp_send_event_check_mask() or
>> smp_send_state_dump(). You can verify this by looking at the
>> callchains that lead to one of the small number of writes to
>> GICD[GICD_SGIR].
>> 
>> Julien added a new SGI in his Arndale tree to call a function on
>> another CPU (not sure what he called it without looking it up,
>> it's #2 though), this would be exercised via smp_call_function()
>> and friends.
>> 
>> My only theory for how you could have seen a spurious host
>> level SGI==2 is a partial rebuild error -- i.e. make b0rked the
>> build and you got the new version of smp_call_function et al but
>> not the new version of do_sgi(). Unless of course Julien's tree
>> temporarily had code with that behaviour (i.e. added the smp_call
>> stuff before the handler)?
> 
> All this functionality is implemented in a single commit and I
> don't see this commit in your tree (commit
> 5ce4118f5768c6137d58888d57972bdfdf4c9aba).
> 
> GIC_SGI_CALL_FUNCTION is called by on_selected_cpus which is used
> for:
>  - halt a physical cpu
>  - gdb
>  - read clocks keyhandler
> 

I understand I'm using an older version. The reason I'm still using it
is that I hope to reproduce this. I really don't think I 'b0rked'
my build; it's a clean pull & build. So if SGI 2 was sent, it wasn't
because of this functionality. I will rerun the test from time to time;
maybe it pops up again.
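
To make the scenario above concrete, here is a minimal sketch of an SGI
dispatcher. It is not the Xen code: dispatch_sgi() and its
has_call_function flag are hypothetical names used only for illustration,
and the printed message text is an assumption. Only the three SGI names
and numbers come from this thread. It shows why a tree without the
call-function commit would report SGI 2 as unhandled if one ever arrived.

```c
#include <stdio.h>

/* SGI numbers named in this thread: 0 and 1 are in xen.git,
 * 2 is the one added in Julien's Arndale tree. */
enum gic_sgi {
    GIC_SGI_EVENT_CHECK   = 0,
    GIC_SGI_DUMP_STATE    = 1,
    GIC_SGI_CALL_FUNCTION = 2,
};

/* Hypothetical stand-in for a do_sgi()-style dispatcher (an assumption,
 * not copied from Xen). has_call_function models whether the tree
 * carries the call-function commit.
 * Returns 1 if the SGI is handled, 0 if it is reported as unhandled. */
static int dispatch_sgi(unsigned int sgi, int has_call_function)
{
    switch (sgi) {
    case GIC_SGI_EVENT_CHECK:
        return 1;               /* event check: nothing more to do here */
    case GIC_SGI_DUMP_STATE:
        return 1;               /* a real handler would dump CPU state */
    case GIC_SGI_CALL_FUNCTION:
        if (has_call_function)
            return 1;           /* newer tree: run the queued function */
        /* older tree: no case for SGI 2, so fall through */
    default:
        printf("SGI %u unhandled\n", sgi);
        return 0;
    }
}
```

With has_call_function set to 0, which models the older tree used here,
dispatch_sgi(2, 0) takes the unhandled path. That still leaves open the
question of what sent SGI 2 in the first place.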

Sander

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

