
Re: [Xen-devel] [Xen on ARM] Possible unhandled SGI bug.

On 04/29/2013 10:39 AM, Ian Campbell wrote:

> On Sun, 2013-04-28 at 20:02 +0100, Sander Bogaert wrote:
>> Hi,
>> all previous information can be found in this thread:
>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg02772.html
>> I've been trying to reproduce this behaviour for the last 2 days,
>> crashme has been running on the Arndale board for a total of at least
>> 20 hours. I restarted the process once in a while with the seed I saw
>> crashing Xen ( 'crashme +2000.4 666 50 2:00:00 2' ).
>> The version of crashme is 2.4, the one from the Debian Wheezy
>> repository. The last seed logged ( needs a SD card write so I don't
>> know when the last sync was before the crash ) was 43166
>> I have not been able to reproduce the crash. However I'm quite sure I
>> wasn't imagining things, I really did see Xen crash with the "SGI 2
>> Unhandled" error when I was running crashme from dom0 userspace.
> It could be that running crashme was just incidental, and the crash just
> happened independently. There really ought to be no way for a guest to
> directly generate a host level SGI and certainly no way for it to
> generate one with a number of its choosing.
>> This seems like a big deal and not being able to reproduce it is kind
>> of frustrating. So I was wondering if there were any ideas on how this
>> could have happened? When it did happen I just rebooted the board so
>> it was in a 'clean' state.
>> Maybe some speculations on a cause could help me reproduce it? A small
>> explanation of when exactly it should issue SGIs? I would really
>> really like to get to the bottom of this :-)
> The xen.git hypervisor uses two SGIs, GIC_SGI_EVENT_CHECK (==0) and
> GIC_SGI_DUMP_STATE (==1). Both are issued only via calls to one of
> send_SGI_{mask,self,allbutself} (or their various wrappers). In practice
> this means smp_send_event_check_mask() or smp_send_state_dump(). You can
> verify this by looking at the callchains leading to one of the small number of
> writes to GICD[GICD_SGIR].
> Julien added a new SGI in his Arndale tree to call a function on another
> CPU (not sure what he called it without looking it up, it's #2 though),
> this would be exercised via smp_call_function() and friends.
> My only theory for how you could have seen a spurious host level
> SGI==2 is a partial rebuild error -- i.e. make b0rked the build and you
> got the new version of smp_call_function et al but not the new version
> of do_sgi(). Unless of course Julien's tree temporarily had code with
> that behaviour (i.e. added the smp_call stuff before the handler)?
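
For reference, the GICD[GICD_SGIR] writes mentioned above can be sketched as follows. This is only an illustration of the GICv2 register layout (target list filter in bits [25:24], CPU target list in bits [23:16], SGI number in bits [3:0]); the macro names and the helper sgir_value() are hypothetical and are not taken from xen.git.

```c
#include <stdint.h>

/* GICv2 GICD_SGIR target list filter values (bits [25:24]).
 * The macro names are hypothetical, chosen for this sketch. */
#define SGIR_TARGET_LIST   0u  /* deliver to the CPUs named in the mask */
#define SGIR_TARGET_OTHERS 1u  /* deliver to all CPUs except the sender */
#define SGIR_TARGET_SELF   2u  /* deliver to the sending CPU only */

/* Hypothetical helper: compose the 32-bit value that would be written to
 * GICD_SGIR to raise SGI number `sgi` (0..15). */
uint32_t sgir_value(unsigned int filter, uint8_t cpu_mask, unsigned int sgi)
{
    return ((uint32_t)(filter & 0x3u) << 24)  /* TargetListFilter */
         | ((uint32_t)cpu_mask << 16)         /* CPUTargetList    */
         | (uint32_t)(sgi & 0xfu);            /* SGIINTID         */
}
```

The point of the sketch is that the SGI number is chosen by whoever writes the register, so an SGI 2 on the host has to come from a write made by the hypervisor itself.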

All of this functionality is implemented in a single commit, and I don't see that
commit in your tree (commit 5ce4118f5768c6137d58888d57972bdfdf4c9aba).

GIC_SGI_CALL_FUNCTION is sent by on_selected_cpus, which is used for:
   - halting a physical CPU
   - gdb
   - the read clocks keyhandler
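
To illustrate the partial-rebuild theory, here is a minimal sketch of an SGI dispatcher with and without a case for SGI 2. The SGI names and numbers are the ones used in this thread; the enum sgi_result, the two function names and the exact message text are assumptions made for the example, not the actual do_sgi() from either tree.

```c
#include <stdio.h>

/* SGI numbers as discussed in this thread. */
enum gic_sgi {
    GIC_SGI_EVENT_CHECK   = 0,
    GIC_SGI_DUMP_STATE    = 1,
    GIC_SGI_CALL_FUNCTION = 2,
};

/* Hypothetical result codes, for illustration only. */
enum sgi_result { SGI_HANDLED, SGI_UNHANDLED };

/* Dispatcher that only knows the two original SGIs. */
enum sgi_result do_sgi_without_call_function(unsigned int sgi)
{
    switch (sgi) {
    case GIC_SGI_EVENT_CHECK:
    case GIC_SGI_DUMP_STATE:
        return SGI_HANDLED;
    default:
        /* Message format modelled on the error quoted in the thread. */
        printf("SGI %u Unhandled\n", sgi);
        return SGI_UNHANDLED;
    }
}

/* Dispatcher that also has a case for the call-function SGI. */
enum sgi_result do_sgi_with_call_function(unsigned int sgi)
{
    switch (sgi) {
    case GIC_SGI_EVENT_CHECK:
    case GIC_SGI_DUMP_STATE:
    case GIC_SGI_CALL_FUNCTION:
        return SGI_HANDLED;
    default:
        printf("SGI %u Unhandled\n", sgi);
        return SGI_UNHANDLED;
    }
}
```

If a binary ended up with a sender for SGI 2 but the first dispatcher, any use of on_selected_cpus would land in the default branch, which matches the symptom reported.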

Julien Grall
