Re: [Xen-devel] [Xen on ARM] Possible unhandled SGI bug.
On Sun, 2013-04-28 at 20:02 +0100, Sander Bogaert wrote:
> Hi,
>
> all previous information can be found in this thread:
> http://lists.xen.org/archives/html/xen-devel/2013-04/msg02772.html
>
> I've been trying to reproduce this behaviour for the last 2 days,
> crashme has been running on the Arndale board for a total of at least
> 20 hours. I restarted the process once in a while with the seed I saw
> crashing Xen ( 'crashme +2000.4 666 50 2:00:00 2' ).
>
> The version of crashme is 2.4, the one from the Debian Wheezy
> repository. The last seed logged ( needs a SD card write so I don't
> know when the last sync was before the crash ) was 43166
>
> I have not been able to reproduce the crash. However I'm quite sure I
> wasn't imagining things, I really did see Xen crash with the "SGI 2
> Unhandled" error when I was running crashme from dom0 userspace.

It could be that running crashme was just incidental, and the crash happened independently. There really ought to be no way for a guest to directly generate a host level SGI, and certainly no way for it to generate one with a number of its choosing.

> This seems like a big deal and not being able to reproduce it is kind
> of frustrating. So I was wondering if there were any ideas on how this
> could have happened? When it did happen I just rebooted the board so
> it was in a 'clean' state.
>
> Maybe some speculations on a cause could help me reproduce it? A small
> explanation on when exactly it should issue sgi's? I would really
> really like to get to the bottom of this :-)

The xen.git hypervisor uses two SGIs, GIC_SGI_EVENT_CHECK (==0) and GIC_SGI_DUMP_STATE (==1). Both are issued only via calls to one of send_SGI_{mask,self,allbutself} (or their various wrappers). In practice this means smp_send_event_check_mask() or smp_send_state_dump(). You can verify this by following the call chains that lead to the small number of writes to GICD[GICD_SGIR].

Julien added a new SGI in his Arndale tree to call a function on another CPU (not sure what he called it without looking it up, it's #2 though). This would be exercised via smp_call_function() and friends.

About my only theory for how you could have seen a spurious host level SGI==2 is a partial rebuild error -- i.e. make b0rked the build and you got the new version of smp_call_function et al but not the new version of do_sgi(). Unless of course Julien's tree temporarily had code with that behaviour (i.e. added the smp_call stuff before the handler)?

TBH, there probably isn't going to be much we can do about this until we get a repro, so I'd be tempted to ignore it, move on and hope we never see it again.

About the only useful things we could do in case it does happen again would be to print othercpu in the panic from do_sgi and to add asserts to send_SGI_* to check that it is sending an SGI which we have defined (not just one which the hardware defines, as it asserts now). Could you whip up a patch to do those?

Ian.
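
As a rough illustration of the two debugging aids suggested above, here is a minimal standalone sketch. It is not the actual Xen code: only GIC_SGI_EVENT_CHECK and GIC_SGI_DUMP_STATE come from the mail, while GIC_SGI_MAX, sgi_is_defined(), send_sgi_checked(), format_unhandled_sgi() and the message wording are hypothetical names and choices made up for the example. The real send_SGI_* functions write GICD_SGIR, which is only recorded in a variable here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* The two SGI numbers come from the mail; GIC_SGI_MAX is an assumption. */
enum gic_sgi {
    GIC_SGI_EVENT_CHECK = 0,
    GIC_SGI_DUMP_STATE  = 1,
    GIC_SGI_MAX         /* hypothetical: one past the last SGI we define */
};

/*
 * True only for SGIs that software has defined. The hardware allows
 * SGI numbers 0..15, which is a weaker check than this one.
 */
static bool sgi_is_defined(unsigned int sgi)
{
    return sgi < GIC_SGI_MAX;
}

/* Records the last SGI "sent"; stands in for the GICD_SGIR write. */
static unsigned int last_sgi_sent = ~0u;

/*
 * Hypothetical stand-in for send_SGI_mask(): refuse to send anything
 * that has no handler, so a bad sender trips here rather than in the
 * receiving CPU's handler.
 */
static void send_sgi_checked(unsigned int cpu_mask, unsigned int sgi)
{
    assert(sgi_is_defined(sgi));
    (void)cpu_mask;          /* a real implementation encodes the target list */
    last_sgi_sent = sgi;
}

/*
 * Build the diagnostic for an unexpected SGI, including the CPU that
 * sent it (othercpu), which is what the panic would additionally print.
 * Returns the number of characters snprintf() would write.
 */
static int format_unhandled_sgi(char *buf, size_t len,
                                unsigned int sgi, unsigned int othercpu)
{
    return snprintf(buf, len, "SGI %u from CPU%u unhandled", sgi, othercpu);
}
```

With a check like this on the sending side, an attempt to send SGI 2 without a matching handler would be caught at the sender, and the extra CPU number in the message would show where a stray SGI came from.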