
Re: [Xen-devel] [PATCH V5] x86/kexec: Change NMI and MCE handling on kexec path



On 13/12/12 10:59, Tim Deegan wrote:
At 11:35 +0000 on 12 Dec (1355312134), Andrew Cooper wrote:
diff -r ef8c1b607b10 -r 96b068439bc4 xen/arch/x86/crash.c
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -32,41 +32,127 @@

  static atomic_t waiting_for_crash_ipi;
  static unsigned int crashing_cpu;
+static DEFINE_PER_CPU_READ_MOSTLY(bool_t, crash_save_done);

-static int crash_nmi_callback(struct cpu_user_regs *regs, int cpu)
+/* This becomes the NMI handler for non-crashing CPUs, when Xen is crashing. */
+void __attribute__((noreturn)) do_nmi_crash(struct cpu_user_regs *regs)
  {
-    /* Don't do anything if this handler is invoked on crashing cpu.
-     * Otherwise, system will completely hang. Crashing cpu can get
-     * an NMI if system was initially booted with nmi_watchdog parameter.
+    int cpu = smp_processor_id();
+
+    /* nmi_shootdown_cpus() should ensure that this assertion is correct. */
+    ASSERT( cpu != crashing_cpu );
nmi_shootdown_cpus() has a go, but I think it would have to do an atomic
compare-exchange on crashing_cpu to actually be sure that only one CPU
is crashing at a time.  If two CPUs try to lead crashes at the same
time, it will deadlock here, with NMIs disabled on all CPUs.

The old behaviour was also not entirely race-free, but a little more
likely to make progress, as in most cases at least one CPU would see
cpu == crashing_cpu here and return.

Using cmpxchg in nmi_shootdown_cpus() would have the drawback that if
one CPU tried to kexec and wedged, another CPU couldn't then have a go.
Not sure which of all these unlikely scenarios is worst. :)

kexec_common_shutdown() calls one_cpu_only(), which gives the impression that nmi_shootdown_cpus() can only be reached once. I think this is race-free for different pcpus attempting to crash at the same time, but it is certainly not safe against a cascade crash on the same pcpu.

As far as I can reason, this race can be closed by atomically setting the KEXEC_IN_PROGRESS flag and the crashing cpu id in a single operation. However, there are other race conditions along the crash path which are addressed by subsequent patches still in progress. As the choice here is between two differently nasty race conditions, I would prefer to defer this and fix it properly later.
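
Roughly what I have in mind, sketched with C11 atomics rather than Xen's own primitives (the names crash_owner and try_lead_crash are illustrative only, not anything in-tree): record the crashing cpu id in the same atomic operation that claims the in-progress flag, so the ASSERT() in the NMI handler cannot race against a second CPU starting a crash.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define CPU_NONE (-1)

    /* CPU_NONE means "no crash in progress"; any other value is the
     * id of the CPU leading the crash. */
    static atomic_int crash_owner = ATOMIC_VAR_INIT(CPU_NONE);

    /* Returns true iff this CPU won the right to lead the crash. */
    static bool try_lead_crash(int cpu)
    {
        int expected = CPU_NONE;

        /* One compare-exchange both sets "in progress" and records
         * which CPU is leading the crash. */
        return atomic_compare_exchange_strong(&crash_owner, &expected, cpu);
    }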


diff -r ef8c1b607b10 -r 96b068439bc4 xen/arch/x86/x86_64/entry.S
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -635,11 +635,45 @@ ENTRY(nmi)
          movl  $TRAP_nmi,4(%rsp)
          jmp   handle_ist_exception

+ENTRY(nmi_crash)
+        cli
Aren't interrupts disabled already because we came in through an
interrupt gate?

I was taking the pragmatic approach that in this case we really have no guarantees about Xen's state. Having said that, we will (hopefully) have entered here through an interrupt gate, so interrupts will already be disabled, and the function itself is deliberately safe against interruption. I shall remove it.
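
(For reference, the distinction being relied on, expressed as plain constants rather than Xen definitions: an x86-64 interrupt gate clears EFLAGS.IF on delivery, whereas a trap gate does not, so a handler entered through an interrupt gate starts with interrupts already disabled.)

    /* Descriptor type values per the Intel SDM; illustrative only,
     * not Xen's definitions. */
    #define GATE_TYPE_INTR 0xE   /* interrupt gate: IF cleared on entry */
    #define GATE_TYPE_TRAP 0xF   /* trap gate: IF left as-is */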


+        pushq $0
+        movl $TRAP_nmi,4(%rsp)
+        SAVE_ALL
+        movq %rsp,%rdi
Why?  do_nmi_crash() doesn't actually use any of its arguments.
AFAICT, we could pretty much use do_nmi_crash explicitly in the IDT
entry.

I was following the DECLARE_TRAP_HANDLER() way of doing things, for consistency with the other handlers. I can change it if you wish, but as this is not a hot path, I would err on the lazy side and leave it as it is.
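
For anyone following along, the convention in question pairs an assembly stub "name" (which builds the cpu_user_regs frame) with a C handler "do_name(regs)"; the declaration macro is roughly as below (paraphrased from asm-x86/processor.h, so the exact in-tree form may differ):

    /* Paraphrased, not a verbatim copy of the in-tree macro. */
    #define DECLARE_TRAP_HANDLER(name)                  \
        void name(void);                                \
        void do_ ## name(struct cpu_user_regs *regs)

    DECLARE_TRAP_HANDLER(nmi_crash);  /* nmi_crash (asm) -> do_nmi_crash (C) */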


+        callq do_nmi_crash /* Does not return */
+        ud2
+
  ENTRY(machine_check)
          pushq $0
          movl  $TRAP_machine_check,4(%rsp)
          jmp   handle_ist_exception

+/* Enable NMIs.  No special register assumptions.  Only %rax is not preserved. */
+ENTRY(enable_nmis)
+        movq  %rsp, %rax /* Grab RSP before pushing */
+
+        /* Set up stack frame */
+        pushq $0               /* SS */
+        pushq %rax             /* RSP */
+        pushfq                 /* RFLAGS */
+        pushq $__HYPERVISOR_CS /* CS */
+        leaq  1f(%rip),%rax
+        pushq %rax             /* RIP */
+
+        iretq /* Disable the hardware NMI latch */
+1:
+        retq
This could be made a bit smaller by something like:

        popq  %rcx         /* Remember return address */
        movq  %rsp, %rax   /* Grab RSP before pushing */
        
        /* Set up stack frame */
        pushq $0               /* SS */
        pushq %rax             /* RSP */
        pushfq                 /* RFLAGS */
        pushq $__HYPERVISOR_CS /* CS */
        pushq %rcx             /* RIP */
        
        iretq /* Disable the hardware NMI latch and return */

which also keeps the call/return counts balanced.  But tbh I'm not sure
it's worth it.  And I suspect you'll tell me why it's wrong. :)

I can't see any problem with it.
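
For completeness, from the C side the routine is just an extern taking no arguments; an illustrative (not in-tree) caller on the crash path would look like:

    /* Prototype for the assembly routine above. */
    void enable_nmis(void);

    /* Hypothetical caller, for illustration only: once the crash-time
     * IDT entries are in place, the iretq inside enable_nmis() clears
     * the CPU's internal NMI-blocked latch so that further NMIs can be
     * delivered. */
    static void crash_unblock_nmis(void)
    {
        enable_nmis();
    }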


+
+/* No op trap handler.  Required for kexec crash path.  This is not
+ * declared with the ENTRY() macro to avoid wasted alignment space.
+ */
+.globl trap_nop
+trap_nop:
+        iretq
The commit message says this reuses the iretq in enable_nmis().

Cheers,

Tim.

Ok - I will fix the stale commit message.

~Andrew
