|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH V5] x86/kexec: Change NMI and MCE handling on kexec path
At 11:35 +0000 on 12 Dec (1355312134), Andrew Cooper wrote:
> diff -r ef8c1b607b10 -r 96b068439bc4 xen/arch/x86/crash.c
> --- a/xen/arch/x86/crash.c
> +++ b/xen/arch/x86/crash.c
> @@ -32,41 +32,127 @@
>
> static atomic_t waiting_for_crash_ipi;
> static unsigned int crashing_cpu;
> +static DEFINE_PER_CPU_READ_MOSTLY(bool_t, crash_save_done);
>
> -static int crash_nmi_callback(struct cpu_user_regs *regs, int cpu)
> +/* This becomes the NMI handler for non-crashing CPUs, when Xen is crashing.
> */
> +void __attribute__((noreturn)) do_nmi_crash(struct cpu_user_regs *regs)
> {
> - /* Don't do anything if this handler is invoked on crashing cpu.
> - * Otherwise, system will completely hang. Crashing cpu can get
> - * an NMI if system was initially booted with nmi_watchdog parameter.
> + int cpu = smp_processor_id();
> +
> + /* nmi_shootdown_cpus() should ensure that this assertion is correct. */
> + ASSERT( cpu != crashing_cpu );
nmi_shootdown_cpus() has a go, but I think it would have to do an atomic
compare-exchange on crashing_cpu to actually be sure that only one CPU
is crashing at a time. If two CPUs try to lead crashes at the same
time, it will deadlock here, with NMIs disabled on all CPUs.
The old behaviour was also not entirely race-free, but a little more
likey to make progress, as in most cases at least one CPU would see
cpu == crashing_cpu here and return.
Using cmpxchg in nmi_shootdown_cpus() would have the drawback that if
one CPU tried to kexec and wedged, another CPU couldn't then have a go.
Not sure which of all these unlikely scenarios is worst. :)
> diff -r ef8c1b607b10 -r 96b068439bc4 xen/arch/x86/x86_64/entry.S
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -635,11 +635,45 @@ ENTRY(nmi)
> movl $TRAP_nmi,4(%rsp)
> jmp handle_ist_exception
>
> +ENTRY(nmi_crash)
> + cli
Aren't interrupts disabled already because we came in through an
interrupt gate?
> + pushq $0
> + movl $TRAP_nmi,4(%rsp)
> + SAVE_ALL
> + movq %rsp,%rdi
Why? do_nmi_crash() doesn't actually use any of its arguments.
AFAICT, we could pretty much use do_nmi_crash explicitly in the IDT
entry.
> + callq do_nmi_crash /* Does not return */
> + ud2
> +
> ENTRY(machine_check)
> pushq $0
> movl $TRAP_machine_check,4(%rsp)
> jmp handle_ist_exception
>
> +/* Enable NMIs. No special register assumptions. Only %rax is not
> preserved. */
> +ENTRY(enable_nmis)
> + movq %rsp, %rax /* Grab RSP before pushing */
> +
> + /* Set up stack frame */
> + pushq $0 /* SS */
> + pushq %rax /* RSP */
> + pushfq /* RFLAGS */
> + pushq $__HYPERVISOR_CS /* CS */
> + leaq 1f(%rip),%rax
> + pushq %rax /* RIP */
> +
> + iretq /* Disable the hardware NMI latch */
> +1:
> + retq
This could be made a bit smaller by something like:
popq %rcx /* Remember return address */
movq %rsp, %rax /* Grab RSP before pushing */
/* Set up stack frame */
pushq $0 /* SS */
pushq %rax /* RSP */
pushfq /* RFLAGS */
pushq $__HYPERVISOR_CS /* CS */
pushq %rcx /* RIP */
iretq /* Disable the hardware NMI latch and return */
which also keeps the call/return counts balanced. But tbh I'm not sure
it's worth it. And I suspect you'll tell me why it's wrong. :)
> +
> +/* No op trap handler. Required for kexec crash path. This is not
> + * declared with the ENTRY() macro to avoid wasted alignment space.
> + */
> +.globl trap_nop
> +trap_nop:
> + iretq
The commit message says this reuses the iretq in enable_nmis().
Cheers,
Tim.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |