
Re: S3 resume crash in memguard_guard_stack (stable-4.14)



On 03.10.2020 15:57, Marek Marczykowski-Górecki wrote:
> With this, I get a crash on S3 resume:
> 
> (XEN) Preparing system for ACPI S3 state.
> (XEN) Disabling non-boot CPUs ...
> (XEN) Entering ACPI S3 state.
> (XEN) [VT-D]Passed iommu=no-igfx option.  Disabling IGD VT-d engine.
> (XEN) mce_intel.c:773: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
> (XEN) CPU0 CMCI LVT vector (0xf1) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs  ...
> (XEN) ----[ Xen-4.14.1-pre  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d040311090>] memguard_guard_stack+0x7/0x1a5
> (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
> (XEN) rax: ffff830250ca03f8   rbx: 0000000000000001   rcx: ffff830250cb10b0
> (XEN) rdx: 0000003210739000   rsi: 0000000000000001   rdi: ffff830250ca0000
> (XEN) rbp: ffff830049a6fd70   rsp: ffff830049a6fd40   r8:  0000000000000001
> (XEN) r9:  0000000000000000   r10: 0000000000000001   r11: 0000000000000002
> (XEN) r12: 0000000000010000   r13: 0000000000000000   r14: 0000000000000001
> (XEN) r15: ffff82d040598440   cr0: 000000008005003b   cr4: 00000000003526e0
> (XEN) cr3: 0000000049a5d000   cr2: ffff830250ca03f8
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d040311090> (memguard_guard_stack+0x7/0x1a5):
> (XEN)  c3 48 8d 87 f8 03 00 00 <48> 89 87 f8 03 00 00 48 8d 87 f8 07 00 00 48 89
> (XEN) Xen stack trace from rsp=ffff830049a6fd40:
> (XEN)    ffff82d040321c2e ffff82d040461b68 ffff82d040461b60 ffff82d040461240
> (XEN)    0000000000000001 0000000000000000 ffff830049a6fdb8 ffff82d040221f9c
> (XEN)    ffff830049a6fde0 0000000000000001 0000000000000000 00000000ffffffef
> (XEN)    ffff830049a6fe08 0000000000000001 ffff830250b66000 ffff830049a6fdd0
> (XEN)    ffff82d0402036cf 0000000000000001 ffff830049a6fdf8 ffff82d040203a4d
> (XEN)    0000000000000000 0000000000000001 0000000000000010 ffff830049a6fe28
> (XEN)    ffff82d040203d00 ffff830049a6fef8 0000000000000000 0000000000000003
> (XEN)    0000000000000200 ffff830049a6fe58 ffff82d040270c9a ffff830250139f70
> (XEN)    ffff830250b45000 0000000000000000 0000000000000000 ffff830049a6fe78
> (XEN)    ffff82d040207064 ffff830250b451b8 ffff82d0405781b0 ffff830049a6fe90
> (XEN)    ffff82d04022b7bb ffff82d0405781a0 ffff830049a6fec0 ffff82d04022ba9c
> (XEN)    0000000000000000 ffff82d0405781b0 ffff82d04057ed00 ffff82d040598440
> (XEN)    ffff830049a6fef0 ffff82d0402f33e3 ffff830252b0e000 ffff830250b45000
> (XEN)    ffff830252b0f000 0000000000000000 ffff830049a6fdc8 ffff88818ce029e0
> (XEN)    ffffc900026b7f08 0000000000000003 0000000000000000 0000000000003403
> (XEN)    ffffffff8277a5a8 0000000000000246 0000000000000003 0000000000003403
> (XEN)    0000000000003403 0000000000000000 ffffffff810020ea 0000000000003403
> (XEN)    0000000000000010 deadbeefdeadf00d 0000010000000000 ffffffff810020ea
> (XEN)    000000000000e033 0000000000000246 ffffc900026b7cb8 000000000000e02b
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d040311090>] R memguard_guard_stack+0x7/0x1a5
> (XEN)    [<ffff82d040321c2e>] S smpboot.c#cpu_smpboot_callback+0xe5/0x6d5
> (XEN)    [<ffff82d040221f9c>] F notifier_call_chain+0x6b/0x96
> (XEN)    [<ffff82d0402036cf>] F cpu.c#cpu_notifier_call_chain+0x1b/0x33
> (XEN)    [<ffff82d040203a4d>] F cpu_up+0x5f/0xd5
> (XEN)    [<ffff82d040203d00>] F enable_nonboot_cpus+0xea/0x1fb
> (XEN)    [<ffff82d040270c9a>] F power.c#enter_state_helper+0x152/0x606
> (XEN)    [<ffff82d040207064>] F domain.c#continue_hypercall_tasklet_handler+0x4c/0xb9
> (XEN)    [<ffff82d04022b7bb>] F tasklet.c#do_tasklet_work+0x76/0xa9
> (XEN)    [<ffff82d04022ba9c>] F do_tasklet+0x58/0x8a
> (XEN)    [<ffff82d0402f33e3>] F domain.c#idle_loop+0x40/0x96
> (XEN) 
> (XEN) Pagetable walk from ffff830250ca03f8:
> (XEN)  L4[0x106] = 8000000049a5b063 ffffffffffffffff
> (XEN)  L3[0x009] = 0000000250cae063 ffffffffffffffff
> (XEN)  L2[0x086] = 0000000250cad063 ffffffffffffffff
> (XEN)  L1[0x0a0] = 8000000250ca0161 ffffffffffffffff

Now this one's pretty obvious: The call to memguard_unguard_stack() when
bringing down the APs is conditional (in cpu_smpboot_free()), and hence
memguard_guard_stack() may not (at present) assume the stack is writable
by ordinary writes, i.e. by write_sss_token(). I guess we may want
something like

    if ( stack_base[cpu] == NULL )
    {
        stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
        if ( stack_base[cpu] == NULL )
            goto out;
    }
    else if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
        memguard_unguard_stack(stack_base[cpu]);

in cpu_smpboot_alloc(). But of course the question is whether the
conditions in both places shouldn't rather become cpu_has_xen_shstk, since
right now the breakage (afaict) needlessly extends to systems that aren't
CET-capable.
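
To make the failure mode concrete, here is a toy model in plain C. It is
not Xen code: the model_* names, the keep_stack flag (standing in for
whatever condition cpu_smpboot_free() actually uses) and the
unguard_on_reuse flag (standing in for the else-branch suggested above) are
all made up for illustration, and the CONFIG_XEN_SHSTK / cpu_has_xen_shstk
question is ignored entirely. A "false" return marks the spot where the
real code would take the page fault seen in the log.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for one CPU's stack. It only tracks whether the
 * stack exists and whether its shadow stack pages are write-protected. */
struct model_stack {
    bool allocated;
    bool guarded;   /* true: an ordinary write would fault */
};

static struct model_stack model_stacks[MODEL_NR_CPUS];

/* Models memguard_guard_stack(): the token is written with an ordinary
 * write first, so this only works on a currently writable stack. */
static bool model_guard(struct model_stack *s)
{
    if ( s->guarded )
        return false;       /* the real code would fault here */
    s->guarded = true;
    return true;
}

/* Models memguard_unguard_stack(): make the pages writable again. */
static void model_unguard(struct model_stack *s)
{
    s->guarded = false;
}

/* Models AP teardown: the unguard only happens when the stack is freed.
 * With keep_stack set (as assumed for S3), the stack stays guarded. */
static void model_cpu_down(unsigned int cpu, bool keep_stack)
{
    assert(cpu < MODEL_NR_CPUS);
    if ( !keep_stack )
    {
        model_unguard(&model_stacks[cpu]);
        model_stacks[cpu].allocated = false;
    }
}

/* Models AP bring-up. unguard_on_reuse selects the suggested change:
 * unguard a stack that survived the previous teardown before re-guarding. */
static bool model_cpu_up(unsigned int cpu, bool unguard_on_reuse)
{
    struct model_stack *s;

    assert(cpu < MODEL_NR_CPUS);
    s = &model_stacks[cpu];

    if ( !s->allocated )
    {
        s->allocated = true;    /* fresh allocation is writable */
        s->guarded = false;
    }
    else if ( unguard_on_reuse )
        model_unguard(s);

    return model_guard(s);
}
```

In this model, bringing a CPU up, taking it down with keep_stack set, and
bringing it up again without unguard_on_reuse fails in model_guard(), which
corresponds to the trace above. With unguard_on_reuse set the second
bring-up succeeds.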

Jan
