
Re: [Xen-devel] [PATCH v2] mm/page_alloc: make bootscrub happen in idle-loop



On Thu, Nov 08, 2018 at 02:48:40PM +0000, Sergey Dyasli wrote:
> (CCing Roger)
> 
> On 08/11/2018 11:07, Andrew Cooper wrote:
> > On 08/11/18 10:31, Jan Beulich wrote:
> >>>>> On 07.11.18 at 19:20, <andrew.cooper3@xxxxxxxxxx> wrote:
> >>> On 09/10/18 16:21, Sergey Dyasli wrote:
> >>>> Scrubbing RAM during boot may take a long time on machines with lots
> >>>> of RAM. Add 'idle' option to bootscrub which marks all pages dirty
> >>>> initially so they will eventually be scrubbed in idle-loop on every
> >>>> online CPU.
> >>>>
> >>>> It's guaranteed that the allocator will return scrubbed pages by doing
> >>>> eager scrubbing during allocation (unless MEMF_no_scrub was provided).
> >>>>
> >>>> Use the new 'idle' option as the default one.
> >>>>
> >>>> Signed-off-by: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>
> >>> This patch reliably breaks boot, although it's not immediately obvious how:
> >>>
> >>> (d9) (XEN) mcheck_poll: Machine check polling timer started.
> >>> (d9) (XEN) xenoprof: Initialization failed. Intel processor family 6 model 60 is not supported
> >>> (d9) (XEN) Dom0 has maximum 400 PIRQs
> >>> (d9) (XEN) ----[ Xen-4.12-unstable  x86_64  debug=y   Not tainted ]----
> >>> (d9) (XEN) CPU:    0
> >>> (d9) (XEN) RIP:    e008:[<ffff82d080440ddb>] setup.c#cmdline_cook+0x1d/0x77
> >>> (d9) (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor
> >>> (d9) (XEN) rax: ffff82d080406bdc   rbx: ffff8300c2c2c2c2   rcx: 0000000000000000
> >>> (d9) (XEN) rdx: 00000007c7ffffff   rsi: ffff83000045c24b   rdi: ffff83000045c24b
> >>> (d9) (XEN) rbp: ffff82d0804b7da8   rsp: ffff82d0804b7d98   r8:  ffff83003f057000
> >>> (d9) (XEN) r9:  7fffffffffffffff   r10: 0000000000000000   r11: 0000000000000001
> >>> (d9) (XEN) r12: ffff83003f0d8100   r13: 0000000000000000   r14: ffff82d0805f33d0
> >>> (d9) (XEN) r15: 0000000000000002   cr0: 000000008005003b   cr4: 00000000001526e0
> >>> (d9) (XEN) cr3: 000000003fea7000   cr2: ffff8300c2c2c2c2
> >>> (d9) (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> >>> (d9) (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> >>> (d9) (XEN) Xen code around <ffff82d080440ddb> (setup.c#cmdline_cook+0x1d/0x77):
> >>> (d9) (XEN)  05 5e fc ff 48 0f 44 d8 <80> 3b 20 75 09 48 83 c3 01 80 3b 20 74 f7 80 3d
> >>> (d9) (XEN) Xen stack trace from rsp=ffff82d0804b7d98:
> >>> [...]
> >>> (d9) (XEN) Xen call trace:
> >>> (d9) (XEN)    [<ffff82d080440ddb>] setup.c#cmdline_cook+0x1d/0x77
> >>> (d9) (XEN)    [<ffff82d080443b7f>] __start_xen+0x259c/0x292d
> >>> (d9) (XEN)    [<ffff82d0802000f3>] __high_start+0x53/0x55
> >> That's apparently the 2nd cmdline_cook() invocation, when producing
> >> the Dom0 command line. I would suppose what "loader" points to has
> >> been scrubbed by the time we get there (with synchronous scrubbing
> >> APs wouldn't be able to get going with this before reaching
> >> heap_init_late()).
> > 
> > This is via a PVH boot (like a lot of my development work), and does
> > look to be a latent use-after-free.  Dropping the VM down to a single
> > vcpu causes the problem to go away.
> > 
> > Sergey is kindly investigating.
> 
> Yes, this seems to be a bug in Xen PVH boot path. From the serial:
> 
> (XEN) == mbi->mods_addr 0x46dce0
> 
> which is marked as usable in e820:
> 
> (XEN) PVH-e820 RAM map:
> (XEN)  0000000000000000 - 00000000000a0000 (usable)
> (XEN)  0000000000100000 - 0000000040000400 (usable)
> (XEN)  00000000fc000000 - 00000000fc009040 (ACPI data)
> (XEN)  00000000feff8000 - 00000000feffc000 (reserved)
> (XEN)  00000000feffc000 - 00000000feffd000 (usable)
> (XEN)  00000000feffd000 - 00000000ff000000 (reserved)
> 
> This memory is then given to the allocator and scrubbed by secondary
> CPUs, which leads to a use-after-free. Even with the cmdline issue fixed,
> another FATAL PAGE FAULT occurs further down the boot path:

Right, shouldn't the scrub be started only after Dom0 has been constructed?
I would say the scrubbing should start at the same point as before,
which IIRC is just before jumping to the Dom0 entry point?
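
To make the overlap Sergey describes concrete, here is a minimal standalone
sketch that checks whether the reported mbi->mods_addr falls inside a range
the PVH e820 map marks as usable. This is not Xen code: struct region,
pvh_e820 and addr_in_usable_ram() are hypothetical names made up for the
illustration, and the table is just the ranges from the serial log above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified e820 entry: half-open range [start, end). */
struct region {
    uint64_t start;
    uint64_t end;
    bool usable;   /* true for "(usable)", false for ACPI data / reserved */
};

/* Ranges copied from the PVH-e820 RAM map in the quoted serial log. */
static const struct region pvh_e820[] = {
    { 0x0000000000000000ULL, 0x00000000000a0000ULL, true  },
    { 0x0000000000100000ULL, 0x0000000040000400ULL, true  },
    { 0x00000000fc000000ULL, 0x00000000fc009040ULL, false },
    { 0x00000000feff8000ULL, 0x00000000feffc000ULL, false },
    { 0x00000000feffc000ULL, 0x00000000feffd000ULL, true  },
    { 0x00000000feffd000ULL, 0x00000000ff000000ULL, false },
};

/* Return true if addr lies inside any range marked usable. */
static bool addr_in_usable_ram(uint64_t addr)
{
    for (size_t i = 0; i < sizeof(pvh_e820) / sizeof(pvh_e820[0]); i++) {
        if (pvh_e820[i].usable &&
            addr >= pvh_e820[i].start && addr < pvh_e820[i].end)
            return true;
    }
    return false;
}
```

With this table, addr_in_usable_ram(0x46dce0) is true, because the address
lies in the 0x100000 - 0x40000400 usable range. Nothing in the map marks the
module data as off-limits, so its pages can end up in the allocator and be
scrubbed while the boot path still reads them.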

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
