
Re: [Xen-devel] [RFC PATCH 0/3] Live update boot memory management



On Tue, 2020-01-14 at 15:00 +0000, Julien Grall wrote:
> 
> On 14/01/2020 14:48, David Woodhouse wrote:
> > On Tue, 2020-01-14 at 14:15 +0000, Julien Grall wrote:
> > > Hi David,
> > > 
> > > On 13/01/2020 11:54, David Woodhouse wrote:
> > > > On Wed, 2020-01-08 at 17:24 +0000, David Woodhouse wrote:
> > > > > So we've settled on a simpler approach — reserve a contiguous region
> > > > > of physical memory which *won't* be used for domain pages. Let the 
> > > > > boot
> > > > > allocator see *only* that region of memory, and plug the rest of the
> > > > > memory in later only after doing a full pass of the live update state.
> > > 
> > > It is a bit unclear what the region will be used for. If you plan to put
> > > the state of the VMs in it, then you can't possibly use it for boot
> > > allocation (e.g. the frametable), otherwise it may be overwritten when
> > > doing the live update.
> > 
> > Right. This is only for boot time allocations by Xen#2, before it's
> > processed the LU data and knows which parts of the rest of memory it
> > can use. It allocates its frame table from there, as well as anything
> > else it needs to allocate before/while processing the LU data.
> 
> It would be worth documenting what the expectations for the buffer are.
> Maybe in xen-command-line, along with the rest of the new options you
> introduced? Or in a separate document.

We kind of need to implement that part first, and then we can document
what it finally looks like :)
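
For the record, the direction I have in mind for Xen#2 looks roughly
like this. Purely an illustrative sketch: nothing is implemented yet,
and lu_bootmem_start/lu_bootmem_size, lu_process_state() and the
free_range[] array are made-up names; init_boot_pages(),
end_boot_allocator() and init_domheap_pages() are the existing
allocator entry points.

    /* Early boot: the boot allocator sees *only* the reserved LU
     * region, so the frame table etc. are carved out of it. */
    init_boot_pages(lu_bootmem_start, lu_bootmem_start + lu_bootmem_size);

    /* Walk the LU state and record every page already in use: domain-
     * owned pages and the pages holding the LU data itself. */
    lu_process_state();

    /* Switch from the boot allocator to the real heap allocator... */
    end_boot_allocator();

    /* ...and only now hand the remaining free ranges (computed from
     * the LU state above) to the heap. */
    for ( i = 0; i < nr_free_ranges; i++ )
        init_domheap_pages(free_range[i].start, free_range[i].end);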

> > As an implementation detail, I anticipate that we'll be using the boot
> > allocator for that early part from the reserved region, and that the
> > switch to using the full available memory (less those pages already in-
> > use) will *coincide* with switching to the real heap allocator.
> > 
> > The reserved region *isn't* for the LU data itself. That can be
> > allocated from arbitrary pages *outside* the reserved area, in Xen#1.
> > Xen#2 can vmap those pages, and needs to avoid stomping on them just
> > like it needs to avoid stomping on actual domain-owned pages.
> > 
> > The plan is that Xen#1 allocates arbitrary pages to store the actual LU
> > data, then another page (or a higher-order allocation if we need >2MiB
> > of actual LU data) containing the MFNs of all those data pages. Then we
> > need to somehow pass the address of that MFN-list to Xen#2.
> > 
> > My current plan is to put *that* in the first 64 bits of the reserved
> > LU bootmem region, and load it from there early in the Xen#2 boot
> > process. I'm looking at adding an IND_WRITE64 primitive to the kimage
> > processing, to allow it to be trivially appended for kexec_reloc() to
> > obey.
> 
> Wouldn't it be better to reserve the first 4K page of the LU bootmem region?
>
> Otherwise, you may run into the same trouble as described above (to a
> lesser extent) if the 64-bit value overwrites anything useful for the
> current Xen. But I guess you could delay the write until just before you
> jump to Xen#2.

That's the point of appending an IND_WRITE64 operation to the kimage
stream. The actual write is done in the last gasp of kexec_reloc(),
after Xen#1 is quiescent, on the way into purgatory.

So when Xen#1 has created the LU data stream (for which the pointer to
the root of that data structure is page-aligned), it just calls
  kimage_add_entry(image, IND_WRITE64 | lu_data_address);
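
Spelled out a bit more (illustrative only: IND_WRITE64 just takes the
next free flag bit after IND_ZERO, and the IND_DESTINATION entry is my
assumption about how the destination pointer gets aimed at the
reserved region first):

  #define IND_WRITE64  0x20   /* proposed: store the 64-bit payload at
                                 the current destination pointer */

  /* Aim the relocation destination at the base of the reserved LU
   * bootmem region... */
  kimage_add_entry(image, IND_DESTINATION | lu_bootmem_start);

  /* ...then append the page-aligned pointer to the root of the LU
   * data; kexec_reloc() masks off the flag bits and stores the rest. */
  kimage_add_entry(image, IND_WRITE64 | lu_data_address);

The corresponding handling in kexec_reloc() looks like this: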

--- a/xen/arch/x86/x86_64/kexec_reloc.S
+++ b/xen/arch/x86/x86_64/kexec_reloc.S
@@ -131,11 +131,18 @@ is_source:
         jmp     next_entry
 is_zero:
         testb   $IND_ZERO, %cl
-        jz      next_entry
+        jz      is_write64
         movl    $(PAGE_SIZE / 8), %ecx  /* Zero the destination page. */
         xorl    %eax, %eax
         rep stosq
         jmp     next_entry
+is_write64:
+        testb   $IND_WRITE64, %cl
+        jz      next_entry
+        andq    $PAGE_MASK, %rcx        /* Strip the IND_* flag bits. */
+        movq    %rcx, %rax              /* stosq writes %rax to the    */
+        stosq                           /* current destination (%rdi). */
+        jmp     next_entry
 done:
         popq    %rbx
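
On the other side, recovering the pointer early in Xen#2's boot is
then just a read of the first 64 bits of the reserved region, and the
MFN list can be mapped once vmap is up. Again a hedged sketch,
assuming lu_bootmem_start has already been parsed from the command
line and the region is reachable through the directmap at that point:

  /* The first 8 bytes of the reserved region hold the address of the
   * page listing the MFNs of the LU data pages, as stored by the
   * IND_WRITE64 entry above. */
  uint64_t lu_data_base = *(uint64_t *)maddr_to_virt(lu_bootmem_start);
  mfn_t mfn = maddr_to_mfn(lu_data_base);

  /* Map the MFN-list page; the data pages it enumerates can be
   * vmap()'d the same way. */
  uint64_t *mfn_list = vmap(&mfn, 1);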
