[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen/x86: domain: Free all the pages associated to struct domain



Hi David,

On 22/01/2020 13:50, Woodhouse, David wrote:
On Wed, 2020-01-22 at 13:15 +0000, Andrew Cooper wrote:
I'd much rather see the original patch reverted.  The current size of
struct domain with lockprofile enabled is 3200 bytes.

Let me have a look first to see when/why struct domain is less than 4K
with lockprofile.

In the intervening time, Juergen has totally rewritten lock profiling,
and the virtual timer infrastructure has been taken out-of-line.

Its probably the latter which is the dominating factor.

OK, so if we revert 8916fcf4577 is it reasonable for me to then assume
that 'struct domain' will always fit within a page, and declare that
live update shall not work to a Xen where that isn't true?
While it is nice to have struct domain small, I would rather not bake to size assumption in live update.

The more on Arm we have been quite often close to the limit. So I don't want to limit my choices.


I have a nasty trick in mind...

During live update, we need to do a pass over the live update data in
early boot in order to work out which pages to reserve. That part has
to be done early, while the boot allocator is active. It works by
setting the PGC_allocated bit in the page_info of the reserved pages,
so that init_heap_pages() knows not to include them. I've posted that
part already.

Also during live update, we need to consume the actual domain state
that was passed over from the previous Xen, and fill in the owner (and
refcount etc.) in the same page_info structures, before those pages are
in a truly consistent state.

Right now, we need the latter to happen *after* the boot allocator is
done and we're able to allocate from the heap... because we need to be
able to allocate the domain structures, and we don't want to ensure
that there's enough space in the LU reserved bootmem for that many
domain structures.

As you pointed out above, the struct domain itself is a page (i.e 4KB on x86). Let say you have 256 domains, this would only use 1MB. Which is not too bad to take into account.

But struct domain has a lot of out-of-line allocation. If you are not planning to make struct domain part of the ABI, then you would need quite a few allocations even in your cases.

Of course, you could just half initialize struct domain. However, I would be cautious with this solution as it would be more difficult to chase down bug around struct page. Imagine the domain pointer is accessed earlier than expected.

Hence the nasty trick: What if we allocate the new struct domain on
*top* of the old one, since we happen to know that page *wasn't* used
by the previous Xen for anything that needs to be preserved. The
structure itself isn't an ABI and never can be, and it will have to be
repopulated from the live migration data, of course — but if we can at
least make the assumption that it'll *fit*, then perhaps we can manage
to do both of the above with only one pass over all the domain pages.

This will have a direct effect on the customer-observed pause time for
a live update. So it's kind of icky, but also *very* tempting...

May I recommend to get some numbers with the multiple pass and "nicer" code first? We can decide on that what sort of hackery we need to lower the pause time.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.