
Re: [Xen-devel] Future support of 5-level paging in Xen





On 12/08/2016 07:20 PM, Andrew Cooper wrote:
On 08/12/2016 23:40, Boris Ostrovsky wrote:



Of course even the largest virtual machine today (2TB on Amazon AFAIK)
is not close to reaching the current memory limit, but it's just a
matter of time.

/me thinks Oracle will have something to say about this.  I'm sure there
was talk about VMs larger than this at previous hackathons.  XenServer
functions (ish, so long as you don't migrate) with 6TB VMs, although
starting and shutting them down feels like treacle.

I've been working (on and off) with SGI to get one of their 32TB boxes
to boot and I don't think that works. We've fixed a couple of bugs but
I don't think Xen can boot with that much memory. We successfully
booted with just under 8TB but couldn't do it with the full system.
The machine has been taken from us for now so this work is on hold.

This is on OVM, which is 4.4-based; we haven't (IIRC) tried the latest bits.

Because 64bit PV guests get 97% of the virtual address space, Xen hits
highmem/lowmem problems at the 5TB boundary, which is where we run out
of virtual address space for the directmap.

Xen supports up to 16TB of RAM (32bits in struct page_info, for a total
of 44 bits of mfns), although last time I checked Xen was still unstable
if there was any RAM above the 5 TB boundary.  Jan did subsequently find
and fix an off-by-one error, and I haven't had occasion to re-test since.

If you enable CONFIG_BIGMEM (newer than 4.4 I think, but I don't
actually recall), Xen's virtual layout changes.  The directmap shrinks
to just 3.5TB, to make space for a frametable containing larger struct
page_info's with 64bit indices.  This has a total supported limit of
123TB of RAM, due to the virtual range allocated to the frametable.


And apparently we don't have that in the OVM version I am looking at,
but I'll try the upstream bits when we get another chance on this box.

When I observed this going wrong, it went wrong because
alloc_xenheap_page() handed back virtual addresses which creep into the
64bit PV kernel's ABI range.  These virtual addresses are safe for Xen to
use in idle and hvm contexts, but not in PV context.

(BTW, speaking of slow starting and shutting down very large guests ---
have you or anyone else had a chance to look at this? My investigation
initially pointed to scrubbing and then to an insane number of
hypercall preemptions in relinquish_memory()).

This is another item I meant to re-engage on.  (It's on my todo list,
along with CPUID and nested virt, but looks like it is depending on my
wishlist item of several extra hours in the day to get some of the work
done in.)

Yes.  We should do something towards fixing that.  Current performance
measurements put a 1.5TB domain at ~14 minutes for the domain_kill
hypercall to complete.

I seem to recall some vague plans towards having per-node dirty-page
lists, scrubbing in idle context, and on-demand scrubbing at alloc-time
if the clean list is empty.


I have this (almost) working, but then I found that the hypercall preemption was eating even more time than scrubbing and got distracted by that. And then by other things (I have the attention span of a squirrel).

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

