Hi Matt,
Matt Chapman <matthewc@xxxxxxxxxxxxxxx> wrote on Saturday, April 30, 2005 7:21 PM:
> (I'm coming in late here, so apologies if I'm missing something.)
>
>>> No, multiple page sizes are supported, though there does have
>>> to be a system-wide minimum page size (e.g. if this were defined
>>> as 16KB, a 4KB-page mapping request from a guestOS would be rejected).
>
> Of course if necessary smaller page sizes could be supported in software
> at a performance cost, as suggested in the ASDM (2.II.5.5 Subpaging).
>
>> In my opinion this is a moot point because in order to provide the
>> appropriate semantics for physical mode emulation (PSR.dt, or PSR.it, or
>> PSR.rt == 0) it is necessary to support a 4K page size as the minimum
>> (unless you special case translations for physical mode emulation).
>
> Can you explain why this is the case? Surely the granularity of the
> metaphysical->physical mapping can be arbitrary?
The issue is that the architecture defines memory attributes for pages
accessed in physical mode (PSR.dt == 0 and/or PSR.it == 0 and/or PSR.rt == 0)
at 4KB granularity: an attribute applies to only 4KB of memory at a time. If
you use pages larger than 4KB to emulate physical mode, then each attribute
will apply to more than 4KB.
This is mostly an architectural purity thing. I would have a hard time
arguing with Dan that this could not be avoided as an issue for
paravirtualized guests :-)
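To make the granularity point concrete, here is a minimal sketch (toy code, not
anything from Xen; the address, the chunk attributes, and all names are made up
for illustration) of why a 16KB translation cannot respect per-4KB memory
attributes, whereas 4KB translations can:

#include <stdint.h>
#include <stdio.h>

/*
 * Toy sketch of the granularity argument, not real hypervisor code.
 * Memory attributes in physical mode are architecturally defined per
 * 4KB chunk; if the VMM backs physical-mode accesses with a 16KB
 * translation, one attribute necessarily covers four chunks.
 */
#define ATTR_SHIFT      12      /* 4KB attribute granularity */
#define BIG_PAGE_SHIFT  14      /* 16KB mapping the VMM might otherwise prefer */

/* Hypothetical per-chunk attribute lookup (WB = cacheable, UC = uncacheable). */
static const char *chunk_attr(uint64_t chunk) { return (chunk & 1) ? "UC" : "WB"; }

int main(void)
{
    uint64_t gpaddr = 0x8000;                       /* example guest-physical address */
    uint64_t big_page = gpaddr >> BIG_PAGE_SHIFT;
    uint64_t first = big_page << (BIG_PAGE_SHIFT - ATTR_SHIFT);

    /* A single 16KB insertion would have to pick one attribute for all
     * four 4KB chunks below, which may not match what each chunk wants. */
    for (uint64_t c = first; c < first + 4; c++)
        printf("4KB chunk %llu wants attribute %s\n",
               (unsigned long long)c, chunk_attr(c));

    /* Inserting 4KB translations instead keeps attributes per-chunk. */
    printf("emulating with 4KB pages: insert chunk %llu only\n",
           (unsigned long long)(gpaddr >> ATTR_SHIFT));
    return 0;
}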
>
>> Also in
>> terms of machine memory utilization, it is better to have smaller pages (I
>> know this functionality is not yet available in Xen, but I believe it will
>> become important once people are done working on the basics).
>
> Below you say "Memory footprint is really not that big a deal for these
> large machines" ;) As it is, just about everyone runs Itanium Linux
> with 16KB page size, so 16KB memory granularity is obviously not a big
> deal.
> Since the mappings inserted by the hypervisor are limited to this
> granularity (at least, without some complicated superpage logic to
> allocate and map pages sequentially), I like the idea of using a larger
> granularity in order to increase TLB coverage.
Yes, this is true. It is just that the larger the granularity, the harder it
is to move pages around. Also, with smaller granularity a VMM can be more
efficient (stingy) with memory allocations. For example, if a VM defines
pages in 16KB chunks, and the VMM is entirely demand driven (it only
allocates a page to a VM when the VM accesses that page), then it is
possible to keep smaller working sets (when the VM touches a page the VMM
only allocates 1/4 of that page - 4KB out of 16KB), allowing the VMM to
support more aggressive overcommitment of memory. Of course, if the VM
touches all four 4KB sections of the 16KB page, then going for TLB
utilization is the better tradeoff. As usual, this is very workload specific.
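As a rough illustration of that allocation strategy (a hedged sketch with
invented names and sizes, not Xen code; calloc() stands in for allocating a
machine page), the VMM could track which 4KB quarters of each 16KB guest page
have actually been touched and only back those on demand:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch of demand-driven sub-page allocation: the guest defines 16KB
 * pages, but the VMM backs each 4KB quarter only when it is touched,
 * keeping resident sets smaller under memory overcommitment.
 */
#define GUEST_PAGE_SIZE   (16 * 1024)
#define SUBPAGE_SIZE      (4 * 1024)
#define SUBPAGES_PER_PAGE (GUEST_PAGE_SIZE / SUBPAGE_SIZE)

struct guest_page {
    void    *subpage[SUBPAGES_PER_PAGE];   /* machine memory, NULL until touched */
    uint8_t  present;                      /* bitmap of allocated 4KB quarters */
};

/* Allocate just the 4KB quarter covering the faulting offset. */
static void *demand_fill(struct guest_page *gp, uint64_t offset)
{
    unsigned idx = (unsigned)(offset / SUBPAGE_SIZE);

    if (!(gp->present & (1u << idx))) {
        gp->subpage[idx] = calloc(1, SUBPAGE_SIZE);   /* stand-in for a machine page */
        gp->present |= 1u << idx;
    }
    return gp->subpage[idx];
}

int main(void)
{
    struct guest_page gp = { 0 };

    demand_fill(&gp, 0x1000);   /* guest touches one byte of its 16KB page */
    printf("allocated %d of %d quarters (4KB of 16KB)\n",
           __builtin_popcount(gp.present), SUBPAGES_PER_PAGE);
    return 0;
}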
>
>>> Purging is definitely expensive but there may be ways to
>>> minimize that. That's where the research comes in.
>>
>> It is not just purging. Having a global VHPT is, in general, really bad for
>> scalability. Every time the hypervisor wants to modify anything in the VHPT,
>> it must guarantee that no other processors are accessing that VHPT (this is
>> a fairly complex thing to do in TLB miss handlers).
>
> I think there are more than two options here? From what I gather, I
> understand that you are comparing a single global lVHPT to a per-domain
> lVHPT. There is also the option of a per-physical-CPU lVHPT, and a
> per-domain per-virtual-CPU lVHPT.
Yes, there are multiple options.
> When implementing the lVHPT in Linux I decided on a per-CPU VHPT for the
> scalability reasons that you cite. And one drawback is, as Dan says,
> that it may be difficult to find a large enough chunk of free physical
> memory to bring up a new processor (or domain in the per-domain case).
The VHPT does not need to be contiguous in physical memory, although most OS
implementations I know of assume it is (in order to map it using a TR).
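To sketch why the contiguity assumption usually creeps in (illustrative toy
code only; the base address and size below are made up), a VHPT that is mapped
with a single translation register must be a naturally aligned power-of-two
chunk; the hardware walker only needs the table to be virtually contiguous, so
scattered machine pages are possible at the cost of a more involved mapping:

#include <stdint.h>
#include <stdio.h>

/* Check whether one TR entry could cover a candidate VHPT region. */
static int vhpt_ok_for_single_tr(uint64_t base, uint64_t size)
{
    /* power-of-two size... */
    if (size == 0 || (size & (size - 1)))
        return 0;
    /* ...and base aligned to that size, so one translation covers it all */
    return (base & (size - 1)) == 0;
}

int main(void)
{
    printf("%d\n", vhpt_ok_for_single_tr(0x4000000, 1 << 24));  /* 16MB-aligned: yes */
    printf("%d\n", vhpt_ok_for_single_tr(0x4001000, 1 << 24));  /* misaligned:   no  */
    return 0;
}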
>
>> Another important thing is hashing into the VHPT. If you have a single VHPT
>> for multiple guests (and those guests are the same, e.g., same version of
>> Linux) then you are depending 100% on having a good RID allocator (per
>> domain), otherwise the translations for different domains will start
>> colliding in your hash chains and thus reducing the efficiency of your VHPT.
>> The point here is that guest OSs (that care about this type of stuff) are
>> designed to spread RIDs such that they minimize their own hash chain
>> collisions, but they are not designed to avoid colliding with other guests'
>> RIDs. Also, the fact that the hash algorithm is implementation specific
>> makes this problem even worse.
>
> RID allocation is certainly an issue, but I think it's an issue even
> with a per-domain VHPT. If you have a guest that uses the short VHPT,
> such as Linux by default, it may not produce good RID allocation even
> with just one domain. For best performance one would need to either
> modify the guest, or virtualise RIDs completely, in which case a global
> or per-physical-CPU VHPT can be made to work well too.
The point is that IPF OS designers know that allocating RIDs in the wrong
way can cause more VHPT hash chain collisions (I am talking about a long
format VHPT with collision chains), so they work to avoid that problem. They
do not, however, work to avoid their RIDs colliding with the RIDs of other
OSs sharing the same VHPT, so in that case the RID allocation issue becomes
entirely a VMM problem. This is a big deal if the RIDs are not virtualized.
You are correct to state that virtualizing RIDs is a way to deal with this
problem. But I still think that the problem is a bit more difficult for a
global VHPT because the VMM has to worry about two things:
1. That the allocated RIDs do not collide (cause collisions in the VHPT hash
chains) with other RIDs allocated for that VM, and
2. That the allocated RIDs do not collide with RIDs allocated to other VMs.
With per-domain VHPTs a VMM only has to worry about 1 above.
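One common-sense way to handle point 2 is to partition the machine RID space
per domain when virtualizing RIDs. The sketch below is illustrative only (the
block size and field layout are invented, not Xen's scheme); it shows how
disjoint per-domain RID blocks make cross-domain collisions impossible by
construction, while point 1 remains the guest's own problem:

#include <stdint.h>
#include <stdio.h>

/*
 * Carve the machine RID space into fixed-size blocks, one per domain.
 * A guest RID is folded into its domain's block, so two domains can
 * never be handed the same machine RID. If a guest uses more distinct
 * RIDs than a block holds, the mask causes aliasing within that domain,
 * which is the price of this simple scheme.
 */
#define RID_BLOCK_BITS  18                        /* RIDs per domain: 2^18 */
#define RID_BLOCK_SIZE  (1u << RID_BLOCK_BITS)

static uint32_t virtualize_rid(uint32_t domain_id, uint32_t guest_rid)
{
    /* Wrap the guest RID into the domain's private block. */
    return (domain_id << RID_BLOCK_BITS) | (guest_rid & (RID_BLOCK_SIZE - 1));
}

int main(void)
{
    /* Two domains using the same guest RID get distinct machine RIDs. */
    printf("dom1: 0x%x\n", virtualize_rid(1, 0x42));
    printf("dom2: 0x%x\n", virtualize_rid(2, 0x42));
    return 0;
}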
>
> Matt
Bert
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel