Re: [Xen-devel] [PATCH] turn off writable page tables
Keir Fraser wrote:
On 31 Jul 2006, at 10:32, Zachary Amsden wrote:
It would allow set_pte() to switch between explicit queuing and
'direct' writing. We moved away from the former a few years back as
doing it everywhere made a mess of the generic Linux mm code and it
was hard to reason whether our patches were correct. I guess doing
it for the most important subset of mm routines is not so bad. It's
a shame that, although many set_pte() call sites could determine
statically whether or not they will batch, we'd end up with a
dynamic run-time test everywhere (unless I'm mistaken) -- I wonder
if that has a measurable cost?
We've actually seen a benefit from this, despite the cost of the
non-static parameters, for both VMI Linux with shadow pagetables on
ESX and VMI Linux with direct pagetables on Xen. Turns out that as
long as the call EIP is predictable, the parameters do not
necessarily need to be so, and modern processors are getting much
better at branch prediction.
You mean that the benefit of batching outweighs the cost of an extra
test-and-branch in the middle of a loop, or that the extra
test-and-branch simply has unmeasurable overhead? The former is to be
expected, but I'd be worried about other call sites where batching
does not happen, and an effect on those.
The extra test-and-branch has unmeasurable overhead. In the
implementation we had chosen, there was already a branch requirement on
the set_pte call anyway, to potentially delay the pte update so that it
can piggyback onto a page invalidation with just one hypercall.
Combining the two branches into one is trivial, and the cost of one
extra branch here seems to be invisible. We were getting better numbers
for MMU-related workloads with VMI-Linux than with XenoLinux. I don't
have hard numbers on this, and even if I did, it would take some time to
get them approved for public distribution. For that I must apologize.
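
To make the single-branch idea concrete, here is a minimal user-space sketch, not the VMI or Xen code. Every name in it (hyp_update_pte, hyp_invlpg, hyp_update_and_invlpg, defer_updates, pending) is hypothetical, and the "hypercalls" are plain functions that only count how often they are called. It assumes a single pending slot, which is enough to show set_pte() testing one state variable and a later page invalidation carrying the delayed update in the same call.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;

/* Hypothetical stand-ins for a hypervisor interface; each call counts as one hypercall. */
static unsigned hypercalls;

static void hyp_update_pte(pte_t *ptep, pte_t val)
{
    *ptep = val;
    hypercalls++;
}

static void hyp_invlpg(uintptr_t va)
{
    (void)va;
    hypercalls++;
}

static void hyp_update_and_invlpg(pte_t *ptep, pte_t val, uintptr_t va)
{
    *ptep = val;
    (void)va;
    hypercalls++;
}

/* The one piece of run-time state that set_pte() tests (assumed name). */
static bool defer_updates;

/* At most one delayed update in this sketch. */
static struct {
    pte_t *ptep;
    pte_t val;
    bool valid;
} pending;

static void set_pte(pte_t *ptep, pte_t val)
{
    if (defer_updates) {                 /* the single test-and-branch */
        if (pending.valid)               /* slot busy: push the old update out first */
            hyp_update_pte(pending.ptep, pending.val);
        pending.ptep = ptep;
        pending.val = val;
        pending.valid = true;
    } else {
        hyp_update_pte(ptep, val);       /* direct write path */
    }
}

static void flush_tlb_page(uintptr_t va)
{
    if (pending.valid) {
        /* The delayed update piggybacks on the invalidation: one call for both. */
        hyp_update_and_invlpg(pending.ptep, pending.val, va);
        pending.valid = false;
    } else {
        hyp_invlpg(va);
    }
}
```

With defer_updates clear, a set_pte() followed by flush_tlb_page() costs two calls in this model; with it set, the pair costs one. Note that between the two calls the pte in memory still holds the old value, so a path written this way must not read the pte back before the invalidation.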
But avoiding the changes that would otherwise be required - a full set
of pte and tlb functions which could be delayed, as well as combining
the pte update and invlpg into a single call - seemed worth a single
branch. I'm not even convinced these changes can be done in a way that
would be safe for all architectures. Of course, I may be wrong on that
point - but there is no simple way I see to do it that affords the
strong reasoning about correctness that the enter / leave semantic does.
Doing explicit batching exactly where it counts, under protection of
locks, so that SMP safety is guaranteed turns out to be really easy,
as well as a nice win.
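
As an illustration of explicit batching with enter / leave semantics, the following is a rough user-space sketch under stated assumptions, not the proposed kernel interface. The names enter_batch, leave_batch, hyp_multi_update, BATCH_MAX and zap_range are invented for the example, the hypercall is modelled as a function that applies the queued updates and bumps a counter, and the caller is assumed to hold the relevant page table lock across the whole enter / leave region.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;

#define BATCH_MAX 32                     /* assumed queue depth */

struct pte_update {
    pte_t *ptep;
    pte_t val;
};

static struct pte_update queue[BATCH_MAX];
static size_t queued;
static int batching;                     /* the single state variable */
static unsigned hypercalls;              /* counts modelled hypercalls */

/* Hypothetical multi-update hypercall: applies n updates in one call. */
static void hyp_multi_update(const struct pte_update *u, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *u[i].ptep = u[i].val;
    hypercalls++;
}

static void flush_queue(void)
{
    if (queued) {
        hyp_multi_update(queue, queued);
        queued = 0;
    }
}

/* Caller is assumed to hold the page table lock from enter to leave. */
static void enter_batch(void)
{
    batching = 1;
}

static void leave_batch(void)
{
    flush_queue();
    batching = 0;
}

static void set_pte(pte_t *ptep, pte_t val)
{
    if (batching) {                      /* one run-time test per call */
        if (queued == BATCH_MAX)
            flush_queue();
        queue[queued].ptep = ptep;
        queue[queued].val = val;
        queued++;
    } else {
        struct pte_update one = { ptep, val };
        hyp_multi_update(&one, 1);
    }
}

/* Example of a small, compact path that batches: clear n ptes. */
static void zap_range(pte_t *table, size_t n)
{
    enter_batch();
    for (size_t i = 0; i < n; i++)
        set_pte(&table[i], 0);
    leave_batch();
}
```

In this model, clearing 40 entries costs two calls (one flush when the queue fills, one on leave) instead of 40, and the same set_pte() still works outside a batch at one call per update.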
If the run-time check cost really isn't an issue (I'd like to see
numbers), we'd likely use this new interface in preference to
implicitly batched writable pagetables and would support its inclusion
in the kernel.
Sorry about not having numbers. My biggest question is - do you need
any information other than a single state variable to use
explicit batching? I thought, and Jeremy and Chris both pointed out as
well, that Xen could potentially use the information about which PT to
unhook to take advantage of writable pagetables. But, if that is not
the direction you are going, then it seems this information is not so
relevant for the explicit batching. The explicit batching does have one
disadvantage without writable page tables, which is a potential
long-term maintenance / correctness issue - you must remove read hazards from
these encapsulated paths. That is not so hard to do, and not a large
general problem, because the batching is explicit rather than implicit,
so you can pick paths to batch that are small, compact, and easy to
reason about. But nevertheless, a point I would like to make sure you
are comfortable with before we all decide these hooks will work for