[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Ping: [PATCH v3 0/4] x86/HVM: implement memory read caching

>>> On 02.10.18 at 12:51, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 02/10/18 11:36, Jan Beulich wrote:
>>>>> On 25.09.18 at 16:14, <JBeulich@xxxxxxxx> wrote:
>>> Emulation requiring device model assistance uses a form of instruction
>>> re-execution, assuming that the second (and any further) pass takes
>>> exactly the same path. This is a valid assumption as far as use of CPU
>>> registers goes (as those can't change without any other instruction
>>> executing in between), but is wrong for memory accesses. In particular
>>> it has been observed that Windows might page out buffers underneath
>>> an instruction currently under emulation (hitting between two passes).
>>> If the first pass translated a linear address successfully, any subsequent
>>> pass needs to do so too, yielding the exact same translation.
>>> Introduce a cache (used just by guest page table accesses for now, i.e.
>>> a form of "paging structure cache") to make sure above described
>>> assumption holds. This is a very simplistic implementation for now: Only
>>> exact matches are satisfied (no overlaps or partial reads or anything).
>>> There's also some seemingly unrelated cleanup here which was found
>>> desirable on the way.
>>> 1: x86/mm: add optional cache to GLA->GFN translation
>>> 2: x86/mm: use optional cache in guest_walk_tables()
>>> 3: x86/HVM: implement memory read caching
>>> 4: x86/HVM: prefill cache with PDPTEs when possible
>>> As for v2, I'm omitting "VMX: correct PDPTE load checks" from v3, as I
>>> can't currently find enough time to carry out the requested further
>>> rework.
>> Andrew, George?
> You've not fixed anything from my concerns with v1.

I've responded to your concerns verbally, and you went silent, as is
(I regret having to say so) the case quite often. This simply blocks
any progress. Hence, after enough time went by, I simply took the
liberty to interpret the silence as "the verbal response took care of
my concerns".

> This doesn't behave like real hardware, and definitely doesn't behave as
> named - "struct hvmemul_cache" is simply false.  If it were named
> hvmemul_psc (or some other variation on Paging Structure Cache) then it
> wouldn't be so bad, as the individual levels do make more sense in that
> context

As previously pointed out (without any suggestion coming back from
you), I chose the name "cache" for the lack of a better term. However,
I certainly disagree with naming it PSC or some such, as its level zero
is intentionally there to be eventually used for non-paging-structure

> (not that it would make the behaviour any closer to how hardware
> actually works).

I can certainly appreciate this concern of yours, but the whole issue
the series is aiming to address is something that we can't make
behave like hardware does: Hardware doesn't have the concept of
a device model that it needs to wait for responses from, while trying
to make use of the wait time (i.e. scheduling in another CPU).

Once again (I've said so more than once before) - the use of what
I call cache here is a correctness thing, not a performance
improvement (this, if it so happens, is just a nice side effect).
Nothing like that exists on hardware; I'm merely trying to come
close to paging structure caches. As to them - is it anywhere
spelled out that their data must not come from memory, when
doing a cache fill? We simply have no general purpose cache
that the data could come from.

> I'm also not overly happy with the conditional nature of the caching, or
> that it isn't a transparent read-through cache.  This leads to a large
> amount of boilerplate code for every user.

That's all fine, and I can understand it. Yet I hope you don't mean
to ask that for this correctness fix to become acceptable, you
demand that I implement a general purpose cache sitting
transparently underneath all read/write operations? The more that
it wouldn't even fulfill the purpose, as what the series introduces
specifically also needs to be used for page tables living in MMIO
(i.e. regardless of cachability of the memory accesses involved; I
do realize that this doesn't currently function for other reasons,
but it really should).

As to the amount of boilerplate code: Besides expressing your
dislike, do you have a concrete suggestion as to how you would
envision this to be avoided in a way covering _all_ cases, i.e. in
particular _all_ callers of guest_walk_tables() et al (i.e. all the
functions the first two patches fiddle with)? If I had seen an
obvious and natural way to achieve this, you may rest assured
that I would have tried to avoid introducing the new function
parameters, for which arguments need to be pushed through
the various layers now.

Furthermore I couldn't convince myself that doing this in a
parameter-less way would be a good idea (and in the end
provably correct): Of course we could make caches hang off
of struct vcpu, but then we would need to find a model how
to invalidate them often enough, without invalidating (parts
of) them in ways breaking the correctness that I'm trying to
achieve here.

Bottom line - I think there are three options:
1) You accept this model, despite it not being perfect; of
   course I'm then all ears as to bugs you see in the current
2) You supply a series addressing the correctness issue in a
   way to your liking, within a reasonable time frame (which to
   me would mean in time for 4.12, seeing that this series was
   put together during the 4.11 freeze and posted very soon
   after the tree was re-opened).
3) You provide feedback which is at least constructive enough
   that I can derive from it something that is (a) manageable in
   scope and (b) laying out a way towards addressing the issue
   at hand.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.