Xen project Mailing List

Re: [Xen-devel] Ping: [PATCH v3 0/4] x86/HVM: implement memory read caching

To: "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Tue, 02 Oct 2018 06:47:05 -0600

Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Paul Durrant <paul.durrant@xxxxxxxxxx>

Delivery-date: Tue, 02 Oct 2018 12:47:13 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

>>> On 02.10.18 at 12:51, <andrew.cooper3@xxxxxxxxxx> wrote: > On 02/10/18 11:36, Jan Beulich wrote: >>>>> On 25.09.18 at 16:14, <JBeulich@xxxxxxxx> wrote: >>> Emulation requiring device model assistance uses a form of instruction >>> re-execution, assuming that the second (and any further) pass takes >>> exactly the same path. This is a valid assumption as far as use of CPU >>> registers goes (as those can't change without any other instruction >>> executing in between), but is wrong for memory accesses. In particular >>> it has been observed that Windows might page out buffers underneath >>> an instruction currently under emulation (hitting between two passes). >>> If the first pass translated a linear address successfully, any subsequent >>> pass needs to do so too, yielding the exact same translation. >>> >>> Introduce a cache (used just by guest page table accesses for now, i.e. >>> a form of "paging structure cache") to make sure above described >>> assumption holds. This is a very simplistic implementation for now: Only >>> exact matches are satisfied (no overlaps or partial reads or anything). >>> >>> There's also some seemingly unrelated cleanup here which was found >>> desirable on the way. >>> >>> 1: x86/mm: add optional cache to GLA->GFN translation >>> 2: x86/mm: use optional cache in guest_walk_tables() >>> 3: x86/HVM: implement memory read caching >>> 4: x86/HVM: prefill cache with PDPTEs when possible >>> >>> As for v2, I'm omitting "VMX: correct PDPTE load checks" from v3, as I >>> can't currently find enough time to carry out the requested further >>> rework. >> Andrew, George? > > You've not fixed anything from my concerns with v1. I've responded to your concerns verbally, and you went silent, as is (I regret having to say so) the case quite often. This simply blocks any progress. Hence, after enough time went by, I simply took the liberty to interpret the silence as "the verbal response took care of my concerns". > This doesn't behave like real hardware, and definitely doesn't behave as > named - "struct hvmemul_cache" is simply false. If it were named > hvmemul_psc (or some other variation on Paging Structure Cache) then it > wouldn't be so bad, as the individual levels do make more sense in that > context As previously pointed out (without any suggestion coming back from you), I chose the name "cache" for the lack of a better term. However, I certainly disagree with naming it PSC or some such, as its level zero is intentionally there to be eventually used for non-paging-structure data. > (not that it would make the behaviour any closer to how hardware > actually works). I can certainly appreciate this concern of yours, but the whole issue the series is aiming to address is something that we can't make behave like hardware does: Hardware doesn't have the concept of a device model that it needs to wait for responses from, while trying to make use of the wait time (i.e. scheduling in another CPU). Once again (I've said so more than once before) - the use of what I call cache here is a correctness thing, not a performance improvement (this, if it so happens, is just a nice side effect). Nothing like that exists on hardware; I'm merely trying to come close to paging structure caches. As to them - is it anywhere spelled out that their data must not come from memory, when doing a cache fill? We simply have no general purpose cache that the data could come from. > I'm also not overly happy with the conditional nature of the caching, or > that it isn't a transparent read-through cache. This leads to a large > amount of boilerplate code for every user. That's all fine, and I can understand it. Yet I hope you don't mean to ask that for this correctness fix to become acceptable, you demand that I implement a general purpose cache sitting transparently underneath all read/write operations? The more that it wouldn't even fulfill the purpose, as what the series introduces specifically also needs to be used for page tables living in MMIO (i.e. regardless of cachability of the memory accesses involved; I do realize that this doesn't currently function for other reasons, but it really should). As to the amount of boilerplate code: Besides expressing your dislike, do you have a concrete suggestion as to how you would envision this to be avoided in a way covering _all_ cases, i.e. in particular _all_ callers of guest_walk_tables() et al (i.e. all the functions the first two patches fiddle with)? If I had seen an obvious and natural way to achieve this, you may rest assured that I would have tried to avoid introducing the new function parameters, for which arguments need to be pushed through the various layers now. Furthermore I couldn't convince myself that doing this in a parameter-less way would be a good idea (and in the end provably correct): Of course we could make caches hang off of struct vcpu, but then we would need to find a model how to invalidate them often enough, without invalidating (parts of) them in ways breaking the correctness that I'm trying to achieve here. Bottom line - I think there are three options: 1) You accept this model, despite it not being perfect; of course I'm then all ears as to bugs you see in the current version. 2) You supply a series addressing the correctness issue in a way to your liking, within a reasonable time frame (which to me would mean in time for 4.12, seeing that this series was put together during the 4.11 freeze and posted very soon after the tree was re-opened). 3) You provide feedback which is at least constructive enough that I can derive from it something that is (a) manageable in scope and (b) laying out a way towards addressing the issue at hand. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.