Re: [Xen-devel] [PATCH v5 2/4] x86/HVM: implement memory read caching for insn emulation
> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Jan
> Beulich
> Sent: 03 March 2020 10:17
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Kevin Tian <kevin.tian@xxxxxxxxx>; Wei Liu <wl@xxxxxxx>; Paul Durrant
> <paul@xxxxxxx>; Andrew
> Cooper <andrew.cooper3@xxxxxxxxxx>; Jun Nakajima <jun.nakajima@xxxxxxxxx>;
> Roger Pau Monné
> <roger.pau@xxxxxxxxxx>
> Subject: [EXTERNAL][Xen-devel] [PATCH v5 2/4] x86/HVM: implement memory read
> caching for insn
> emulation
>
>
> Emulation requiring device model assistance uses a form of instruction
> re-execution, assuming that the second (and any further) pass takes
> exactly the same path. This is a valid assumption as far as use of CPU
> registers goes (as those can't change without any other instruction
> executing in between [1]), but is wrong for memory accesses. In
> particular it has been observed that Windows might page out buffers
> underneath an instruction currently under emulation (hitting between two
> passes). If the first pass read a memory operand successfully, any
> subsequent pass needs to get to see the exact same value.
>
> Introduce a cache to make sure the above described assumption holds. This
> is a very simplistic implementation for now: Only exact matches are
> satisfied (no overlaps or partial reads or anything); this is sufficient
> for the immediate purpose of making re-execution an exact replay. The
> cache also won't be used just yet for guest page walks; that'll be the
> subject of a subsequent change.
>
> With the cache being generally transparent to upper layers, but with it
> having limited capacity yet being required for correctness, certain
> users of hvm_copy_from_guest_*() need to disable caching temporarily,
> without invalidating the cache. Note that the adjustments here to
> hvm_hypercall() and hvm_task_switch() are benign at this point; they'll
> become relevant once we start to be able to emulate respective insns
> through the main emulator (and more changes will then likely be needed
> to nested code).
>
> As to the actual data page in a problematic scenario, there are a couple
> of aspects to take into consideration:
> - We must be talking about an insn accessing two locations (two memory
> ones, one of which is MMIO, or a memory and an I/O one).
> - If the non I/O / MMIO side is being read, the re-read (if it occurs at
> all) has its result discarded, by taking the shortcut through
> the first switch()'s STATE_IORESP_READY case in hvmemul_do_io(). Note
> how, among all the re-issue sanity checks there, we avoid comparing
> the actual data.
> - If the non I/O / MMIO side is being written, it is the OS's
> responsibility to avoid actually moving page contents to disk while
> there might still be a write access in flight - this is no different
> in behavior from bare hardware.
> - Read-modify-write accesses are, as always, complicated, and while we
> deal with them better nowadays than we did in the past, we're still
> not quite able to guarantee hardware-like behavior in all cases
> anyway. Nothing is getting worse by the changes made here, afaict.
>
> In __hvm_copy() also reduce p's scope and change its type to void *.
>
> [1] Other than on actual hardware, actions like
> XEN_DOMCTL_sethvmcontext, XEN_DOMCTL_setvcpucontext,
> VCPUOP_initialise, INIT, or SIPI issued against the vCPU can occur
> while the vCPU is blocked waiting for a device model to return data.
> In such cases emulation now gets canceled, though, and hence re-
> execution correctness is unaffected.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> TBD: In principle the caching here renders unnecessary the one used for
> insn bytes (vio->mmio_insn{,_bytes}). However, to seed the cache
> with the data SVM may have made available, we'd have to also know
> the corresponding GPA. It's not safe, however, to re-walk the page
> tables to find out, as the page tables may have changed in the
> meantime. Therefore I guess we need to keep the duplicate
> functionality for now. A possible solution to this could be to use
> a physical-address-based cache for page table accesses (and looking
> forward also e.g. SVM/VMX insn emulation), and a linear-address-
> based one for all other reads.
> ---
> v5: Re-arrange bitfield. Use domain_crash() in hvmemul_write_cache().
> Move hvmemul_{read,write}_cache() stubs to later patch. Also adjust
> hvmemul_cancel(). Add / extend comments. Re-base.
> v4: Re-write for cache to become transparent to callers.
> v3: Add text about the actual data page to the description.
> v2: Re-base.
>
Generally LGTM, just a couple of points below...
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -28,6 +28,19 @@
> #include <asm/iocap.h>
> #include <asm/vm_event.h>
>
> +struct hvmemul_cache
> +{
> + /* The cache is disabled as long as num_ents > max_ents. */
> + unsigned int num_ents;
> + unsigned int max_ents;
> + struct {
> + paddr_t gpa:PADDR_BITS;
> + unsigned int :BITS_PER_LONG - PADDR_BITS - 8;
Is clang ok with unnamed fields?
> + unsigned int size:8;
> + unsigned long data;
> + } ents[];
> +};
> +
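As an aside for readers following the series: the actual hvmemul_read_cache()/
hvmemul_write_cache() helpers are only stubbed in a later patch (per the v5
notes above), but given this structure an exact-match lookup would amount to
something like the sketch below. This is purely illustrative; the function
name and placement are hypothetical, not part of the patch:

    /* Illustrative only: return true and fill *data on an exact hit. */
    static bool cache_lookup(const struct hvmemul_cache *cache,
                             paddr_t gpa, unsigned int size,
                             unsigned long *data)
    {
        unsigned int i;

        /* num_ents > max_ents is the "cache disabled" sentinel. */
        if ( cache->num_ents > cache->max_ents )
            return false;

        /* Only exact matches - no overlaps or partial reads. */
        for ( i = 0; i < cache->num_ents; ++i )
            if ( cache->ents[i].gpa == gpa && cache->ents[i].size == size )
            {
                *data = cache->ents[i].data;
                return true;
            }

        return false;
    }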
> static void hvmtrace_io_assist(const ioreq_t *p)
> {
> unsigned int size, event;
> @@ -136,6 +149,8 @@ void hvmemul_cancel(struct vcpu *v)
> vio->mmio_access = (struct npfec){};
> vio->mmio_retry = false;
> vio->g2m_ioport = NULL;
> +
> + hvmemul_cache_disable(v);
> }
>
> static int hvmemul_do_io(
> @@ -1883,12 +1898,17 @@ static int hvmemul_rep_movs(
> rc = HVMTRANS_okay;
> }
> else
> + {
> + unsigned int token = hvmemul_cache_disable(curr);
> +
> /*
> * We do a modicum of checking here, just for paranoia's sake and to
> * definitely avoid copying an uninitialised buffer into guest address
> * space.
> */
> rc = hvm_copy_from_guest_phys(buf, sgpa, bytes);
> + hvmemul_cache_restore(curr, token);
> + }
>
> if ( rc == HVMTRANS_okay )
> rc = hvm_copy_to_guest_phys(dgpa, buf, bytes, curr);
> @@ -2551,6 +2571,19 @@ static int _hvm_emulate_one(struct hvm_e
> struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io;
> int rc;
>
> + /*
> + * Enable caching if it's currently disabled, but leave the cache
> + * untouched if it's already enabled, for re-execution to consume
> + * entries populated by an earlier pass.
> + */
> + if ( vio->cache->num_ents > vio->cache->max_ents )
> + {
> + ASSERT(vio->io_req.state == STATE_IOREQ_NONE);
> + vio->cache->num_ents = 0;
> + }
> + else
> + ASSERT(vio->io_req.state == STATE_IORESP_READY);
> +
> hvm_emulate_init_per_insn(hvmemul_ctxt, vio->mmio_insn,
> vio->mmio_insn_bytes);
>
> @@ -2564,6 +2597,7 @@ static int _hvm_emulate_one(struct hvm_e
> {
> vio->mmio_cache_count = 0;
> vio->mmio_insn_bytes = 0;
> + hvmemul_cache_disable(curr);
> }
> else
> {
> @@ -2856,6 +2890,123 @@ void hvm_dump_emulation_state(const char
> hvmemul_ctxt->insn_buf);
> }
>
> +int hvmemul_cache_init(struct vcpu *v)
> +{
> + /*
> + * No insn can access more than 16 independent linear addresses (AVX512F
> + * scatters/gathers being the worst). Each such linear range can span a
> + * page boundary, i.e. may require two page walks. Account for each insn
> + * byte individually, for simplicity.
> + */
> + const unsigned int nents = (CONFIG_PAGING_LEVELS + 1) *
> + (MAX_INST_LEN + 16 * 2);
> + struct hvmemul_cache *cache = xmalloc_flex_struct(struct hvmemul_cache,
> + ents, nents);
> +
> + if ( !cache )
> + return -ENOMEM;
> +
> + /* Cache is disabled initially. */
> + cache->num_ents = nents + 1;
> + cache->max_ents = nents;
> +
> + v->arch.hvm.hvm_io.cache = cache;
> +
> + return 0;
> +}
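For concreteness: assuming MAX_INST_LEN is 15 and CONFIG_PAGING_LEVELS is 4,
this works out to (4 + 1) * (15 + 16 * 2) = 235 entries per vCPU.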
> +
> +unsigned int hvmemul_cache_disable(struct vcpu *v)
> +{
> + struct hvmemul_cache *cache = v->arch.hvm.hvm_io.cache;
> + unsigned int token = cache->num_ents;
> +
> + cache->num_ents = cache->max_ents + 1;
> +
> + return token;
> +}
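Worth noting for anyone reading the interface: callers bracket non-cacheable
copies exactly as the hvmemul_rep_movs() hunk above does, i.e. capture the
returned token before the hvm_copy_from_guest_*() call and pass it back to
hvmemul_cache_restore() straight afterwards, so entries populated by an
earlier pass are preserved rather than invalidated.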
> +
> +void hvmemul_cache_restore(struct vcpu *v, unsigned int token)
> +{
> + struct hvmemul_cache *cache = v->arch.hvm.hvm_io.cache;
> +
> + ASSERT(cache->num_ents > cache->max_ents);
> + cache->num_ents = token;
ASSERT(token <= cache->max_ents) here?
Paul