|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4 02/17] x86emul: support MMX/SSE{, 2, 3} moves
>>> On 01.03.17 at 14:59, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 28/02/17 12:50, Jan Beulich wrote:
>> Previously supported insns are being converted to the new model, and
>> several new ones are being added.
>>
>> To keep the stub handling reasonably simple, integrate SET_SSE_PREFIX()
>> into copy_REX_VEX(), at once switching the stubs to use an empty REX
>> prefix instead of a double DS: one (no byte registers are being
>> accessed, so an empty REX prefix has no effect), except (of course) for
>> the 32-bit test harness build.
>
> Why switch a %ds override to REX? There doesn't appear to be any benefit.
It eliminates a mode_64bit() conditional from the non-VEX path in
the macro. And then, honestly, this is a question I would have
expected (if at all) the first time you came across this. I also think
avoiding two identical prefixes is (marginally) better architecture-
wise.
>> @@ -383,15 +383,35 @@ union vex {
>> };
>> };
>>
>> +#ifdef __x86_64__
>> +# define PFX2 REX_PREFIX
>> +#else
>> +# define PFX2 0x3e
>> +#endif
>> +#define PFX_BYTES 3
>> +#define init_prefixes(stub) ({ \
>> + uint8_t *buf_ = get_stub(stub); \
>> + buf_[0] = 0x3e; \
>> + buf_[1] = PFX2; \
>> + buf_[2] = 0x0f; \
>> + buf_ + 3; \
>> +})
>> +
>> #define copy_REX_VEX(ptr, rex, vex) do { \
>> if ( (vex).opcx != vex_none ) \
>> { \
>> if ( !mode_64bit() ) \
>> vex.reg |= 8; \
>> - ptr[0] = 0xc4, ptr[1] = (vex).raw[0], ptr[2] = (vex).raw[1]; \
>> + (ptr)[0 - PFX_BYTES] = 0xc4; \
>> + (ptr)[1 - PFX_BYTES] = (vex).raw[0]; \
>> + (ptr)[2 - PFX_BYTES] = (vex).raw[1]; \
>> + } \
>> + else \
>> + { \
>> + if ( (vex).pfx ) \
>> + (ptr)[0 - PFX_BYTES] = sse_prefix[(vex).pfx - 1]; \
>> + (ptr)[1 - PFX_BYTES] |= rex; \
>
> This is no longer guarded by mode_64bit(). Won't this result in %ds |
> rex in the 32bit test stubs?
Correct. But please realize that rex is zero at all times when
emulating other than 64-bit mode.
>> @@ -5429,6 +5496,57 @@ x86_emulate(
>> singlestep = _regs._eflags & X86_EFLAGS_TF;
>> break;
>>
>> + CASE_SIMD_PACKED_FP(, 0x0f, 0x50): /* movmskp{s,d} xmm,reg */
>> + CASE_SIMD_PACKED_FP(_VEX, 0x0f, 0x50): /* vmovmskp{s,d} {x,y}mm,reg */
>> + CASE_SIMD_PACKED_INT(0x0f, 0xd7): /* pmovmskb {,x}mm,reg */
>> + case X86EMUL_OPC_VEX_66(0x0f, 0xd7): /* vpmovmskb {x,y}mm,reg */
>> + generate_exception_if(ea.type != OP_REG, EXC_UD);
>> +
>> + if ( vex.opcx == vex_none )
>> + {
>> + if ( vex.pfx & VEX_PREFIX_DOUBLE_MASK )
>> + vcpu_must_have(sse2);
>> + else
>> + {
>> + if ( b != 0x50 )
>> + host_and_vcpu_must_have(mmx);
>> + vcpu_must_have(sse);
>> + }
>> + if ( b == 0x50 || (vex.pfx & VEX_PREFIX_DOUBLE_MASK) )
>> + get_fpu(X86EMUL_FPU_xmm, &fic);
>> + else
>> + get_fpu(X86EMUL_FPU_mmx, &fic);
>> + }
>> + else
>> + {
>> + generate_exception_if(vex.reg != 0xf, EXC_UD);
>
> Isn't this TwoOp?
Yes, hence the #UD. Or is the question "Why is this being done
here, instead of on the common code path?" If so - the common
code path doing this isn't being reached, as we invoke the stub
inside the case block.
>> + if ( b == 0x50 || !vex.l )
>> + host_and_vcpu_must_have(avx);
>> + else
>> + host_and_vcpu_must_have(avx2);
>> + get_fpu(X86EMUL_FPU_ymm, &fic);
>> + }
>> +
>> + opc = init_prefixes(stub);
>> + opc[0] = b;
>> + /* Convert GPR destination to %rAX. */
>> + rex_prefix &= ~REX_R;
>> + vex.r = 1;
>> + if ( !mode_64bit() )
>> + vex.w = 0;
>> + opc[1] = modrm & 0xc7;
>> + fic.insn_bytes = PFX_BYTES + 2;
>> + opc[2] = 0xc3;
>> +
>> + copy_REX_VEX(opc, rex_prefix, vex);
>> + invoke_stub("", "", "=a" (dst.val) : [dummy] "i" (0));
>> +
>> + put_stub(stub);
>> + put_fpu(&fic);
>> +
>> + dst.bytes = 4;
>
> Somewhere there should probably be an ASSERT() that state->simd_size is
> 0, so we don't try to invoke the stub twice.
I can do this, but it didn't seem natural to do so when putting this
together, as - obviously - I did produce/check the table entries at
basically the same time as I did write this code.
>> @@ -6553,23 +6686,14 @@ x86_emulate(
>>
>> if ( state->simd_size )
>> {
>> -#ifdef __XEN__
>> - uint8_t *buf = stub.ptr;
>> -#else
>> - uint8_t *buf = get_stub(stub);
>> -#endif
Note, btw, how the ugly #ifdef-ary goes away here.
>> generate_exception_if(!op_bytes, EXC_UD);
>> generate_exception_if(vex.opcx && (d & TwoOp) && vex.reg != 0xf,
>> EXC_UD);
>>
>> - if ( !buf )
>> + if ( !opc )
>> BUG();
>> - if ( vex.opcx == vex_none )
>> - SET_SSE_PREFIX(buf[0], vex.pfx);
>> -
>> - buf[fic.insn_bytes] = 0xc3;
>> - copy_REX_VEX(buf, rex_prefix, vex);
>> + opc[fic.insn_bytes - PFX_BYTES] = 0xc3;
>
> fic.insn_bytes - PFX_BYTES is in the middle of the opcode, isn't it?
No - note the difference between opc and buf: The former points
past the common prefix bytes.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |