
Re: [Xen-devel] [PATCH v3 07/25] x86emul: support AVX2 gather insns



>>> On 01.02.18 at 21:53, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 07/12/17 14:03, Jan Beulich wrote:
>> @@ -2805,13 +2808,17 @@ x86_decode(
>>              ea.type = OP_MEM;
>>              if ( modrm_rm == 4 )
>>              {
>> -                sib = insn_fetch_type(uint8_t);
>> -                sib_index = ((sib >> 3) & 7) | ((rex_prefix << 2) & 8);
>> -                sib_base  = (sib & 7) | ((rex_prefix << 3) & 8);
>> -                if ( sib_index != 4 && !(d & vSIB) )
>> -                    ea.mem.off = *decode_register(sib_index, state->regs,
>> -                                                  false);
>> -                ea.mem.off <<= (sib >> 6) & 3;
>> +                uint8_t sib = insn_fetch_type(uint8_t);
>> +                uint8_t sib_base = (sib & 7) | ((rex_prefix << 3) & 8);
>> +
>> +                state->sib_index = ((sib >> 3) & 7) | ((rex_prefix << 2) & 8);
>> +                state->sib_scale = (sib >> 6) & 3;
>> +                if ( state->sib_index != 4 && !(d & vSIB) )
>> +                {
>> +                    ea.mem.off = *decode_register(state->sib_index,
>> +                                                  state->regs, false);
>> +                    ea.mem.off <<= state->sib_scale;
> 
> This is a functional change.

In what way? The code is just being re-organized, so that the two
pieces of information needed later go into the new state fields. Are
you perhaps referring to the shift previously having happened
outside the if()? With the condition being false, that was simply a
shifting zero left (or else it would have been wrong to sit outside
the if()).

>> @@ -7472,6 +7479,110 @@ x86_emulate(
>>          break;
>>      }
>>  
>> +    case X86EMUL_OPC_VEX_66(0x0f38, 0x90): /* vpgatherd{d,q} {x,y}mm,mem,{x,y}mm */
>> +    case X86EMUL_OPC_VEX_66(0x0f38, 0x91): /* vpgatherq{d,q} {x,y}mm,mem,{x,y}mm */
>> +    case X86EMUL_OPC_VEX_66(0x0f38, 0x92): /* vgatherdp{s,d} {x,y}mm,mem,{x,y}mm */
>> +    case X86EMUL_OPC_VEX_66(0x0f38, 0x93): /* vgatherqp{s,d} {x,y}mm,mem,{x,y}mm */
>> +    {
>> +        unsigned int mask_reg = ~vex.reg & (mode_64bit() ? 0xf : 7);
>> +        typeof(vex) *pvex;
>> +        union {
>> +            int32_t dw[8];
>> +            int64_t qw[4];
>> +        } index, mask;
>> +
>> +        ASSERT(ea.type == OP_MEM);
>> +        generate_exception_if(modrm_reg == state->sib_index ||
>> +                              modrm_reg == mask_reg ||
>> +                              state->sib_index == mask_reg, EXC_UD);
>> +        generate_exception_if(!cpu_has_avx, EXC_UD);
>> +        vcpu_must_have(avx2);
>> +        get_fpu(X86EMUL_FPU_ymm, &fic);
>> +
>> +        /* Read destination, index, and mask registers. */
>> +        opc = init_prefixes(stub);
>> +        pvex = copy_VEX(opc, vex);
>> +        pvex->opcx = vex_0f;
>> +        opc[0] = 0x7f; /* vmovdqa */
>> +        /* Use (%rax) as destination and modrm_reg as source. */
>> +        pvex->r = !mode_64bit() || !(modrm_reg & 8);
>> +        pvex->b = 1;
>> +        opc[1] = (modrm_reg & 7) << 3;
>> +        pvex->reg = 0xf;
>> +        opc[2] = 0xc3;
>> +
>> +        invoke_stub("", "", "=m" (*mmvalp) : "a" (mmvalp));
>> +
>> +        pvex->pfx = vex_f3; /* vmovdqu */
>> +        /* Switch to sib_index as source. */
>> +        pvex->r = !mode_64bit() || !(state->sib_index & 8);
>> +        opc[1] = (state->sib_index & 7) << 3;
>> +
>> +        invoke_stub("", "", "=m" (index) : "a" (&index));
>> +
>> +        /* Switch to mask_reg as source. */
>> +        pvex->r = !mode_64bit() || !(mask_reg & 8);
>> +        opc[1] = (mask_reg & 7) << 3;
>> +
>> +        invoke_stub("", "", "=m" (mask) : "a" (&mask));
>> +        put_stub(stub);
>> +
>> +        /* Clear untouched parts of the destination and mask values. */
>> +        n = 1 << (2 + vex.l - ((b & 1) | vex.w));
>> +        op_bytes = 4 << vex.w;
>> +        memset((void *)mmvalp + n * op_bytes, 0, 32 - n * op_bytes);
>> +        memset((void *)&mask + n * op_bytes, 0, 32 - n * op_bytes);
>> +
>> +        for ( i = 0; i < n && rc == X86EMUL_OKAY; ++i )
>> +        {
>> +            if ( (vex.w ? mask.qw[i] : mask.dw[i]) < 0 )
>> +            {
>> +                signed long idx = b & 1 ? index.qw[i] : index.dw[i];
>> +
>> +                rc = ops->read(ea.mem.seg,
>> +                               ea.mem.off + (idx << state->sib_scale),
>> +                               (void *)mmvalp + i * op_bytes, op_bytes, ctxt);
>> +                if ( rc != X86EMUL_OKAY )
>> +                    break;
>> +
>> +#ifdef __XEN__
>> +                if ( i + 1 < n && local_events_need_delivery() )
>> +                    rc = X86EMUL_RETRY;
>> +#endif
>> +            }
>> +
>> +            if ( vex.w )
>> +                mask.qw[i] = 0;
>> +            else
>> +                mask.dw[i] = 0;
>> +        }
> 
> The incomplete case here is rather more complicated.  In the case that
> rc != OK and local events are pending, RF needs setting, although it is
> not clear if this is only applicable if an exception is pending, or
> between every element.

Isn't this a more general issue, e.g. also applicable to repeated
string insns? Right now we only ever clear RF. I'm not convinced
dealing with this belongs here.

>> --- a/xen/arch/x86/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate.c
>> @@ -10,6 +10,7 @@
>>   */
>>  
>>  #include <xen/domain_page.h>
>> +#include <xen/event.h>
> 
> Spurious hunk?

No - this is for local_events_need_delivery() (still visible above).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
