
Re: [Xen-devel] [PATCH v4 06/44] x86emul: test for correct EVEX Disp8 scaling



>>> On 13.11.18 at 16:45, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 13/11/2018 11:12, Jan Beulich wrote:
>>>>> On 12.11.18 at 18:42, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 25/09/18 14:29, Jan Beulich wrote:
>>>> +enum vl {
>>>> +    VL_128,
>>>> +    VL_256,
>>>> +    VL_512,
>>>> +};
>>>> +
>>>> +enum scale {
>>>> +    SC_vl,
>>>> +    SC_el,
>>>> +};
>>>> +
>>>> +enum vsz {
>>>> +    VSZ_vl,
>>>> +    VSZ_vl_2, /* VL / 2 */
>>>> +    VSZ_vl_4, /* VL / 4 */
>>>> +    VSZ_vl_8, /* VL / 8 */
>>>> +    /* "no broadcast" implied from here on. */
>>>> +    VSZ_el,
>>>> +    VSZ_el_2, /* EL * 2 */
>>>> +    VSZ_el_4, /* EL * 4 */
>>>> +    VSZ_el_8, /* EL * 8 */
>>>> +};
>>>> +
>>> These acronyms get increasingly difficult to follow.  What is el in this
>>> context?
>> VL -> vector length
>> EL -> element length
> 
> Can you at least leave trailing comments after the identifiers for the
> benefit of people other than you reading the code?

Will do.
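Something along these lines, perhaps (sketched here for illustration;
the comments are mine):

```c
enum vl {
    VL_128, /* 128-bit vector length */
    VL_256, /* 256-bit vector length */
    VL_512, /* 512-bit vector length */
};

enum scale {
    SC_vl, /* Disp8 scaling by vector length */
    SC_el, /* Disp8 scaling by element length */
};

enum vsz {
    VSZ_vl,   /* memory operand covers the full vector (VL) */
    VSZ_vl_2, /* VL / 2 */
    VSZ_vl_4, /* VL / 4 */
    VSZ_vl_8, /* VL / 8 */
    /* "no broadcast" implied from here on. */
    VSZ_el,   /* memory operand is a single element (EL) */
    VSZ_el_2, /* EL * 2 */
    VSZ_el_4, /* EL * 4 */
    VSZ_el_8, /* EL * 8 */
};
```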

>>>> +static const struct test avx512f_all[] = {
>>>> +    INSN_SFP(mov,            0f, 10),
>>>> +    INSN_SFP(mov,            0f, 11),
>>>> +    INSN_PFP_NB(mova,        0f, 28),
>>>> +    INSN_PFP_NB(mova,        0f, 29),
>>>> +    INSN(movdqa32,     66,   0f, 6f,    vl,   d_nb, vl),
>>>> +    INSN(movdqa32,     66,   0f, 7f,    vl,   d_nb, vl),
>>>> +    INSN(movdqa64,     66,   0f, 6f,    vl,   q_nb, vl),
>>>> +    INSN(movdqa64,     66,   0f, 7f,    vl,   q_nb, vl),
>>>> +    INSN(movdqu32,     f3,   0f, 6f,    vl,   d_nb, vl),
>>>> +    INSN(movdqu32,     f3,   0f, 7f,    vl,   d_nb, vl),
>>>> +    INSN(movdqu64,     f3,   0f, 6f,    vl,   q_nb, vl),
>>>> +    INSN(movdqu64,     f3,   0f, 7f,    vl,   q_nb, vl),
>>>> +    INSN(movntdq,      66,   0f, e7,    vl,   d_nb, vl),
>>>> +    INSN(movntdqa,     66, 0f38, 2a,    vl,   d_nb, vl),
>>>> +    INSN_PFP_NB(movnt,       0f, 2b),
>>>> +    INSN_PFP_NB(movu,        0f, 10),
>>>> +    INSN_PFP_NB(movu,        0f, 11),
>>>> +};
>>>> +
>>>> +static const struct test avx512f_128[] = {
>>>> +    INSN(mov,       66,   0f, 6e, el, dq64, el),
>>>> +    INSN(mov,       66,   0f, 7e, el, dq64, el),
>>>> +    INSN(movq,      f3,   0f, 7e, el,    q, el),
>>>> +    INSN(movq,      66,   0f, d6, el,    q, el),
>>>> +};
>>>> +
>>>> +static const struct test avx512bw_all[] = {
>>>> +    INSN(movdqu8,     f2,   0f, 6f,    vl,    b, vl),
>>>> +    INSN(movdqu8,     f2,   0f, 7f,    vl,    b, vl),
>>>> +    INSN(movdqu16,    f2,   0f, 6f,    vl,    w, vl),
>>>> +    INSN(movdqu16,    f2,   0f, 7f,    vl,    w, vl),
>>>> +};
>>>> +
>>>> +static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 };
>>>> +static const unsigned char vl_128[] = { VL_128 };
>>> What are these for, and why is vl_all[]'s VL_128 out of order?
>> The RUN() macro invocations (further down) reference one them
>> each, to indicate what vector lengths to test. The first array
>> entry does always get used, while subsequent entries (if any)
>> require AVX512VL to be available. See the conditional at the top
>> of the inner loop in test_group().
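(To illustrate the quoted explanation, the length selection boils down
to something like this sketch, where cpu_has_avx512vl is merely a
stand-in for the harness' actual feature probe:)

```c
#include <stdbool.h>
#include <stddef.h>

enum vl { VL_128, VL_256, VL_512 };

/* Stand-in feature flag; the real harness queries CPUID. */
static bool cpu_has_avx512vl = false;

/*
 * VL_512 comes first: entry 0 of a vl_*[] array is always exercised
 * (AVX512F alone guarantees 512-bit operation), while any further
 * entries additionally require AVX512VL.
 */
static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 };

/* Return a bitmask of the vector lengths that would get tested. */
static unsigned int tested_mask(const unsigned char vl[], size_t nr)
{
    unsigned int mask = 0;

    for ( size_t i = 0; i < nr; ++i )
    {
        if ( i && !cpu_has_avx512vl )
            continue; /* non-leading entries need AVX512VL */
        mask |= 1u << vl[i];
    }

    return mask;
}
```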
> 
> After re-reading the apparently relevant bits of Vol 1, 2 and 3, I'm
> still actually none the wiser as to which AVX512 feature bits mean what.

What feature bits are you talking about? The context above doesn't
refer to any, at least not directly.

> Is there a chapter with an overview that I've overlooked, or if not, can
> we see about putting one together?

I'd have to understand (as per above) what exactly you're after in
order to answer this and/or put something together. During the year
I spent waiting for a machine able to test all of this, I put
together two classifying tables. One is what I'm slowly migrating
into the tables right here. The other takes a slightly different
view, grouping instructions by memory access width, Disp8 scaling
behavior, and "tuple type":

MNEMONIC                Disp8 Scale             Mem Width               Category
======================================================================================
vpcompressb             0                       VL                      Tuple1 Scalar
vpexpandb
--------------------------------------------------------------------------------------
vpbroadcastb            0                       Disp8 Scale             Tuple1 Scalar
vpextrb
vpinsrb
======================================================================================
vpcompressw             1                       VL                      Tuple1 Scalar
vpexpandw
--------------------------------------------------------------------------------------
vpbroadcastw            1                       Disp8 Scale             Tuple1 Scalar
vpextrw
vpinsrw
======================================================================================
vcompressps             2                       VL                      Tuple1 Scalar
vexpandps
vpcompressd
vpexpandd

etc.

But I didn't compile anything matching these up with feature flags,
beyond the first table being grouped by features. I did do a lot of
binutils work during that time as well, to get things into a more
sane state there. An original thought of mine was that we might
re-use their opcode table, which is why I first wanted to get it
into better shape.

(A 3rd table then turned out helpful for implementing the myriad of
conversion insns, but that's just a re-ordered derivation of the first
one here, going by encoding instead.)
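As background to the second table: under EVEX, an 8-bit memory
displacement is implicitly multiplied by some N derived from the
tuple type, and the "Disp8 Scale" column above is log2(N). A minimal
sketch (the function name here is mine):

```c
#include <stdint.h>

/*
 * EVEX compressed displacement: the encoded disp8 gets scaled by
 * N = 1 << scale, where "scale" is the log2 value from the table
 * above (e.g. 2, i.e. N = 4, for vcompressps).
 */
static int32_t evex_effective_disp8(int8_t disp8, unsigned int scale)
{
    return (int32_t)disp8 * (1 << scale);
}
```

E.g. an encoded Disp8 of 1 on vcompressps addresses memory 4 bytes
past the base.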

>>>> +
>>>> +/*
>>>> + * This table, indicating the presence of an immediate (byte) for an opcode
>>>> + * space 0f major opcode, is indexed by high major opcode byte nibble, with
>>>> + * each table element then bit-indexed by low major opcode byte nibble.
>>>> +static const uint16_t imm0f[16] = {
>>>> +    [0x7] = (1 << 0x0) /* vpshuf* */ |
>>>> +            (1 << 0x1) /* vps{ll,ra,rl}w */ |
>>>> +            (1 << 0x2) /* vps{l,r}ld, vp{rol,ror,sra}{d,q} */ |
>>>> +            (1 << 0x3) /* vps{l,r}l{,d}q */,
>>>> +    [0xc] = (1 << 0x2) /* vcmp{p,s}{d,s} */ |
>>>> +            (1 << 0x4) /* vpinsrw */ |
>>>> +            (1 << 0x5) /* vpextrw */ |
>>>> +            (1 << 0x6) /* vshufp{d,s} */,
>>>> +};
>>>> +
>>>> +static struct x86_emulate_ops emulops;
>>>> +
>>>> +static unsigned int accessed[3 * 64];
>>> What are the expected properties?  Why 3 * ?
>> See record_access(): The instructions under test all get a Disp8 value
>> of 1 encoded. In order to be able to sensibly see how exactly things
>> go wrong (during debugging), it simply helps to cover the entire range
>> from zero to 3 times the (maximum) vector length. Accesses out of
>> bounds by more than a vector length will not be recorded here, and
>> hence fail "silently".
> 
> Please can you put a short description in a comment somewhere around
> about here.

Again, will do.
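To put a short version of it here already (this is a reconstruction
from the description above, not the actual harness code):

```c
/*
 * The buffer under test sits such that the window from zero to
 * 3 times the maximum vector length is covered: accesses up to one
 * (maximum) vector length out of bounds on either side still get
 * recorded, which helps debugging; anything farther out fails
 * "silently".
 */
#define VEC_MAX 64 /* maximum vector length in bytes (ZMM) */

static unsigned int accessed[3 * VEC_MAX];

static void record_access(unsigned int offs, unsigned int bytes)
{
    for ( ; bytes; --bytes, ++offs )
        if ( offs < 3 * VEC_MAX )
            ++accessed[offs];
}
```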

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

