[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 06/44] x86emul: test for correct EVEX Disp8 scaling



>>> On 12.11.18 at 18:42, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 25/09/18 14:29, Jan Beulich wrote:
>> Besides the already existing tests (which are going to be extended once
>> respective ISA extension support is complete), let's also ensure for
>> every individual insn that their Disp8 scaling (and memory access width)
>> are correct.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> 
> I can see what you're attempting to do, but you now have two
> implementations of the EVEX disp8 logic written by yourself.  AFAICT,
> this doesn't actually check that the behaviour of the instruction in
> hardware matches your model of the instruction - it checks that two of
> your models are the same.

Correct, but I've specifically tried to make the two models sufficiently
different.

> The only way I can think of testing the emulator model against hardware
> is to start with two memory area poisoned with a non-repeating pattern,
> and a src/dst register poisoned with a different non-repeating pattern. 
> Then, execute a real instruction stub, emulate the other and memcmp()
> the two memory regions.

That's what some of the tests added right in patch 5 do. Did you
intentionally skip that patch while reviewing?

> That way, a systematic error in the two models won't cancel out to "all ok".

Hence the two different models. I certainly realize the risk you
name.

>> --- /dev/null
>> +++ b/tools/tests/x86_emulator/evex-disp8.c
>> @@ -0,0 +1,452 @@
>> +#include <stdarg.h>
>> +#include <stdio.h>
>> +
>> +#include "x86-emulate.h"
> 
> This now needs rearranging to avoid:
> 
> x86-emulate.h:30:3: error: #error "Must not include <stdio.h> before
> x86-emulate.h"
>  # error "Must not include <stdio.h> before x86-emulate.h"

Yes, I've already re-based over that other change.

>> +enum vl {
>> +    VL_128,
>> +    VL_256,
>> +    VL_512,
>> +};
>> +
>> +enum scale {
>> +    SC_vl,
>> +    SC_el,
>> +};
>> +
>> +enum vsz {
>> +    VSZ_vl,
>> +    VSZ_vl_2, /* VL / 2 */
>> +    VSZ_vl_4, /* VL / 4 */
>> +    VSZ_vl_8, /* VL / 8 */
>> +    /* "no broadcast" implied from here on. */
>> +    VSZ_el,
>> +    VSZ_el_2, /* EL * 2 */
>> +    VSZ_el_4, /* EL * 4 */
>> +    VSZ_el_8, /* EL * 8 */
>> +};
>> +
> 
> These acronyms get increasingly difficult to follow.  What is el in this
> context?

VL -> vector length
EL -> element length

>> +static const struct test avx512f_all[] = {
>> +    INSN_SFP(mov,            0f, 10),
>> +    INSN_SFP(mov,            0f, 11),
>> +    INSN_PFP_NB(mova,        0f, 28),
>> +    INSN_PFP_NB(mova,        0f, 29),
>> +    INSN(movdqa32,     66,   0f, 6f,    vl,   d_nb, vl),
>> +    INSN(movdqa32,     66,   0f, 7f,    vl,   d_nb, vl),
>> +    INSN(movdqa64,     66,   0f, 6f,    vl,   q_nb, vl),
>> +    INSN(movdqa64,     66,   0f, 7f,    vl,   q_nb, vl),
>> +    INSN(movdqu32,     f3,   0f, 6f,    vl,   d_nb, vl),
>> +    INSN(movdqu32,     f3,   0f, 7f,    vl,   d_nb, vl),
>> +    INSN(movdqu64,     f3,   0f, 6f,    vl,   q_nb, vl),
>> +    INSN(movdqu64,     f3,   0f, 7f,    vl,   q_nb, vl),
>> +    INSN(movntdq,      66,   0f, e7,    vl,   d_nb, vl),
>> +    INSN(movntdqa,     66, 0f38, 2a,    vl,   d_nb, vl),
>> +    INSN_PFP_NB(movnt,       0f, 2b),
>> +    INSN_PFP_NB(movu,        0f, 10),
>> +    INSN_PFP_NB(movu,        0f, 11),
>> +};
>> +
>> +static const struct test avx512f_128[] = {
>> +    INSN(mov,       66,   0f, 6e, el, dq64, el),
>> +    INSN(mov,       66,   0f, 7e, el, dq64, el),
>> +    INSN(movq,      f3,   0f, 7e, el,    q, el),
>> +    INSN(movq,      66,   0f, d6, el,    q, el),
>> +};
>> +
>> +static const struct test avx512bw_all[] = {
>> +    INSN(movdqu8,     f2,   0f, 6f,    vl,    b, vl),
>> +    INSN(movdqu8,     f2,   0f, 7f,    vl,    b, vl),
>> +    INSN(movdqu16,    f2,   0f, 6f,    vl,    w, vl),
>> +    INSN(movdqu16,    f2,   0f, 7f,    vl,    w, vl),
>> +};
>> +
>> +static const unsigned char vl_all[] = { VL_512, VL_128, VL_256 };
>> +static const unsigned char vl_128[] = { VL_128 };
> 
> What are these for, and why is vl_all[]'s VL_128 out of order?

The RUN() macro invocations (further down) reference one them
each, to indicate what vector lengths to test. The first array
entry does always get used, while subsequent entries (if any)
require AVX512VL to be available. See the conditional at the top
of the inner loop in test_group().

>> +
>> +/*
>> + * This table, indicating the presence of an immediate (byte) for an opcode
>> + * space 0f major opcode, is indexed by high major opcode byte nibble, with
>> + * each table element then bit-indexed by low major opcode byte nibble.
>> + */
>> +static const uint16_t imm0f[16] = {
>> +    [0x7] = (1 << 0x0) /* vpshuf* */ |
>> +            (1 << 0x1) /* vps{ll,ra,rl}w */ |
>> +            (1 << 0x2) /* vps{l,r}ld, vp{rol,ror,sra}{d,q} */ |
>> +            (1 << 0x3) /* vps{l,r}l{,d}q */,
>> +    [0xc] = (1 << 0x2) /* vcmp{p,s}{d,s} */ |
>> +            (1 << 0x4) /* vpinsrw */ |
>> +            (1 << 0x5) /* vpextrw */ |
>> +            (1 << 0x6) /* vshufp{d,s} */,
>> +};
>> +
>> +static struct x86_emulate_ops emulops;
>> +
>> +static unsigned int accessed[3 * 64];
> 
> What are the expected properties?  Why 3 * ?

See record_access(): The instructions under test all get a Disp8 value
of 1 encoded. In order to be able to sensibly see how exactly things
go wrong (during debugging), it simply helps to cover the entire range
from zero to 3 times the (maximum) vector length. All accesses farther
out of bounds than by vector length will not be recorded here, and
hence fail "silently".

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.