[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/2] x86emul: consolidate loop counter handling



On 06/12/16 13:38, Jan Beulich wrote:
> Rename _get_rep_prefix() to make it more visibly fit other use cases
> and introduce a companion "put". Use them for repeated string insn
> handling as well as LOOP/J?CXZ instead of open coding the same logic a
> couple of times.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -913,7 +913,7 @@ do {
>      put_stub(stub);                                                     \
>  } while (0)
>  
> -static unsigned long _get_rep_prefix(
> +static inline unsigned long get_loop_count(
>      const struct cpu_user_regs *int_regs,
>      int ad_bytes)
>  {
> @@ -922,10 +922,21 @@ static unsigned long _get_rep_prefix(
>             int_regs->ecx;
>  }
>  
> +static inline void put_loop_count(
> +    struct cpu_user_regs *int_regs,

Why the int_regs name for the parameter?  I can see it is to match
get_loop_count(), but both would be better just using regs imo.

> +    int ad_bytes,
> +    unsigned long count)
> +{
> +    if ( ad_bytes == 2 )
> +        *(uint16_t *)&int_regs->ecx = count;
> +    else
> +        int_regs->ecx = ad_bytes == 4 ? (uint32_t)count : count;
> +}
> +
>  #define get_rep_prefix() ({                                             \
>      unsigned long max_reps = 1;                                         \
>      if ( rep_prefix() )                                                 \
> -        max_reps = _get_rep_prefix(&_regs, ad_bytes);                   \
> +        max_reps = get_loop_count(&_regs, ad_bytes);                    \
>      if ( max_reps == 0 )                                                \
>      {                                                                   \
>          /* Skip the instruction if no repetitions are required. */      \
> @@ -941,21 +952,14 @@ static void __put_rep_prefix(
>      int ad_bytes,
>      unsigned long reps_completed)
>  {
> -    unsigned long ecx = ((ad_bytes == 2) ? (uint16_t)int_regs->ecx :
> -                         (ad_bytes == 4) ? (uint32_t)int_regs->ecx :
> -                         int_regs->ecx);
> +    unsigned long ecx = get_loop_count(int_regs, ad_bytes);
>  
>      /* Reduce counter appropriately, and repeat instruction if non-zero. */
>      ecx -= reps_completed;
>      if ( ecx != 0 )
>          int_regs->eip = ext_regs->eip;
>  
> -    if ( ad_bytes == 2 )
> -        *(uint16_t *)&int_regs->ecx = ecx;
> -    else if ( ad_bytes == 4 )
> -        int_regs->ecx = (uint32_t)ecx;
> -    else
> -        int_regs->ecx = ecx;
> +    put_loop_count(int_regs, ad_bytes, ecx);
>  }
>  
>  #define put_rep_prefix(reps_completed) ({                               \
> @@ -3977,33 +3981,21 @@ x86_emulate(
>          break;
>  
>      case 0xe0 ... 0xe2: /* loop{,z,nz} */ {
> +        unsigned long count = get_loop_count(&_regs, ad_bytes) - 1;
>          int do_jmp = !(_regs.eflags & EFLG_ZF); /* loopnz */
>  
>          if ( b == 0xe1 )
>              do_jmp = !do_jmp; /* loopz */
>          else if ( b == 0xe2 )
>              do_jmp = 1; /* loop */
> -        switch ( ad_bytes )
> -        {
> -        case 2:
> -            do_jmp &= --(*(uint16_t *)&_regs.ecx) != 0;
> -            break;
> -        case 4:
> -            do_jmp &= --(*(uint32_t *)&_regs.ecx) != 0;
> -            _regs.ecx = (uint32_t)_regs.ecx; /* zero extend in x86/64 mode */
> -            break;
> -        default: /* case 8: */
> -            do_jmp &= --_regs.ecx != 0;
> -            break;
> -        }
> -        if ( do_jmp )
> +        if ( count && do_jmp )
>              jmp_rel((int32_t)src.val);
> +        put_loop_count(&_regs, ad_bytes, count);

I think this would also be clearer to follow if it had the form:

unsigned long count = get_loop_count(&_regs, ad_bytes);
...
put_loop_count(&_regs, ad_bytes, count - 1);
if ( count != 0 && do_jmp )
    jmp_rel((int32_t)src.val);

Having said that, src.val is unconditionally a signed 8 byte immediate,
so I would have expected this to be an int8_t cast, rather than int32_t.

Finally however, the emulated behaviour is wrong.  The manual states
"Note that the LOOP instruction ignores REX.W; but 64-bit address size
can be over-ridden using a 67H prefix."

I think we need some extra early operand decoding to clobber REX.W, then
feed 67 conditionally back into ad_bytes.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.