
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available



>>> On 01.03.18 at 17:58, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 01/03/18 10:54, Jan Beulich wrote:
>>>>> On 01.03.18 at 11:36, <roger.pau@xxxxxxxxxx> wrote:
>>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>>>>>>> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> 02/28/18 7:20 PM >>>
>>>>> On 28/02/18 16:22, Jan Beulich wrote:
>>>>>>>>> On 26.02.18 at 12:35, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>>> --- a/xen/include/asm-x86/alternative-asm.h
>>>>>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>>>>>> @@ -1,6 +1,8 @@
>>>>>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>  
>>>>>>> +#include <asm/nops.h>
>>>>>>> +
>>>>>>>  #ifdef __ASSEMBLY__
>>>>>>>  
>>>>>>>  /*
>>>>>>> @@ -18,6 +20,14 @@
>>>>>>>      .byte \pad_len
>>>>>>>  .endm
>>>>>>>  
>>>>>>> +.macro mknops nr_bytes
>>>>>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>>>>>> +    .nop \nr_bytes, ASM_NOP_MAX
>>>>>> And do you really need to specify ASM_NOP_MAX here? What's
>>>>>> wrong with letting the assembler pick what it wants as the longest
>>>>>> NOP?
>>>>> I don't want a toolchain change to cause us to go beyond 11-byte nops,
>>>>> because of the associated decode stall on almost all hardware.  Using
>>>>> ASM_NOP_MAX seemed like the easiest way to keep the end result
>>>>> consistent, irrespective of toolchain support.
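
To make the capping behaviour concrete: with a second argument, the
directive behaves roughly like the chunking loop below.  This is a
minimal C sketch only; the table contents and the cap of 8 are
illustrative stand-ins for the real definitions in asm/nops.h, not
copies of them.

    #include <stddef.h>
    #include <string.h>

    #define NOP_MAX 8   /* stand-in for ASM_NOP_MAX; assumed, not the real value */

    /* One encoding per length; rows 1..8 of the usual P6 long-nop table. */
    static const unsigned char nop_table[NOP_MAX][NOP_MAX] = {
        { 0x90 },                                            /* nop */
        { 0x66, 0x90 },                                      /* osp nop */
        { 0x0f, 0x1f, 0x00 },                                /* nopl (%rax) */
        { 0x0f, 0x1f, 0x40, 0x00 },                          /* nopl 0(%rax) */
        { 0x0f, 0x1f, 0x44, 0x00, 0x00 },                    /* nopl 0(%rax,%rax,1) */
        { 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },              /* nopw 0(%rax,%rax,1) */
        { 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },        /* nopl 0L(%rax) */
        { 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },  /* nopl 0L(%rax,%rax,1) */
    };

    /* Fill buf[0..len) with nops, none longer than NOP_MAX bytes. */
    static void emit_nops(unsigned char *buf, size_t len)
    {
        while ( len )
        {
            size_t chunk = len < NOP_MAX ? len : NOP_MAX;

            memcpy(buf, nop_table[chunk - 1], chunk);
            buf += chunk;
            len -= chunk;
        }
    }
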
>>>> I don't understand - an earlier patch takes care of replacing them at
>>>> runtime anyway. What stalls can then result?
>>> The runtime replacement won't happen when using the .nop directive
>>> AFAICT, because the original padding will likely be filled
>>> with opcodes other than 0x90, and thus the runtime nop
>>> optimization won't be performed.
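
For illustration, the check Roger describes amounts to something like
the sketch below (hypothetical function name; the real logic lives in
the nop-optimisation pass added earlier in the series):

    #include <stdbool.h>
    #include <stddef.h>

    /* Only treat the tail of an origin site as re-optimisable padding
     * if it consists purely of single-byte 0x90 nops.  Toolchain-emitted
     * long nops (0x0f 0x1f ...) fail this test, so such a site would be
     * left alone at runtime.
     */
    static bool is_nop90_padding(const unsigned char *p, size_t len)
    {
        while ( len-- )
            if ( *p++ != 0x90 )
                return false;

        return true;
    }
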
>> Oh, indeed. That calls into question the whole idea of using
>> .nop in favor of .skip. Andrew, I'm sorry, but with this I prefer
>> to withdraw my ack.
>>
>>> I also agree that using the default (not providing a second argument)
>>> seems like a better solution. Why would the toolchain switch to
>>> something that leads to worse performance? That would certainly be
>>> considered a bug.
>> Why? They may change it based on data available for newer /
>> older / whatever hardware. Any build-time choice is going to be
>> suboptimal somewhere, so I think we absolutely should not
>> bypass replacing these NOPs at runtime, all the more so now
>> that we may have quite large sequences of them.
> 
> The point of having the toolchain emit optimised nops is to avoid the
> need for us to patch the site at all.  I.e. calling optimise_nops() on a
> set of toolchain nops defeats the purpose in the overwhelmingly common
> case of running on a system which prefers P6 nops.
> 
> The problem with working out when to optimise is that, when we come to
> apply an individual alternative, we don't know if we've already patched
> this site before.  Even the unoptimised algorithm has a corner case
> which explodes if there is a stream of 0x90s at the end of a
> replacement, e.g. in an imm or disp field.
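
As a concrete instance of that corner case (illustrative bytes only):
mov $0x90909090, %eax encodes as 0xb8 0x90 0x90 0x90 0x90, so its
trailing immediate bytes are indistinguishable from nop padding by
inspection alone.

    /* Pathological replacement tail: the last four bytes belong to an
     * immediate, not to padding, yet a scanner stripping trailing 0x90
     * bytes would eat them and corrupt the instruction.
     */
    static const unsigned char repl_tail[] = {
        0xb8, 0x90, 0x90, 0x90, 0x90    /* mov $0x90909090, %eax */
    };
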
> 
> Put simply, we cannot determine, by peeking at the patch site, whether it
> has been patched or not (other than by keeping a full copy of the origin
> site as a reference).  As soon as we choose to optimise the nops of the
> origin site, we cannot determine anything at all.
> 
> Thinking out loud, we could perhaps have a section containing one byte
> per origin site, which we use to track whether we've already optimised
> the padding bytes and whether the contents have been replaced.  This
> would also add an extra long to struct alt_instr, but it's all cold data
> after boot.
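
A hypothetical sketch of that layout (all new names invented for
illustration, and the existing alt_instr fields abridged rather than
copied from the real header):

    #include <stdint.h>

    /* Per-origin-site status byte, living in its own section; cold
     * after boot.
     */
    #define ALT_SITE_NOPS_OPTIMISED (1u << 0)  /* padding already optimised */
    #define ALT_SITE_REPLACED       (1u << 1)  /* replacement applied */

    struct alt_instr {
        int32_t  orig_offset;    /* origin site (relative offset) */
        int32_t  repl_offset;    /* replacement code (relative offset) */
        uint16_t cpuid;          /* feature bit gating the replacement */
        uint8_t  orig_len;       /* length of the origin site */
        uint8_t  repl_len;       /* length of the replacement */
        int32_t  status_offset;  /* NEW: relative offset of the status byte */
    };
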

What about, alternatively, simply updating the struct alt_instr
instances to describe the code _after_ a patch has been applied?
That would allow us to always know how much padding there is.
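
A minimal sketch of what that bookkeeping could look like, reusing the
abridged structure above (which field gets updated, and how, is of
course hypothetical):

    /* After applying a replacement, rewrite the entry to describe the
     * patched site: a later pass can then trust orig_len - repl_len to
     * be pure padding, whatever byte values it happens to contain.
     */
    static void update_alt_entry(struct alt_instr *a, uint8_t bytes_written)
    {
        a->repl_len = bytes_written;
    }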

Jan

