[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 4/5] x86/alternative: Implement NMI/#MC-safe patching



On 30/01/18 08:39, Jan Beulich wrote:
>>>> On 29.01.18 at 16:38, <andrew.cooper3@xxxxxxxxxx> wrote:
>> +bool init_or_livepatch text_poke_live(const struct cpu_user_regs *regs)
>> +{
>> +    struct live_poke_info *i = &live_poke_info;
>> +
>> +    if ( unlikely(i->cpu != smp_processor_id()) )
>> +    {
>> +        ASSERT(regs);
>> +
>> +        /*
>> +         * We hit a breakpoint, on a CPU which was not performing the
>> +         * patching.  This is only expected to be possible for the NMI/#MC
>> +         * paths, and even then, only if we hit the tiny race window below
>> +         * while patching in the NMI/#MC handlers.
>> +         *
>> +         * We can't safely evaluate whether we hit a transient breakpoint
>> +         * because i->cpu has likely completed the patch and moved on to the
>> +         * next patch site.
>> +         *
>> +         * Go to sleep for a bit and synchronise the pipeline as we are now 
>> in
>> +         * a cross-modifying scenario.
>> +         */
>> +        cpu_relax();
>> +        cpuid_eax(0);
>> +
>> +        /*
>> +         * Presume that we hit the transient breakpoint, as we can't confirm
>> +         * whether we did or not.  We don't expect there to be any other
>> +         * breakpoints to hit, but if we did hit another one, then in the
>> +         * worse case we will only livelock until i->cpu has finished all of
>> +         * its patching.
>> +         */
>> +        return true;
>> +    }
>> +
>> +    /*
>> +     * We are the CPU performing the patching, and might have ended up here 
>> by
>> +     * hitting a breakpoint.
>> +     *
>> +     * Either way, we need to complete particular patch to make forwards
>> +     * progress.  This logic is safe even if executed recursively in the
>> +     * breakpoint handler; the worst that will happen when normal execution
>> +     * resumes is that we will rewrite the same bytes a second time.
>> +     */
>> +
>> +    /* First, insert a breakpoint to prevent execution of the patch site. */
>> +    i->addr[0] = 0xcc;
>> +    smp_wmb();
> This is necessary, but not sufficient when replacing more than a
> single insn: The other CPU may be executing instructions _after_
> the initial one that is being replaced, and ...
>
>> +    /* Second, copy the remaining instructions into place. */
>> +    memcpy(i->addr + 1, i->opcode + 1, i->len - 1);
> ... this may be altering things underneath its feet.

Hmm.

It is completely impossible to recover if another CPU hits the middle of
this memcpy().  It is impossible to synchronise a specific write against
the remote CPU pipelines.

The only way to be safe is to guarantee that CPUs can't hit the code to
begin with.

On AMD, we can use STGI/CLGI to do this sensibly, and really really
inhibit all interrupts.  For Intel, we can fix the NMI problem by
rendezvousing and running the patching loop in NMI context, at which
point the hardware latch will prevent further NMIs.

However, there is literally nothing we can do to prevent #MC from
arriving.  We can stop servicing #MC by disabling CR4.MCE, but then the
processor will shut down.

We can't put a big barrier at the head of #MC handler which delays
processing, because we have no way to ensure that processors aren't
already later in the #MC handler.  Furthermore, attempting to do so
heightens the risk of hitting a shutdown condition from a second queued #MC.


The chance of hitting an NMI/#MC collide with patching is already
minuscule.  Alternatives and livepatching have been used (by XenServer
alone) in this form for nearly 3 years, across millions of servers,
without a single report of such a collision.  The chance of an #MC
collision is far less likely than an NMI collision, and in the best case.

While we can arrange and test full NMI safety, we cannot test #MC safety
at all.  Any code to try and implement #MC safety is going to be
complicated, and make things worse if an #MC does hit.

Therefore, I agree with the Linux view that trying to do anything for
#MC safety is going to be worse than doing nothing at all.

>> @@ -153,7 +231,31 @@ void init_or_livepatch add_nops(void *insns, unsigned 
>> int len)
>>  void init_or_livepatch noinline
>>  text_poke(void *addr, const void *opcode, size_t len, bool live)
>>  {
>> -    memcpy(addr, opcode, len);
>> +    if ( !live || len == 1 )
>> +    {
>> +        /*
>> +         * If we know *addr can't be executed, or we are patching a single
>> +         * byte, it is safe to use a straight memcpy().
>> +         */
>> +        memcpy(addr, opcode, len);
> Is it really worth special casing this? Whether to actually ack
> patches 2 and 3 depends on that.

Yes, and even more so if we are going to use an NMI rendezvous.  We
definitely don't want to have an NMI rendezvous while applying
alternatives as part of livepatch preparation.

>
>> +        };
>> +        smp_wmb();
>> +        active_text_patching = true;
>> +        smp_wmb();
>> +        text_poke_live(NULL);
>> +        smp_wmb();
>> +        active_text_patching = false;
> Perhaps better to zap live_poke_info.cpu again here? That could
> in fact replace the separate active_text_patching flag.

No - it cant because...

>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -119,6 +119,8 @@ boolean_param("ler", opt_ler);
>>  #define stack_words_per_line 4
>>  #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)regs->rsp)
>>  
>> +bool active_text_patching;
> Why here rather than in alternative.c?

... in the !LIVEPATCH case, alternative.c is an .init.o and the build
fails because of a nonzero bss.

We cannot have a possibly-init variable referenced in the head of an
exception handler.

>> @@ -686,9 +689,36 @@ ENTRY(debug)
>>          jmp   handle_exception
>>  
>>  ENTRY(int3)
>> +        /* For patching-safety, there must not be any alternatives here. */
>>          pushq $0
>>          movl  $TRAP_int3,4(%rsp)
>> -        jmp   handle_exception
>> +
>> +        /* If there is no patching active, continue normally.  */
>> +        cmpb  $1, active_text_patching(%rip)
> I think it is better to compare against zero in cases like this. But
> then - is this safe? There's no guarantee that the INT3 handling
> on a non-patching CPU makes it here before the patching CPU
> clears the flag again.

Hmm - in the case of an non-patching CPU, we'd need to peek under %rip
to see if the breakpoint had disappeared in the case we find this
boolean clear.  OTOH, the NMI-rendezvous plan would avoid this problem
to begin with.

>
>> +        je    handle_exception_gprs_saved
>> +
>> +        /*
>> +         * We think we hit the transient debug trap.  text_poke_live() has
>> +         * probably completed the patching, so rewind the instruction 
>> pointer
>> +         * and try again.
>> +         */
>> +        subq  $1, UREGS_rip(%rsp)
>> +        jmp   restore_all_xen
> If it was an NMI that was interrupted, you mustn't return using
> IRET.

I thought we fixed that, but it appears that the patch never got in.

>
> Overall I think your attempt to solve two problems at once here
> goes beyond what we need immediately: Live patching in an NMI/
> #MC safe way ought to be a second step. The immediate goal
> should be to make boot time alternatives patching NMI/#MC safe,
> and that can be had with a slimmed down version of this patch
> (as at that time only a single CPU is up); most of the not purely
> mechanical issues pointed out above are solely related to the
> live patching scenario.

I could try this, but I don't think the result will be much simpler.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.