
Re: [Xen-devel] [PATCH 4/5] x86/alternative: Implement NMI/#MC-safe patching

>>> On 01.02.18 at 16:20, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 31/01/18 11:00, Jan Beulich wrote:
>>>>> On 30.01.18 at 20:26, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 30/01/18 08:39, Jan Beulich wrote:
>>>>>>> On 29.01.18 at 16:38, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>> +    /*
>>>>> +     * We are the CPU performing the patching, and might have ended up 
>>>>> here by
>>>>> +     * hitting a breakpoint.
>>>>> +     *
>>>>> +     * Either way, we need to complete particular patch to make forwards
>>>>> +     * progress.  This logic is safe even if executed recursively in the
>>>>> +     * breakpoint handler; the worst that will happen when normal 
>>>>> execution
>>>>> +     * resumes is that we will rewrite the same bytes a second time.
>>>>> +     */
>>>>> +
>>>>> +    /* First, insert a breakpoint to prevent execution of the patch 
>>>>> site. */
>>>>> +    i->addr[0] = 0xcc;
>>>>> +    smp_wmb();
>>>> This is necessary, but not sufficient when replacing more than a
>>>> single insn: The other CPU may be executing instructions _after_
>>>> the initial one that is being replaced, and ...
>>>>> +    /* Second, copy the remaining instructions into place. */
>>>>> +    memcpy(i->addr + 1, i->opcode + 1, i->len - 1);
>>>> ... this may be altering things underneath its feet.
>>> Hmm.
>>> It is completely impossible to recover if another CPU hits the middle of
>>> this memcpy().  It is impossible to synchronise a specific write against
>>> the remote CPU pipelines.
>> It is not completely impossible, but the solution would be awkward
>> to use: If the code doing the patching knew instruction boundaries,
>> it could put breakpoints onto all of them. Or maybe it isn't all that
>> bad to use - we have an insn decoder after all.
> An instruction decoder doesn't help.  Yes - it allows us to avoid
> executing a spliced instruction, but we can't recover/rewind state for a
> CPU which hits one of these latter breakpoints.

Oh, true. At least doing so would be very difficult.

>>> While we can arrange and test full NMI safety, we cannot test #MC safety
>>> at all.  Any code to try and implement #MC safety is going to be
>>> complicated, and make things worse if an #MC does hit.
>>>
>>> Therefore, I agree with the Linux view that trying to do anything for
>>> #MC safety is going to be worse than doing nothing at all.
>> I agree. But as said before - the immediate goal ought to be to
>> make alternatives patching safe, and that doesn't require any
>> of these considerations.
> ? Of course it requires these considerations.

I don't understand, but see below.

> If we ignore the livepatching side of things (which includes an
> alternatives pass), then we can make the boot-time alternatives pass NMI
> safe by explicitly choosing to patch in NMI context before we've booted
> the APs.
>
> Arranging for boot time alternatives to be NMI-safe is easy.  Arranging
> for livepatching to be NMI-safe is also fairly easy.
>
> Arranging for anything, *even at boot time* to be #MC safe, is very
> complicated, and will be a large amount of code we cannot test.  Hence
> why I'm leaning towards the Linux solution for the #MC problem.

I could live with this.

>>>>> @@ -153,7 +231,31 @@ void init_or_livepatch add_nops(void *insns, unsigned int len)
>>>>>  void init_or_livepatch noinline
>>>>>  text_poke(void *addr, const void *opcode, size_t len, bool live)
>>>>>  {
>>>>> -    memcpy(addr, opcode, len);
>>>>> +    if ( !live || len == 1 )
>>>>> +    {
>>>>> +        /*
>>>>> +         * If we know *addr can't be executed, or we are patching a
>>>>> +         * single byte, it is safe to use a straight memcpy().
>>>>> +         */
>>>>> +        memcpy(addr, opcode, len);
>>>> Is it really worth special casing this? Whether to actually ack
>>>> patches 2 and 3 depends on that.
>>> Yes, and even more so if we are going to use an NMI rendezvous.  We
>>> definitely don't want to have an NMI rendezvous while applying
>>> alternatives as part of livepatch preparation.
>> Well, again - live patching should be the second step here.
> You keep on saying this, and I can only conclude that you are confused
> as to which actions happen when.
>
> At boot time, we do a single alternatives pass on the live .text section.
>
> At livepatch load time, we do an alternatives pass on the incoming
> .text, but this is safe because the text definitely isn't being executed.
>
> At livepatch apply time, we quiesce the system and do a patching pass on
> the live .text, most of which are `jmp disp32`.
>
> We therefore have two alternatives passes (one safe, one unsafe), and
> one (unsafe) livepatch pass.

I don't think I'm confused in any way. The approach you've chosen
in the patch - to complete the patching inside the NMI or #MC
handler - is entirely sufficient to deal with NMI and - unless there
are recursive ones - #MC. You don't need NMI context to do NMI-safe
patching when only a single CPU is up.


Xen-devel mailing list


