
Re: [Xen-devel] [PATCH 4/5] x86/alternative: Implement NMI/#MC-safe patching



On 31/01/18 11:00, Jan Beulich wrote:
>>>> On 30.01.18 at 20:26, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 30/01/18 08:39, Jan Beulich wrote:
>>>>>> On 29.01.18 at 16:38, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> +    /*
>>>> +     * We are the CPU performing the patching, and might have ended up here by
>>>> +     * hitting a breakpoint.
>>>> +     *
>>>> +     * Either way, we need to complete this particular patch to make forwards
>>>> +     * progress.  This logic is safe even if executed recursively in the
>>>> +     * breakpoint handler; the worst that will happen when normal execution
>>>> +     * resumes is that we will rewrite the same bytes a second time.
>>>> +     */
>>>> +
>>>> +    /* First, insert a breakpoint to prevent execution of the patch site. */
>>>> +    i->addr[0] = 0xcc;
>>>> +    smp_wmb();
>>> This is necessary, but not sufficient when replacing more than a
>>> single insn: The other CPU may be executing instructions _after_
>>> the initial one that is being replaced, and ...
>>>
>>>> +    /* Second, copy the remaining instructions into place. */
>>>> +    memcpy(i->addr + 1, i->opcode + 1, i->len - 1);
>>> ... this may be altering things underneath its feet.
>> Hmm.
>>
>> It is completely impossible to recover if another CPU hits the middle of
>> this memcpy().  It is impossible to synchronise a specific write against
>> the remote CPU pipelines.
> It is not completely impossible, but the solution would be awkward
> to use: If the code doing the patching knew instruction boundaries,
> it could put breakpoints onto all of them. Or maybe it isn't all that
> bad to use - we have an insn decoder after all.

An instruction decoder doesn't help.  Yes - it allows us to avoid
executing a spliced instruction, but we can't recover/rewind state for a
CPU which hits one of these latter breakpoints.
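
To make the sequence being debated concrete, it is roughly the
following (a simplified sketch - the struct and field names are
illustrative rather than the exact ones in the patch, and smp_wmb() is
our usual write barrier):

struct live_poke_info {
    uint8_t *addr;           /* live patch site */
    const uint8_t *opcode;   /* replacement bytes */
    unsigned int len;
};

static void apply_live_poke(const struct live_poke_info *i)
{
    /* 1) Mask the patch site behind a breakpoint. */
    i->addr[0] = 0xcc;
    smp_wmb();

    /* 2) Copy everything after the first byte into place. */
    memcpy(i->addr + 1, i->opcode + 1, i->len - 1);
    smp_wmb();

    /* 3) Finally, restore the real first byte over the breakpoint. */
    i->addr[0] = i->opcode[0];
    smp_wmb();
}

The window under discussion is step 2: a CPU which is already executing
beyond the first instruction of the patch site can observe a mixture of
old and new bytes, and no barrier on our side can help it.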

>
>> The only way to be safe is to guarantee that CPUs can't hit the code to
>> begin with.
>>
>> On AMD, we can use STGI/CLGI to do this sensibly, and really really
>> inhibit all interrupts.
> Not really for #MC - clear GIF may also result in shutdown when
> one would need delivering.

With GIF clear, #MC is held pending if possible, but otherwise a
shutdown will occur.

From that point of view, it is slightly better than clearing CR4.MCE,
but not by much.
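
For reference, the AMD sequence being discussed is simply the following
(sketch only; CLGI/STGI raise #UD unless EFER.SVME is set):

    /* GIF = 0: NMIs (and, where possible, #MC) are held pending. */
    asm volatile ( "clgi" ::: "memory" );

    /* ... perform the patching ... */

    /* GIF = 1: pending events are delivered again. */
    asm volatile ( "stgi" ::: "memory" );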

>
>>  For Intel, we can fix the NMI problem by
>> rendezvousing and running the patching loop in NMI context, at which
>> point the hardware latch will prevent further NMIs.
>>
>> However, there is literally nothing we can do to prevent #MC from
>> arriving.  We can stop servicing #MC by disabling CR4.MCE, but then the
>> processor will shut down.
> Not a good idea indeed.
>
>> We can't put a big barrier at the head of #MC handler which delays
>> processing, because we have no way to ensure that processors aren't
>> already later in the #MC handler.  Furthermore, attempting to do so
>> heightens the risk of hitting a shutdown condition from a second queued #MC.
>>
>>
>> The chance of an NMI or #MC colliding with patching is already
>> minuscule.  Alternatives and livepatching have been used (by XenServer
>> alone) in this form for nearly 3 years, across millions of servers,
>> without a single report of such a collision.  An #MC collision is far
>> less likely still than an NMI collision, even in the best case.
>>
>> While we can arrange and test full NMI safety, we cannot test #MC safety
>> at all.  Any code to try and implement #MC safety is going to be
>> complicated, and make things worse if an #MC does hit.
>>
>> Therefore, I agree with the Linux view that trying to do anything for
>> #MC safety is going to be worse than doing nothing at all.
> I agree. But as said before - the immediate goal ought to be to
> make alternatives patching safe, and that doesn't require any
> of these considerations.

? Of course it requires these considerations.

If we ignore the livepatching side of things (which includes an
alternatives pass), then we can make the boot-time alternatives pass NMI
safe by explicitly choosing to patch in NMI context before we've booted
the APs.

Arranging for boot time alternatives to be NMI-safe is easy.  Arranging
for livepatching to be NMI-safe is also fairly easy.

Arranging for anything, *even at boot time*, to be #MC-safe is very
complicated, and will be a large amount of code which we cannot test.
Hence I'm leaning towards the Linux solution for the #MC problem.
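
What I have in mind for the NMI case is roughly the following, reusing
the sketch from above (the self_nmi() and patch_pending names are
placeholders rather than existing interfaces):

static const struct live_poke_info *patch_info;
static volatile bool patch_pending;

/* Called from the NMI handler. */
void check_patch_from_nmi(void)
{
    if ( !patch_pending )
        return;

    apply_live_poke(patch_info);
    patch_pending = false;
}

static void patch_in_nmi_context(const struct live_poke_info *i)
{
    patch_info = i;
    smp_wmb();
    patch_pending = true;

    self_nmi();              /* IPI an NMI to ourselves. */

    while ( patch_pending )
        cpu_relax();
}

Because the poke runs entirely in NMI context, the hardware latch
prevents a second NMI from arriving in the middle of it.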

>
>>>> @@ -153,7 +231,31 @@ void init_or_livepatch add_nops(void *insns, unsigned int len)
>>>>  void init_or_livepatch noinline
>>>>  text_poke(void *addr, const void *opcode, size_t len, bool live)
>>>>  {
>>>> -    memcpy(addr, opcode, len);
>>>> +    if ( !live || len == 1 )
>>>> +    {
>>>> +        /*
>>>> +         * If we know *addr can't be executed, or we are patching a single
>>>> +         * byte, it is safe to use a straight memcpy().
>>>> +         */
>>>> +        memcpy(addr, opcode, len);
>>> Is it really worth special casing this? Whether to actually ack
>>> patches 2 and 3 depends on that.
>> Yes, and even more so if we are going to use an NMI rendezvous.  We
>> definitely don't want to have an NMI rendezvous while applying
>> alternatives as part of livepatch preparation.
> Well, again - live patching should be the second step here.

You keep on saying this, and I can only conclude that you are confused
as to which actions happen when.

At boot time, we do a single alternatives pass on the live .text section.

At livepatch load time, we do an alternatives pass on the incoming
.text, but this is safe because the text definitely isn't being executed.

At livepatch apply time, we quiesce the system and do a patching pass on
the live .text, with most of the writes being `jmp disp32` instructions.
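
(Each of those is just a 5-byte relative jump; as a sketch, with an
illustrative helper name:)

/* Encode the 5-byte `jmp disp32` which diverts src to dst.  The
 * displacement is relative to the end of the jmp itself. */
static void encode_jmp_disp32(uint8_t buf[5], const void *src, const void *dst)
{
    int32_t disp = (int32_t)((const uint8_t *)dst -
                             ((const uint8_t *)src + 5));

    buf[0] = 0xe9;                       /* JMP rel32 */
    memcpy(buf + 1, &disp, sizeof(disp));
}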


We therefore have two alternatives passes (one safe, one unsafe), and
one (unsafe) livepatch pass.

~Andrew
