Xen project Mailing List

Re: [Xen-devel] [PATCH 4/5] x86/alternative: Implement NMI/#MC-safe patching

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Tue, 30 Jan 2018 19:26:45 +0000

Cc: RossLagerwall <ross.lagerwall@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Tue, 30 Jan 2018 19:27:02 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 30/01/18 08:39, Jan Beulich wrote: >>>> On 29.01.18 at 16:38, <andrew.cooper3@xxxxxxxxxx> wrote: >> +bool init_or_livepatch text_poke_live(const struct cpu_user_regs *regs) >> +{ >> + struct live_poke_info *i = &live_poke_info; >> + >> + if ( unlikely(i->cpu != smp_processor_id()) ) >> + { >> + ASSERT(regs); >> + >> + /* >> + * We hit a breakpoint, on a CPU which was not performing the >> + * patching. This is only expected to be possible for the NMI/#MC >> + * paths, and even then, only if we hit the tiny race window below >> + * while patching in the NMI/#MC handlers. >> + * >> + * We can't safely evaluate whether we hit a transient breakpoint >> + * because i->cpu has likely completed the patch and moved on to the >> + * next patch site. >> + * >> + * Go to sleep for a bit and synchronise the pipeline as we are now >> in >> + * a cross-modifying scenario. >> + */ >> + cpu_relax(); >> + cpuid_eax(0); >> + >> + /* >> + * Presume that we hit the transient breakpoint, as we can't confirm >> + * whether we did or not. We don't expect there to be any other >> + * breakpoints to hit, but if we did hit another one, then in the >> + * worse case we will only livelock until i->cpu has finished all of >> + * its patching. >> + */ >> + return true; >> + } >> + >> + /* >> + * We are the CPU performing the patching, and might have ended up here >> by >> + * hitting a breakpoint. >> + * >> + * Either way, we need to complete particular patch to make forwards >> + * progress. This logic is safe even if executed recursively in the >> + * breakpoint handler; the worst that will happen when normal execution >> + * resumes is that we will rewrite the same bytes a second time. >> + */ >> + >> + /* First, insert a breakpoint to prevent execution of the patch site. */ >> + i->addr[0] = 0xcc; >> + smp_wmb(); > This is necessary, but not sufficient when replacing more than a > single insn: The other CPU may be executing instructions _after_ > the initial one that is being replaced, and ... > >> + /* Second, copy the remaining instructions into place. */ >> + memcpy(i->addr + 1, i->opcode + 1, i->len - 1); > ... this may be altering things underneath its feet. Hmm. It is completely impossible to recover if another CPU hits the middle of this memcpy(). It is impossible to synchronise a specific write against the remote CPU pipelines. The only way to be safe is to guarantee that CPUs can't hit the code to begin with. On AMD, we can use STGI/CLGI to do this sensibly, and really really inhibit all interrupts. For Intel, we can fix the NMI problem by rendezvousing and running the patching loop in NMI context, at which point the hardware latch will prevent further NMIs. However, there is literally nothing we can do to prevent #MC from arriving. We can stop servicing #MC by disabling CR4.MCE, but then the processor will shut down. We can't put a big barrier at the head of #MC handler which delays processing, because we have no way to ensure that processors aren't already later in the #MC handler. Furthermore, attempting to do so heightens the risk of hitting a shutdown condition from a second queued #MC. The chance of hitting an NMI/#MC collide with patching is already minuscule. Alternatives and livepatching have been used (by XenServer alone) in this form for nearly 3 years, across millions of servers, without a single report of such a collision. The chance of an #MC collision is far less likely than an NMI collision, and in the best case. While we can arrange and test full NMI safety, we cannot test #MC safety at all. Any code to try and implement #MC safety is going to be complicated, and make things worse if an #MC does hit. Therefore, I agree with the Linux view that trying to do anything for #MC safety is going to be worse than doing nothing at all. >> @@ -153,7 +231,31 @@ void init_or_livepatch add_nops(void *insns, unsigned >> int len) >> void init_or_livepatch noinline >> text_poke(void *addr, const void *opcode, size_t len, bool live) >> { >> - memcpy(addr, opcode, len); >> + if ( !live || len == 1 ) >> + { >> + /* >> + * If we know *addr can't be executed, or we are patching a single >> + * byte, it is safe to use a straight memcpy(). >> + */ >> + memcpy(addr, opcode, len); > Is it really worth special casing this? Whether to actually ack > patches 2 and 3 depends on that. Yes, and even more so if we are going to use an NMI rendezvous. We definitely don't want to have an NMI rendezvous while applying alternatives as part of livepatch preparation. > >> + }; >> + smp_wmb(); >> + active_text_patching = true; >> + smp_wmb(); >> + text_poke_live(NULL); >> + smp_wmb(); >> + active_text_patching = false; > Perhaps better to zap live_poke_info.cpu again here? That could > in fact replace the separate active_text_patching flag. No - it cant because... >> --- a/xen/arch/x86/traps.c >> +++ b/xen/arch/x86/traps.c >> @@ -119,6 +119,8 @@ boolean_param("ler", opt_ler); >> #define stack_words_per_line 4 >> #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)regs->rsp) >> >> +bool active_text_patching; > Why here rather than in alternative.c? ... in the !LIVEPATCH case, alternative.c is an .init.o and the build fails because of a nonzero bss. We cannot have a possibly-init variable referenced in the head of an exception handler. >> @@ -686,9 +689,36 @@ ENTRY(debug) >> jmp handle_exception >> >> ENTRY(int3) >> + /* For patching-safety, there must not be any alternatives here. */ >> pushq $0 >> movl $TRAP_int3,4(%rsp) >> - jmp handle_exception >> + >> + /* If there is no patching active, continue normally. */ >> + cmpb $1, active_text_patching(%rip) > I think it is better to compare against zero in cases like this. But > then - is this safe? There's no guarantee that the INT3 handling > on a non-patching CPU makes it here before the patching CPU > clears the flag again. Hmm - in the case of an non-patching CPU, we'd need to peek under %rip to see if the breakpoint had disappeared in the case we find this boolean clear. OTOH, the NMI-rendezvous plan would avoid this problem to begin with. > >> + je handle_exception_gprs_saved >> + >> + /* >> + * We think we hit the transient debug trap. text_poke_live() has >> + * probably completed the patching, so rewind the instruction >> pointer >> + * and try again. >> + */ >> + subq $1, UREGS_rip(%rsp) >> + jmp restore_all_xen > If it was an NMI that was interrupted, you mustn't return using > IRET. I thought we fixed that, but it appears that the patch never got in. > > Overall I think your attempt to solve two problems at once here > goes beyond what we need immediately: Live patching in an NMI/ > #MC safe way ought to be a second step. The immediate goal > should be to make boot time alternatives patching NMI/#MC safe, > and that can be had with a slimmed down version of this patch > (as at that time only a single CPU is up); most of the not purely > mechanical issues pointed out above are solely related to the > live patching scenario. I could try this, but I don't think the result will be much simpler. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.