Xen project Mailing List

Re: [Xen-devel] [PATCH 2/4] x86/pv: Introduce pv_create_exception_frame()

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Tue, 9 May 2017 18:09:03 +0100

Cc: Wei Liu <wei.liu2@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Tue, 09 May 2017 17:10:24 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 09/05/17 16:58, Jan Beulich wrote:
>>>> On 08.05.17 at 17:48, <andrew.cooper3@xxxxxxxxxx> wrote:
>> +void pv_create_exception_frame(void)
>> +{
>> +    struct vcpu *curr = current;
>> +    struct trap_bounce *tb = &curr->arch.pv_vcpu.trap_bounce;
> const (twice)?
>
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    const bool user_mode_frame = !guest_kernel_mode(curr, regs);
>> +    uint8_t *evt_mask = &vcpu_info(curr, evtchn_upcall_mask);
>> +    unsigned long rflags;
> Does this really need to be "long"?

The answer to several of these questions is "probably not, but that's
how load_segments() did it".

>
>> +    unsigned int bytes, missing;
>> +
>> +    ASSERT_NOT_IN_ATOMIC();
>> +
>> +    if ( unlikely(null_trap_bounce(curr, tb)) )
>> +    {
>> +        gprintk(XENLOG_ERR, "Fatal: Attempting to inject null trap
>> bounce\n");
>> +        __domain_crash_synchronous();
> Why not domain_crash() followed by "return"?

Because the existing code uses synchronous crashes.

Looking again at the callsites of pv_create_exception_frame(), we
immediately jump back to {compat_,}test_all_events, which proceeds to
run softirqs again.

Therefore, domain_crash() and a return should work.  (I think?)

>
>> +    }
>> +
>> +    /* Fold the upcall mask and architectural IOPL into the guests rflags.
>> */
>> +    rflags = regs->rflags & ~(X86_EFLAGS_IF | X86_EFLAGS_IOPL);
>> +    rflags |= ((*evt_mask ? 0 : X86_EFLAGS_IF) |
>> +               (VM_ASSIST(curr->domain, architectural_iopl)
>> +                ? curr->arch.pv_vcpu.iopl : 0));
>> +
>> +    if ( is_pv_32bit_vcpu(curr) )
>> +    {
>> +        /* { [ERRCODE,] EIP, CS/MASK , EFLAGS, [ESP, SS] } */
>> +        unsigned int frame[6], *ptr = frame, ksp =
>> +            (user_mode_frame ? curr->arch.pv_vcpu.kernel_sp : regs->esp);
>> +
>> +        if ( tb->flags & TBF_EXCEPTION_ERRCODE )
>> +            *ptr++ = tb->error_code;
>> +
>> +        *ptr++ = regs->eip;
>> +        *ptr++ = regs->cs | (((unsigned int)*evt_mask) << 16);
> Do you really need the cast here?

Does it promote correctly if the top bit of the mask is set?
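
To make the promotion question concrete: under C's integer promotions a
uint8_t operand is converted to int before the shift, so a mask of 0xff
shifted left by 16 gives 0x00ff0000 and stays clear of the sign bit.  The
cast only becomes necessary in the 64-bit frame, where the shift count is 32.
A minimal sketch, assuming int is at least 32 bits wide; the helper names are
hypothetical and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack selector and mask without a cast. */
uint32_t pack_cs_mask32(uint16_t cs, uint8_t evt_mask)
{
    assert(sizeof(int) >= 4);   /* assumption stated in the lead-in */

    /* evt_mask promotes to int; 0xff << 16 == 0x00ff0000 fits in int. */
    return cs | (evt_mask << 16);
}

/* Same packing with the explicit cast used in the patch. */
uint32_t pack_cs_mask32_cast(uint16_t cs, uint8_t evt_mask)
{
    return cs | ((unsigned int)evt_mask << 16);
}

/* 64-bit variant: the widening cast is required before shifting by 32. */
uint64_t pack_cs_mask64(uint16_t cs, uint8_t evt_mask)
{
    return cs | ((uint64_t)evt_mask << 32);
}
```

Both 32-bit variants produce the same value for every 8-bit mask, including
masks with the top bit set.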

> In no case is there a need for the
> parentheses around the cast expression.
>
>> +        *ptr++ = rflags;
>> +
>> +        if ( user_mode_frame )
>> +        {
>> +            *ptr++ = regs->esp;
>> +            *ptr++ = regs->ss;
>> +        }
>> +
>> +        /* Copy the constructed frame to the guest kernel stack. */
>> +        bytes = _p(ptr) - _p(frame);
>> +        ksp -= bytes;
>> +
>> +        if ( unlikely((missing = __copy_to_user(_p(ksp), frame, bytes)) !=
>> 0) )
> While I don't think we need to be really bothered, it's perhaps still
> worth noting in a comment that the wrapping behavior here is
> wrong (and slightly worse than the assembly original), due to
> (implicit) address arithmetic all being done with 64-bit operands.

Ah - At some point, I had a comment here explaining the lack of an
__access_ok() check, but it appears to have got lost in a rebase.  I
will try to reinstate it.

The wrapping behaviour around the 4GB => 0 boundary is undefined, and
different between Intel and AMD (as we discovered with XSA-186).  If we
were passing the exception back to the guest we would need to swap #PF
for #SS (for Intel), or properly wrap around (for AMD).

Would it be ok just to comment this point and leave it as is?

>
>> +        {
>> +            gprintk(XENLOG_ERR, "Fatal: Fault while writing exception
>> frame\n");
>> +            show_page_walk(ksp + missing);
>> +            __domain_crash_synchronous();
>> +        }
>> +
>> +        /* Rewrite our stack frame. */
>> +        regs->rip = (uint32_t)tb->eip;
>> +        regs->cs = tb->cs;
>> +        regs->eflags &= ~(X86_EFLAGS_VM | X86_EFLAGS_RF |
>> +                          X86_EFLAGS_NT | X86_EFLAGS_TF);
> You write ->rip above and ->rsp below - preferably those would
> become ->eip and ->esp, but alternatively (for consistency) this
> may want switching to ->rflags.

Ah - these are deliberately 64bit values even in the 32bit path, so a
32bit guest with an unexpected 64bit code segment will be truncated back
into its own range.

I will comment this point, and switch to using rflags.
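
The two points above (address arithmetic done with 64-bit operands, and
truncating a 64-bit value back into the 32-bit range) can be illustrated in
isolation.  This is only a sketch of the arithmetic, assuming a 32-bit stack
pointer close to the 4GB boundary; the function names are hypothetical and
nothing here models what the hardware or __copy_to_user() does:

```c
#include <assert.h>
#include <stdint.h>

/* End of the copy as 32-bit arithmetic computes it: wraps modulo 4GB. */
uint32_t copy_end_32bit(uint32_t ksp, uint32_t bytes)
{
    return ksp + bytes;
}

/* End of the copy when the arithmetic is done with 64-bit operands. */
uint64_t copy_end_64bit(uint32_t ksp, uint32_t bytes)
{
    uint64_t end = (uint64_t)ksp + bytes;

    assert(end >= ksp);  /* a 64-bit sum of two 32-bit values cannot wrap */
    return end;
}

/* Truncate a 64-bit instruction pointer, as a (uint32_t) cast does. */
uint64_t truncate_ip(uint64_t ip)
{
    return (uint32_t)ip;
}
```

With ksp = 0xfffffff0 and a 0x18-byte frame, the 32-bit result wraps to 0x8
while the 64-bit result is 0x100000008, which is the discrepancy being
discussed.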

>
>> +        regs->rsp = ksp;
>> +        if ( user_mode_frame )
>> +            regs->ss = curr->arch.pv_vcpu.kernel_ss;
>> +    }
>> +    else
>> +    {
>> +        /* { RCX, R11, [ERRCODE,] RIP, CS/MASK, RFLAGS, RSP, SS } */
>> +        unsigned long frame[7], *ptr = frame, ksp =
> I clearly count 8 elements in the comment. :)
>
>> +            (user_mode_frame ? curr->arch.pv_vcpu.kernel_sp : regs->rsp) &
>> ~0xf;
>> +
>> +        if ( user_mode_frame )
>> +            toggle_guest_mode(curr);
>> +
>> +        *ptr++ = regs->rcx;
>> +        *ptr++ = regs->r11;
>> +
>> +        if ( tb->flags & TBF_EXCEPTION_ERRCODE )
>> +            *ptr++ = tb->error_code;
>> +
>> +        *ptr++ = regs->rip;
>> +        *ptr++ = (user_mode_frame ? regs->cs : regs->cs & ~3) |
>> +            ((unsigned long)(*evt_mask) << 32);
> Stray parentheses again.
>
>> +        *ptr++ = rflags;
>> +        *ptr++ = regs->rsp;
>> +        *ptr++ = regs->ss;
>> +
>> +        /* Copy the constructed frame to the guest kernel stack. */
>> +        bytes = _p(ptr) - _p(frame);
>> +        ksp -= bytes;
>> +
>> +        if ( unlikely(!__addr_ok(ksp)) )
>> +        {
>> +            gprintk(XENLOG_ERR, "Fatal: Bad guest kernel stack %p\n",
>> _p(ksp));
>> +            __domain_crash_synchronous();
>> +        }
>> +        else if ( unlikely((missing =
>> +                            __copy_to_user(_p(ksp), frame, bytes)) != 0) )
>> +        {
>> +            gprintk(XENLOG_ERR, "Fatal: Fault while writing exception
>> frame\n");
>> +            show_page_walk(ksp + missing);
>> +            __domain_crash_synchronous();
>> +        }
>> +
>> +        /* Rewrite our stack frame. */
>> +        regs->entry_vector |= TRAP_syscall;
>> +        regs->rip = tb->eip;
>> +        regs->cs = FLAT_KERNEL_CS;
>> +        regs->rflags &= ~(X86_EFLAGS_AC | X86_EFLAGS_VM |
>> X86_EFLAGS_RF |
>> +                          X86_EFLAGS_NT | X86_EFLAGS_TF);
>> +        regs->rsp = ksp;
>> +        regs->ss = FLAT_KERNEL_SS;
>> +    }
>> +
>> +    /* Mask events if requested. */
>> +    if ( tb->flags & TBF_INTERRUPT )
>> +        *evt_mask = 1;
>> +
>> +    /*
>> +     * Clobber the injection information now it has been completed.  Buggy
>> +     * attempts to inject the same event twice will hit the
>> null_trap_bounce()
>> +     * check above.
>> +     */
>> +    *tb = (struct trap_bounce){};
> Ah, so that prevents tb becoming a pointer to const. I wonder
> though whether, on a rather hot path, we really want to zap the
> entire structure here. As I can see the value in satisfying
> null_trap_bounce(), how about zapping just ->eip / ->cs on the
> split paths above?

This ends up being two 8-byte writes of zeroes into a cache-hot line; it
isn't by any means a slow part of this path, whereas the 16bit write to
clobber just %cs would be.

Irrespective of that, the following patch depends on this clobbering of
->flags.

> Overall, did you compare generated code with the current
> assembly implementation? That one surely would have had some
> room for improvement, so the result here at least shouldn't be
> worse than that.

The final C version (including failsafe, and some error handling the asm
functions didn't have) is a bit less than twice the size of the asm
functions in terms of absolute size.

I haven't done any performance analysis, but I trust the compiler to
make better code overall (there are definitely pipeline stalls in the
asm versions), and wouldn't be surprised if it is faster to execute.

~Andrew
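
The clobber-then-recheck pattern discussed above (zero the whole structure
after injection so that a repeated attempt trips the null check) can be
sketched as follows.  The structure here is a simplified stand-in whose field
layout is an assumption, and bounce_is_null() / consume_bounce() are
hypothetical names rather than Xen functions.  The sketch uses `{ 0 }` instead
of the patch's empty-brace initialiser, which is a GNU extension before C23:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in; the real structure's layout is assumed here. */
struct trap_bounce {
    uint32_t error_code;
    uint8_t flags;
    uint16_t cs;
    unsigned long eip;
};

/* Hypothetical null check: nothing to inject if there is no entry point. */
bool bounce_is_null(const struct trap_bounce *tb)
{
    return tb->eip == 0;
}

/* Returns false when asked to inject an already-consumed bounce. */
bool consume_bounce(struct trap_bounce *tb)
{
    assert(tb != NULL);

    if ( bounce_is_null(tb) )
        return false;

    /* ... the exception frame would be built here ... */

    /* Zero every field, including flags, in a single assignment. */
    *tb = (struct trap_bounce){ 0 };
    return true;
}
```

Clearing the whole structure rather than individual fields also resets
->flags, which is the property the reply says the following patch relies on.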
