Re: [Xen-devel] [PATCH 2/4] x86/pv: Introduce pv_create_exception_frame()
On 09/05/17 16:58, Jan Beulich wrote:
>>>> On 08.05.17 at 17:48, <andrew.cooper3@xxxxxxxxxx> wrote:
>> +void pv_create_exception_frame(void)
>> +{
>> + struct vcpu *curr = current;
>> + struct trap_bounce *tb = &curr->arch.pv_vcpu.trap_bounce;
> const (twice)?
>
>> + struct cpu_user_regs *regs = guest_cpu_user_regs();
>> + const bool user_mode_frame = !guest_kernel_mode(curr, regs);
>> + uint8_t *evt_mask = &vcpu_info(curr, evtchn_upcall_mask);
>> + unsigned long rflags;
> Does this really need to be "long"?
The answer to several of these questions is "probably not, but that's
how load_segments() did it".
>
>> + unsigned int bytes, missing;
>> +
>> + ASSERT_NOT_IN_ATOMIC();
>> +
>> + if ( unlikely(null_trap_bounce(curr, tb)) )
>> + {
>> + gprintk(XENLOG_ERR, "Fatal: Attempting to inject null trap bounce\n");
>> + __domain_crash_synchronous();
> Why not domain_crash() followed by "return"?
Because the existing code uses synchronous crashes.
Looking again at the callsites of pv_create_exception_frame(), we
immediately jump back to {compat_,}test_all_events, which proceeds to
run softirqs again.
Therefore, domain_crash() and a return should work. (I think?)
>
>> + }
>> +
>> + /* Fold the upcall mask and architectural IOPL into the guests rflags. */
>> + rflags = regs->rflags & ~(X86_EFLAGS_IF | X86_EFLAGS_IOPL);
>> + rflags |= ((*evt_mask ? 0 : X86_EFLAGS_IF) |
>> + (VM_ASSIST(curr->domain, architectural_iopl)
>> + ? curr->arch.pv_vcpu.iopl : 0));
>> +
>> + if ( is_pv_32bit_vcpu(curr) )
>> + {
>> + /* { [ERRCODE,] EIP, CS/MASK , EFLAGS, [ESP, SS] } */
>> + unsigned int frame[6], *ptr = frame, ksp =
>> + (user_mode_frame ? curr->arch.pv_vcpu.kernel_sp : regs->esp);
>> +
>> + if ( tb->flags & TBF_EXCEPTION_ERRCODE )
>> + *ptr++ = tb->error_code;
>> +
>> + *ptr++ = regs->eip;
>> + *ptr++ = regs->cs | (((unsigned int)*evt_mask) << 16);
> Do you really need the cast here?
Does it promote correctly if the top bit of the mask is set?
> In no case is there a need for the
> parentheses around the cast expression.
>
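On the promotion question: a uint8_t operand is promoted to int before
the shift, so a left shift by 16 cannot reach the sign bit even when the
top bit of the mask is set. The cast only matters on the 64-bit path,
where shifting a 32-bit int by 32 would be undefined. A minimal sketch,
assuming a 32-bit int; the helper names are hypothetical and not part of
the patch:

```c
#include <stdint.h>

/* 32-bit frame: evt_mask is promoted to int; 0xff << 16 == 0xff0000 fits. */
static uint32_t pack_cs_mask32(uint16_t cs, uint8_t evt_mask)
{
    return cs | (evt_mask << 16);
}

/* 64-bit frame: widen first, since shifting an int by 32 is undefined. */
static uint64_t pack_cs_mask64(uint16_t cs, uint8_t evt_mask)
{
    return cs | ((uint64_t)evt_mask << 32);
}
```

With these helpers, a mask of 0x80 lands in bits 16-23 of the 32-bit
value and in bits 32-39 of the 64-bit value, with no sign extension in
either case.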
>> + *ptr++ = rflags;
>> +
>> + if ( user_mode_frame )
>> + {
>> + *ptr++ = regs->esp;
>> + *ptr++ = regs->ss;
>> + }
>> +
>> + /* Copy the constructed frame to the guest kernel stack. */
>> + bytes = _p(ptr) - _p(frame);
>> + ksp -= bytes;
>> +
>> + if ( unlikely((missing = __copy_to_user(_p(ksp), frame, bytes)) != 0) )
> While I don't think we need to be really bothered, it's perhaps still
> worth noting in a comment that the wrapping behavior here is
> wrong (and slightly worse than the assembly original), due to
> (implicit) address arithmetic all being done with 64-bit operands.
Ah - At some point, I had a comment here explaining the lack of an
__access_ok() check, but it appears to have got lost in a rebase. I
will try to reinstate it.
The wrapping behaviour around the 4GB => 0 boundary is undefined, and
different between Intel and AMD (as we discovered with XSA-186). If we
were passing the exception back to the guest, we would need to swap #PF
for #SS (for Intel), or properly wrap around (for AMD).
Would it be ok just to comment this point and leave it as is?
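To illustrate the remark about 64-bit operands: the 32-bit subtraction
on ksp wraps modulo 4GB, but once the value is turned into a pointer the
copy advances with 64-bit arithmetic and can run past the 4GB boundary
instead of wrapping to 0. A minimal sketch that models addresses as
plain integers; the helper names are hypothetical:

```c
#include <stdint.h>

/* Last byte touched if address arithmetic wrapped at 4GB (32-bit semantics). */
static uint32_t end_wrapped32(uint32_t ksp, uint32_t bytes)
{
    return ksp + bytes - 1;            /* computed modulo 2^32 */
}

/* Last byte touched when the same start is advanced as a 64-bit address. */
static uint64_t end_flat64(uint32_t ksp, uint32_t bytes)
{
    return (uint64_t)ksp + bytes - 1;  /* no wrap at 4GB */
}
```

For a 24-byte frame starting at 0xfffffff8, the first helper ends at 0xf
while the second ends at 0x10000000f, which is the discrepancy under
discussion.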
>
>> + {
>> + gprintk(XENLOG_ERR, "Fatal: Fault while writing exception frame\n");
>> + show_page_walk(ksp + missing);
>> + __domain_crash_synchronous();
>> + }
>> +
>> + /* Rewrite our stack frame. */
>> + regs->rip = (uint32_t)tb->eip;
>> + regs->cs = tb->cs;
>> + regs->eflags &= ~(X86_EFLAGS_VM | X86_EFLAGS_RF |
>> + X86_EFLAGS_NT | X86_EFLAGS_TF);
> You write ->rip above and ->rsp below - preferably those would
> become ->eip and ->esp, but alternatively (for consistency) this
> may want switching to ->rflags.
Ah - these are deliberately 64bit values even in the 32bit path, so a
32bit guest with an unexpected 64bit code segment will be truncated back
into its own range.
I will comment this point, and switch to using rflags.
>
>> + regs->rsp = ksp;
>> + if ( user_mode_frame )
>> + regs->ss = curr->arch.pv_vcpu.kernel_ss;
>> + }
>> + else
>> + {
>> + /* { RCX, R11, [ERRCODE,] RIP, CS/MASK, RFLAGS, RSP, SS } */
>> + unsigned long frame[7], *ptr = frame, ksp =
> I clearly count 8 elements in the comment.
:)
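Counting the slots in the comment: with the optional error code the
64-bit frame holds eight values, so frame[7] would be one entry short
whenever TBF_EXCEPTION_ERRCODE is set. A sketch that makes the count
checkable at compile time, using a hypothetical enum that is not part of
the patch:

```c
#include <assert.h>

/* Slots of the 64-bit frame, in the order given by the patch comment. */
enum pv64_frame_slot {
    SLOT_RCX, SLOT_R11, SLOT_ERRCODE /* optional */, SLOT_RIP,
    SLOT_CS_MASK, SLOT_RFLAGS, SLOT_RSP, SLOT_SS,
    PV64_FRAME_SLOTS /* number of slots */
};

/* Sized from the enum so the array cannot drift from the layout. */
static unsigned long frame[PV64_FRAME_SLOTS];

static_assert(sizeof(frame) / sizeof(frame[0]) == 8,
              "frame must hold all eight slots");
```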
>
>> + (user_mode_frame ? curr->arch.pv_vcpu.kernel_sp : regs->rsp) & ~0xf;
>> +
>> + if ( user_mode_frame )
>> + toggle_guest_mode(curr);
>> +
>> + *ptr++ = regs->rcx;
>> + *ptr++ = regs->r11;
>> +
>> + if ( tb->flags & TBF_EXCEPTION_ERRCODE )
>> + *ptr++ = tb->error_code;
>> +
>> + *ptr++ = regs->rip;
>> + *ptr++ = (user_mode_frame ? regs->cs : regs->cs & ~3) |
>> + ((unsigned long)(*evt_mask) << 32);
> Stray parentheses again.
>
>> + *ptr++ = rflags;
>> + *ptr++ = regs->rsp;
>> + *ptr++ = regs->ss;
>> +
>> + /* Copy the constructed frame to the guest kernel stack. */
>> + bytes = _p(ptr) - _p(frame);
>> + ksp -= bytes;
>> +
>> + if ( unlikely(!__addr_ok(ksp)) )
>> + {
>> + gprintk(XENLOG_ERR, "Fatal: Bad guest kernel stack %p\n", _p(ksp));
>> + __domain_crash_synchronous();
>> + }
>> + else if ( unlikely((missing =
>> + __copy_to_user(_p(ksp), frame, bytes)) != 0) )
>> + {
>> + gprintk(XENLOG_ERR, "Fatal: Fault while writing exception frame\n");
>> + show_page_walk(ksp + missing);
>> + __domain_crash_synchronous();
>> + }
>> +
>> + /* Rewrite our stack frame. */
>> + regs->entry_vector |= TRAP_syscall;
>> + regs->rip = tb->eip;
>> + regs->cs = FLAT_KERNEL_CS;
>> + regs->rflags &= ~(X86_EFLAGS_AC | X86_EFLAGS_VM | X86_EFLAGS_RF |
>> + X86_EFLAGS_NT | X86_EFLAGS_TF);
>> + regs->rsp = ksp;
>> + regs->ss = FLAT_KERNEL_SS;
>> + }
>> +
>> + /* Mask events if requested. */
>> + if ( tb->flags & TBF_INTERRUPT )
>> + *evt_mask = 1;
>> +
>> + /*
>> + * Clobber the injection information now it has been completed. Buggy
>> + * attempts to inject the same event twice will hit the null_trap_bounce()
>> + * check above.
>> + */
>> + *tb = (struct trap_bounce){};
> Ah, so that prevents tb becoming a pointer to const. I wonder
> though whether, on a rather hot path, we really want to zap the
> entire structure here. As I can see the value in satisfying
> null_trap_bounce(), how about zapping just ->eip / ->cs on the
> split paths above?
This ends up being two 8-byte writes of zeroes into a cache-hot line; it
isn't by any means a slow part of this path, whereas the 16bit write to
clobber just %cs would be.
Irrespective of that, the following patch depends on this clobbering of
->flags.
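For reference, the clobber is a whole-struct assignment from a zeroed
compound literal. A minimal sketch, assuming a 16-byte layout similar to
struct trap_bounce; the field layout and names here are assumptions for
illustration, and `{0}` is used in place of the GNU-style empty braces
for portability:

```c
#include <stdint.h>

/* Assumed layout for illustration: 4 + 1 (+1 pad) + 2 + 8 = 16 bytes. */
struct trap_bounce_sketch {
    uint32_t error_code;
    uint8_t  flags;
    uint16_t cs;
    uint64_t eip;
};

/* Zero every field, including flags, in one assignment. */
static void clobber(struct trap_bounce_sketch *tb)
{
    *tb = (struct trap_bounce_sketch){0};
}
```

Because the assignment covers the whole object, ->flags is cleared along
with ->eip and ->cs, which is the property the following patch is said
to rely on.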
> Overall, did you compare generated code with the current
> assembly implementation? That one surely would have had some
> room for improvement, so the result here at least shouldn't be
> worse than that.
The final C version (including failsafe, and some error handling the asm
functions didn't have) is a bit less than twice the size of the asm
functions.
I haven't done any performance analysis, but I trust the compiler to
make better code overall (there are definitely pipeline stalls in the
asm versions), and wouldn't be surprised if it is faster to execute.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel