
Re: [Xen-devel] [PATCH 2/4] x86/pv: Introduce pv_create_exception_frame()



On 09/05/17 16:58, Jan Beulich wrote:
>>>> On 08.05.17 at 17:48, <andrew.cooper3@xxxxxxxxxx> wrote:
>> +void pv_create_exception_frame(void)
>> +{
>> +    struct vcpu *curr = current;
>> +    struct trap_bounce *tb = &curr->arch.pv_vcpu.trap_bounce;
> const (twice)?
>
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    const bool user_mode_frame = !guest_kernel_mode(curr, regs);
>> +    uint8_t *evt_mask = &vcpu_info(curr, evtchn_upcall_mask);
>> +    unsigned long rflags;
> Does this really need to be "long"?

The answer to several of these questions is "probably not, but that's
how load_segments() did it".

>
>> +    unsigned int bytes, missing;
>> +
>> +    ASSERT_NOT_IN_ATOMIC();
>> +
>> +    if ( unlikely(null_trap_bounce(curr, tb)) )
>> +    {
>> +        gprintk(XENLOG_ERR, "Fatal: Attempting to inject null trap bounce\n");
>> +        __domain_crash_synchronous();
> Why not domain_crash() followed by "return"?

Because the existing code uses synchronous crashes.

Looking again at the callsites of pv_create_exception_frame(), we
immediately jump back to {compat_,}test_all_events, which proceeds to
run softirqs again.

Therefore, domain_crash() and a return should work.  (I think?)
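
i.e. something like this, if I've read the return path correctly
(untested sketch):

    if ( unlikely(null_trap_bounce(curr, tb)) )
    {
        gprintk(XENLOG_ERR,
                "Fatal: Attempting to inject null trap bounce\n");
        domain_crash(curr->domain);
        return;
    }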

>
>> +    }
>> +
>> +    /* Fold the upcall mask and architectural IOPL into the guests rflags. */
>> +    rflags  = regs->rflags & ~(X86_EFLAGS_IF | X86_EFLAGS_IOPL);
>> +    rflags |= ((*evt_mask ? 0 : X86_EFLAGS_IF) |
>> +               (VM_ASSIST(curr->domain, architectural_iopl)
>> +                ? curr->arch.pv_vcpu.iopl : 0));
>> +
>> +    if ( is_pv_32bit_vcpu(curr) )
>> +    {
>> +        /* { [ERRCODE,] EIP, CS/MASK , EFLAGS, [ESP, SS] } */
>> +        unsigned int frame[6], *ptr = frame, ksp =
>> +            (user_mode_frame ? curr->arch.pv_vcpu.kernel_sp : regs->esp);
>> +
>> +        if ( tb->flags & TBF_EXCEPTION_ERRCODE )
>> +            *ptr++ = tb->error_code;
>> +
>> +        *ptr++ = regs->eip;
>> +        *ptr++ = regs->cs | (((unsigned int)*evt_mask) << 16);
> Do you really need the cast here?

Does it promote correctly if the top bit of the mask is set?
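
Working it through with a throwaway local (just to illustrate the
promotion; 'mask' and 'cs_and_mask' are hypothetical):

    uint8_t mask = 0x80;                        /* top bit set */
    unsigned int cs_and_mask = regs->cs | (mask << 16);
    /*
     * mask is promoted to (signed) int before the shift, and even
     * 0xff << 16 is only 0x00ff0000, so no sign extension occurs and
     * the result matches the explicitly-cast version.
     */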

>  In no case is there a need for the
> parentheses around the cast expression.
>
>> +        *ptr++ = rflags;
>> +
>> +        if ( user_mode_frame )
>> +        {
>> +            *ptr++ = regs->esp;
>> +            *ptr++ = regs->ss;
>> +        }
>> +
>> +        /* Copy the constructed frame to the guest kernel stack. */
>> +        bytes = _p(ptr) - _p(frame);
>> +        ksp -= bytes;
>> +
>> +        if ( unlikely((missing = __copy_to_user(_p(ksp), frame, bytes)) != 0) )
> While I don't think we need to be really bothered, it's perhaps still
> worth noting in a comment that the wrapping behavior here is
> wrong (and slightly worse than the assembly original), due to
> (implicit) address arithmetic all being done with 64-bit operands.

Ah - At some point, I had a comment here explaining the lack of an
__access_ok() check, but it appears to have got lost in a rebase.  I
will try to reinstate it.

The wrapping behaviour around the 4GB => 0 boundary is undefined, and
differs between Intel and AMD (as we discovered with XSA-186).  If we
were passing the exception back to the guest, we would need to swap #PF
for #SS (for Intel), or wrap around properly (for AMD).

Would it be ok just to comment this point and leave it as is?
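
Something along these lines (exact wording still to be polished):

    /*
     * No __access_ok() check here: the wrapping behaviour of stack
     * pushes across the 4GB => 0 boundary is undefined, and differs
     * between Intel and AMD (c.f. XSA-186).  Forwarding the fault
     * faithfully would mean swapping #PF for #SS on Intel, or wrapping
     * properly on AMD, neither of which seems worth the complexity.
     */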

>
>> +        {
>> +            gprintk(XENLOG_ERR, "Fatal: Fault while writing exception 
>> frame\n");
>> +            show_page_walk(ksp + missing);
>> +            __domain_crash_synchronous();
>> +        }
>> +
>> +        /* Rewrite our stack frame. */
>> +        regs->rip           = (uint32_t)tb->eip;
>> +        regs->cs            = tb->cs;
>> +        regs->eflags       &= ~(X86_EFLAGS_VM | X86_EFLAGS_RF |
>> +                                X86_EFLAGS_NT | X86_EFLAGS_TF);
> You write ->rip above and ->rsp below - preferably those would
> become ->eip and ->esp, but alternatively (for consistency) this
> may want switching to ->rflags.

Ah - these are deliberately 64bit values even in the 32bit path, so a
32bit guest with an unexpected 64bit code segment will be truncated back
into its own range.

I will comment this point, and switch to using rflags.
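
i.e. something like:

    /*
     * Deliberately write the full 64bit rip/rsp fields, even on this
     * 32bit path: a 32bit guest which has somehow ended up on a 64bit
     * code segment gets truncated back into its own range.
     */
    regs->rip           = (uint32_t)tb->eip;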

>
>> +        regs->rsp           = ksp;
>> +        if ( user_mode_frame )
>> +            regs->ss = curr->arch.pv_vcpu.kernel_ss;
>> +    }
>> +    else
>> +    {
>> +        /* { RCX, R11, [ERRCODE,] RIP, CS/MASK, RFLAGS, RSP, SS } */
>> +        unsigned long frame[7], *ptr = frame, ksp =
> I clearly count 8 elements in the comment.

:)

>
>> +            (user_mode_frame ? curr->arch.pv_vcpu.kernel_sp : regs->rsp) & ~0xf;
>> +
>> +        if ( user_mode_frame )
>> +            toggle_guest_mode(curr);
>> +
>> +        *ptr++ = regs->rcx;
>> +        *ptr++ = regs->r11;
>> +
>> +        if ( tb->flags & TBF_EXCEPTION_ERRCODE )
>> +            *ptr++ = tb->error_code;
>> +
>> +        *ptr++ = regs->rip;
>> +        *ptr++ = (user_mode_frame ? regs->cs : regs->cs & ~3) |
>> +            ((unsigned long)(*evt_mask) << 32);
> Stray parentheses again.
>
>> +        *ptr++ = rflags;
>> +        *ptr++ = regs->rsp;
>> +        *ptr++ = regs->ss;
>> +
>> +        /* Copy the constructed frame to the guest kernel stack. */
>> +        bytes = _p(ptr) - _p(frame);
>> +        ksp -= bytes;
>> +
>> +        if ( unlikely(!__addr_ok(ksp)) )
>> +        {
>> +            gprintk(XENLOG_ERR, "Fatal: Bad guest kernel stack %p\n", 
>> _p(ksp));
>> +            __domain_crash_synchronous();
>> +        }
>> +        else if ( unlikely((missing =
>> +                            __copy_to_user(_p(ksp), frame, bytes)) != 0) )
>> +        {
>> +            gprintk(XENLOG_ERR, "Fatal: Fault while writing exception 
>> frame\n");
>> +            show_page_walk(ksp + missing);
>> +            __domain_crash_synchronous();
>> +        }
>> +
>> +        /* Rewrite our stack frame. */
>> +        regs->entry_vector |= TRAP_syscall;
>> +        regs->rip           = tb->eip;
>> +        regs->cs            = FLAT_KERNEL_CS;
>> +        regs->rflags       &= ~(X86_EFLAGS_AC | X86_EFLAGS_VM | X86_EFLAGS_RF |
>> +                                X86_EFLAGS_NT | X86_EFLAGS_TF);
>> +        regs->rsp           = ksp;
>> +        regs->ss            = FLAT_KERNEL_SS;
>> +    }
>> +
>> +    /* Mask events if requested. */
>> +    if ( tb->flags & TBF_INTERRUPT )
>> +        *evt_mask = 1;
>> +
>> +    /*
>> +     * Clobber the injection information now it has been completed.  Buggy
>> +     * attempts to inject the same event twice will hit the null_trap_bounce()
>> +     * check above.
>> +     */
>> +    *tb = (struct trap_bounce){};
> Ah, so that prevents tb becoming a pointer to const. I wonder
> though whether, on a rather hot path, we really want to zap the
> entire structure here. As I can see the value in satisfying
> null_trap_bounce(), how about zapping just ->eip / ->cs on the
> split paths above?

This ends up being two 8-byte writes of zeroes into a cache-hot line; it
isn't by any means a slow part of this path, whereas the 16bit write to
clobber just %cs would be.
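
Roughly (assuming struct trap_bounce really is 16 bytes and 8-byte
aligned, which is what the two stores suggest):

    *tb = (struct trap_bounce){};   /* two aligned 8-byte zero stores  */
    tb->cs = 0;                     /* vs. a single 2-byte store, the
                                     * narrow write I'd rather avoid   */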

Irrespective of that, the following patch depends on this clobbering of
->flags.

> Overall, did you compare generated code with the current
> assembly implementation? That one surely would have had some
> room for improvement, so the result here at least shouldn't be
> worse than that.

The final C version (including failsafe, and some error handling the asm
functions didn't have) is a bit less than twice the size of the asm
functions in terms of absolute size.

I haven't done any performance analysis, but I trust the compiler to
make better code overall (there are definitely pipeline stalls in the
asm versions), and wouldn't be surprised if it is faster to execute.

~Andrew
