|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 2/2] x86/svm: Write the correct %eip into the outgoing task
On Thu, Nov 21, 2019 at 10:15:51PM +0000, Andrew Cooper wrote:
> The TASK_SWITCH vmexit has fault semantics, and doesn't provide any NRIPs
> assistance with instruction length. As a result, any instruction-induced task
> switch has the outgoing task's %eip pointing at the instruction switch caused
^ that
> the switch, rather than after it.
>
> This causes explicit use of task gates to livelock (as when the task returns,
> it executes the task-switching instruction again), and any restartable task to
> become a nop after its first instantiation (the entry state points at the
> ret/iret instruction used to exit the task).
>
> 32bit Windows in particular is known to use task gates for NMI handling, and
> to use NMI IPIs.
>
> In the task switch handler, distinguish instruction-induced from
> interrupt/exception-induced task switches, and decode the instruction under
> %rip to calculate its length.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Wei Liu <wl@xxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> CC: Juergen Gross <jgross@xxxxxxxx>
>
> The implementation of svm_get_task_switch_insn_len() is bug-compatible with
> svm_get_insn_len() when it comes to conditional #GP'ing. I still haven't had
> time to address this more thoroughly.
>
> AMD does permit TASK_SWITCH not to be intercepted and, I'm informed does do
> the right thing when it comes to a TSS crossing a page boundary. However, it
> is not actually safe to leave task switches unintercepted. Any NPT or shadow
> page fault, even from logdirty/paging/etc will corrupt guest state in an
> unrecoverable manner.
> ---
> xen/arch/x86/hvm/svm/emulate.c | 55
> +++++++++++++++++++++++++++++++++++
> xen/arch/x86/hvm/svm/svm.c | 46 ++++++++++++++++++++++-------
> xen/include/asm-x86/hvm/svm/emulate.h | 1 +
> 3 files changed, 92 insertions(+), 10 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
> index 3e52592847..176c25f60d 100644
> --- a/xen/arch/x86/hvm/svm/emulate.c
> +++ b/xen/arch/x86/hvm/svm/emulate.c
> @@ -117,6 +117,61 @@ unsigned int svm_get_insn_len(struct vcpu *v, unsigned
> int instr_enc)
> }
>
> /*
> + * TASK_SWITCH vmexits never provide an instruction length. We must always
> + * decode under %rip to find the answer.
> + */
> +unsigned int svm_get_task_switch_insn_len(struct vcpu *v)
> +{
> + struct hvm_emulate_ctxt ctxt;
> + struct x86_emulate_state *state;
> + unsigned int emul_len, modrm_reg;
> +
> + ASSERT(v == current);
> + hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
> + hvm_emulate_init_per_insn(&ctxt, NULL, 0);
> + state = x86_decode_insn(&ctxt.ctxt, hvmemul_insn_fetch);
> + if ( IS_ERR_OR_NULL(state) )
Maybe crash the guest in this case? Not advancing the instruction
pointer in a software induced task switch will create a loop AFAICT?
> + return 0;
> +
> + emul_len = x86_insn_length(state, &ctxt.ctxt);
> +
> + /*
> + * Check for an instruction which can cause a task switch. Any far
> + * jmp/call/ret, any software interrupt/exception, and iret.
> + */
> + switch ( ctxt.ctxt.opcode )
> + {
> + case 0xff: /* Grp 5 */
> + /* call / jmp (far, absolute indirect) */
> + if ( x86_insn_modrm(state, NULL, &modrm_reg) != 3 ||
> + (modrm_reg != 3 && modrm_reg != 5) )
> + {
> + /* Wrong instruction. Throw #GP back for now. */
> + default:
> + hvm_inject_hw_exception(TRAP_gp_fault, 0);
> + emul_len = 0;
> + break;
> + }
> + /* Fallthrough */
> + case 0x62: /* bound */
> + case 0x9a: /* call (far, absolute) */
I'm slightly loss here, in the case of call or jmp for example, don't
you need the instruction pointer to point to the destination of the
call/jmp instead of the next instruction?
> + case 0xca: /* ret imm16 (far) */
> + case 0xcb: /* ret (far) */
> + case 0xcc: /* int3 */
> + case 0xcd: /* int imm8 */
> + case 0xce: /* into */
> + case 0xcf: /* iret */
> + case 0xea: /* jmp (far, absolute) */
> + case 0xf1: /* icebp */
> + break;
> + }
> +
> + x86_emulate_free_state(state);
> +
> + return emul_len;
> +}
> +
> +/*
> * Local variables:
> * mode: C
> * c-file-style: "BSD"
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 049b800e20..ba9c24a70c 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2776,7 +2776,41 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>
> case VMEXIT_TASK_SWITCH: {
> enum hvm_task_switch_reason reason;
> - int32_t errcode = -1;
> + int32_t errcode = -1, insn_len = -1;
Plain int seem better for insn_len?
Also I'm not sure there's a reason that errcode uses int32_t, but
that's not introduced here anyway.
Thanks, Roger.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |