
Re: [PATCH v3 2/2] x86/monitor: Add new monitor event to catch all vmexits



On Thu, Apr 7, 2022 at 2:01 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 06.04.2022 20:02, Tamas K Lengyel wrote:
> > On Wed, Apr 6, 2022 at 11:14 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> >> On 06.04.2022 16:58, Tamas K Lengyel wrote:
> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >>> @@ -4008,6 +4008,18 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> >>>          }
> >>>      }
> >>>
> >>> +    if ( unlikely(currd->arch.monitor.vmexit_enabled) )
> >>> +    {
> >>> +        int rc;
> >>> +        __vmread(EXIT_QUALIFICATION, &exit_qualification);
> >>> +
> >>> +        rc = hvm_monitor_vmexit(exit_reason, exit_qualification);
> >>> +        if ( rc < 0 )
> >>> +            goto exit_and_crash;
> >>> +        else if ( rc )
> >>> +            return;
> >>> +    }
> >>> +
> >>>      /* XXX: This looks ugly, but we need a mechanism to ensure
> >>>       * any pending vmresume has really happened
> >>>       */
> >>
> >> A few lines down from here failed VM entry is being handled? Wouldn't
> >> you want to place your code after that? And wouldn't you want to avoid
> >> invoking the monitor for e.g. EXIT_REASON_EXTERNAL_INTERRUPT,
> >> EXIT_REASON_MCE_DURING_VMENTRY, and at least the NMI sub-case of
> >> EXIT_REASON_EXCEPTION_NMI?
> >
> > No, the hook needs to be placed where it is so that it can collect
> > information about all vmexits, no matter the root cause. Failed
> > vmentry & MCE during vmentry would be interesting exits to see
> > whether we can induce them during fuzzing from within the guest
> > (indicating some serious state corruption), while the external
> > interrupt and NMI cases are for the most part just ignored and the
> > fuzz-case restarted, but it could still be interesting to collect
> > their frequencies. So in effect, we want to avoid Xen hiding any of
> > these events from the monitoring agent by handling them one way or
> > another, and instead just let the agent decide what to do next. We
> > most certainly want to avoid Xen crashing the VM for us.
>
> Okay, I can accept this reasoning. But then don't you need to move
> your code _up_, ahead of an earlier "return" (i.e. immediately after
> IRQ enabling)?

I was considering that, but I don't think that crash+return condition
can actually be reached. It's on a path that is only taken when the CPU
doesn't have #VE support and the vCPU's EPT_POINTER is not found in the
altp2m list, and AFAICT there is no way for that condition to occur.
When no #VE is available and the guest is allowed to use VMFUNC (i.e.
not altp2m external-only mode), VMFUNC gets emulated and the
EPT_POINTER is set by p2m_switch_vcpu_altp2m_by_id, which sanity-checks
that the idx being switched to is present in the altp2m list, so the
guest can't switch itself to an invalid EPT_POINTER. When the toolstack
switches the altp2m view it performs the same validation after pausing
the domain, so again there is no way to switch to an invalid altp2m.
And when the toolstack destroys an altp2m view, the operation errors
out if any vCPU is still using that view, so a vCPU can't be left stuck
with a stale, now-invalid altp2m view.
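
To make the argument concrete, here is a minimal, self-contained sketch
(not the actual Xen code; the array, the INVALID marker and the switch
helper are simplified stand-ins for the real altp2m bookkeeping) of why
every switch path refuses an index without a valid EPT pointer:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ALTP2M   10
#define INVALID_EPTP UINT64_MAX

/* Simplified model: per-domain list of altp2m EPT pointers. */
static uint64_t altp2m_eptp[MAX_ALTP2M];

/* Current view index of the (single) modelled vCPU. */
static unsigned int vcpu_p2midx;

/*
 * Both the emulated-VMFUNC path and the toolstack path funnel through a
 * check like this: a switch only happens when the target index holds a
 * valid EPT pointer, so the vCPU can never end up on an EPT_POINTER
 * that is missing from the altp2m list.
 */
static bool switch_altp2m_by_id(unsigned int idx)
{
    if ( idx >= MAX_ALTP2M || altp2m_eptp[idx] == INVALID_EPTP )
        return false;          /* refuse stale or nonexistent views */

    vcpu_p2midx = idx;
    return true;
}

int main(void)
{
    for ( unsigned int i = 0; i < MAX_ALTP2M; i++ )
        altp2m_eptp[i] = INVALID_EPTP;

    altp2m_eptp[1] = 0x1000;   /* pretend view 1 was created */

    printf("switch to 1: %d\n", switch_altp2m_by_id(1)); /* 1: allowed */
    printf("switch to 2: %d\n", switch_altp2m_by_id(2)); /* 0: refused */
    printf("vcpu view:   %u\n", vcpu_p2midx);            /* still 1    */
    return 0;
}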

> Also may I ask that you transform your reasoning into
> some form of a code comment?

Sure, I can do that.
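
For reference, something along these lines (just a rough sketch of the
comment, not the exact wording that will land in the next revision):

    /*
     * Call the vm-event hook before Xen handles (or hides) the exit
     * itself, so the monitoring agent sees every vmexit (including
     * failed vmentry, MCE during vmentry, external interrupts and
     * NMIs) and can decide on its own whether to continue, restart
     * the fuzz-case or crash the domain.
     */
    if ( unlikely(currd->arch.monitor.vmexit_enabled) )
    {
        int rc;

        __vmread(EXIT_QUALIFICATION, &exit_qualification);

        rc = hvm_monitor_vmexit(exit_reason, exit_qualification);
        if ( rc < 0 )
            goto exit_and_crash;
        else if ( rc )
            return;
    }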

Tamas
