[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 for-4.14] x86/monitor: revert default behavior when monitoring register write events



On Wed, Jun 3, 2020 at 2:28 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>
> On Tue, Jun 02, 2020 at 07:49:09AM -0600, Tamas K Lengyel wrote:
> > For the last couple years we have received numerous reports from users of
> > monitor vm_events of spurious guest crashes when using events. In 
> > particular,
> > it has observed that the problem occurs when vm_events are being disabled. 
> > The
> > nature of the guest crash varied widely and has only occured occasionally. 
> > This
> > made debugging the issue particularly hard. We had discussions about this 
> > issue
> > even here on the xen-devel mailinglist with no luck figuring it out.
> >
> > The bug has now been identified as a race-condition between register event
> > handling and disabling the monitor vm_event interface.
> >
> > Patch 96760e2fba100d694300a81baddb5740e0f8c0ee, "vm_event: deny register 
> > writes
> > if refused by  vm_event reply" is the patch that introduced the error. In 
> > this
>
> FWIW, we use the 'Fixes:' tag in order to make it easier for
> maintainers of stable trees to know which bugfixes to pick. This
> should have:
>
> Fixes: 96760e2fba10 ('vm_event: deny register writes if refused by vm_event 
> reply')
>
> Before the SoB.
>
> > patch the default behavior regarding emulation of register write events is
> > changed so that they get postponed until the corresponding vm_event handler
> > decides whether to allow such write to take place. Unfortunately this can 
> > only
> > be implemented by performing the deny/allow step when the vCPU gets 
> > scheduled.
> > Due to that postponed emulation of the event if the user decides to pause 
> > the
> > VM in the vm_event handler and then disable events, the entire emulation 
> > step
> > is skipped the next time the vCPU is resumed. Even if the user doesn't pause
> > during the vm_event handling but exits immediately and disables vm_event, 
> > the
> > situation becomes racey as disabling vm_event may succeed before the guest's
> > vCPUs get scheduled with the pending emulation task. This has been 
> > particularly
> > the case with VMS that have several vCPUs as after the VM is unpaused it may
> > actually take a long time before all vCPUs get scheduled.
> >
> > In this patch we are reverting the default behavior to always perform 
> > emulation
> > of register write events when the event occurs. To postpone them can be 
> > turned
> > on as an option. In that case the user of the interface still has to take 
> > care
> > of only disabling the interface when its safe as it remains buggy.
> >
> > Signed-off-by: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
>
> Thanks for taking care of this!
>
> > ---
> >  xen/arch/x86/hvm/hvm.c            | 14 ++++++++------
> >  xen/arch/x86/hvm/monitor.c        | 13 ++++++++-----
> >  xen/arch/x86/monitor.c            | 10 +++++++++-
> >  xen/include/asm-x86/domain.h      |  1 +
> >  xen/include/asm-x86/hvm/monitor.h |  7 +++----
> >  xen/include/public/domctl.h       |  1 +
> >  6 files changed, 30 insertions(+), 16 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index 74c9f84462..5bb47583b3 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -3601,13 +3601,15 @@ int hvm_msr_write_intercept(unsigned int msr, 
> > uint64_t msr_content,
> >
> >          ASSERT(v->arch.vm_event);
> >
> > -        /* The actual write will occur in hvm_do_resume() (if permitted). 
> > */
> > -        v->arch.vm_event->write_data.do_write.msr = 1;
> > -        v->arch.vm_event->write_data.msr = msr;
> > -        v->arch.vm_event->write_data.value = msr_content;
> > +        if ( hvm_monitor_msr(msr, msr_content, msr_old_content) )
> > +        {
> > +            /* The actual write will occur in hvm_do_resume(), if 
> > permitted. */
> > +            v->arch.vm_event->write_data.do_write.msr = 1;
> > +            v->arch.vm_event->write_data.msr = msr;
> > +            v->arch.vm_event->write_data.value = msr_content;
> >
> > -        hvm_monitor_msr(msr, msr_content, msr_old_content);
> > -        return X86EMUL_OKAY;
> > +            return X86EMUL_OKAY;
> > +        }
> >      }
> >
> >      if ( (ret = guest_wrmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE )
> > diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
> > index 8aa14137e2..36894b33a4 100644
> > --- a/xen/arch/x86/hvm/monitor.c
> > +++ b/xen/arch/x86/hvm/monitor.c
> > @@ -53,11 +53,11 @@ bool hvm_monitor_cr(unsigned int index, unsigned long 
> > value, unsigned long old)
> >              .u.write_ctrlreg.old_value = old
> >          };
> >
> > -        if ( monitor_traps(curr, sync, &req) >= 0 )
> > -            return 1;
> > +        return monitor_traps(curr, sync, &req) >= 0 &&
> > +            curr->domain->arch.monitor.control_register_values;
>
> Nit (it can be fixed while committing IMO): curr should be aligned to
> monitor.

This is the established style already in-place, see
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/hvm/monitor.c;h=8aa14137e25a47d3826843441e244decf81dc855;hb=refs/heads/staging#l76:

  76     return curr->domain->arch.monitor.emul_unimplemented_enabled &&
  77         monitor_traps(curr, true, &req) == 1;

I don't really care either way but at least we should be consistent.

>
> >      }
> >
> > -    return 0;
> > +    return false;
> >  }
> >
> >  bool hvm_monitor_emul_unimplemented(void)
> > @@ -77,7 +77,7 @@ bool hvm_monitor_emul_unimplemented(void)
> >          monitor_traps(curr, true, &req) == 1;
> >  }
> >
> > -void hvm_monitor_msr(unsigned int msr, uint64_t new_value, uint64_t 
> > old_value)
> > +bool hvm_monitor_msr(unsigned int msr, uint64_t new_value, uint64_t 
> > old_value)
> >  {
> >      struct vcpu *curr = current;
> >
> > @@ -92,8 +92,11 @@ void hvm_monitor_msr(unsigned int msr, uint64_t 
> > new_value, uint64_t old_value)
> >              .u.mov_to_msr.old_value = old_value
> >          };
> >
> > -        monitor_traps(curr, 1, &req);
> > +        return monitor_traps(curr, 1, &req) >= 0 &&
> > +            curr->domain->arch.monitor.control_register_values;
>
> Same here.
>
> >      }
> > +
> > +    return false;
> >  }
> >
> >  void hvm_monitor_descriptor_access(uint64_t exit_info,
> > diff --git a/xen/arch/x86/monitor.c b/xen/arch/x86/monitor.c
> > index bbcb7536c7..1517a97f50 100644
> > --- a/xen/arch/x86/monitor.c
> > +++ b/xen/arch/x86/monitor.c
> > @@ -144,7 +144,15 @@ int arch_monitor_domctl_event(struct domain *d,
> >                                struct xen_domctl_monitor_op *mop)
> >  {
> >      struct arch_domain *ad = &d->arch;
> > -    bool requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op);
> > +    bool requested_status;
> > +
> > +    if ( XEN_DOMCTL_MONITOR_OP_CONTROL_REGISTERS == mop->op )
> > +    {
> > +        ad->monitor.control_register_values = true;
> > +        return 0;
>
> I think this would be better implemented in arch_monitor_domctl_op
> which already handles other XEN_DOMCTL_MONITOR_OP_* options, and also
> skips the arch_monitor_domctl_event call?

Hm, yea, that would be better placement for it, you are right.

>
> > +    }
> > +
> > +    requested_status = (XEN_DOMCTL_MONITOR_OP_ENABLE == mop->op);
> >
> >      switch ( mop->event )
> >      {
> > diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> > index e8cee3d5e5..6fd94c2e14 100644
> > --- a/xen/include/asm-x86/domain.h
> > +++ b/xen/include/asm-x86/domain.h
> > @@ -418,6 +418,7 @@ struct arch_domain
> >           * This is used to filter out pagefaults.
> >           */
> >          unsigned int inguest_pagefault_disabled                            
> > : 1;
> > +        unsigned int control_register_values                               
> > : 1;
> >          struct monitor_msr_bitmap *msr_bitmap;
> >          uint64_t write_ctrlreg_mask[4];
> >      } monitor;
> > diff --git a/xen/include/asm-x86/hvm/monitor.h 
> > b/xen/include/asm-x86/hvm/monitor.h
> > index 66de24cb75..a75cd8545c 100644
> > --- a/xen/include/asm-x86/hvm/monitor.h
> > +++ b/xen/include/asm-x86/hvm/monitor.h
> > @@ -29,15 +29,14 @@ enum hvm_monitor_debug_type
> >  };
> >
> >  /*
> > - * Called for current VCPU on crX/MSR changes by guest.
> > - * The event might not fire if the client has subscribed to it in 
> > onchangeonly
> > - * mode, hence the bool return type for control register write events.
> > + * Called for current VCPU on crX/MSR changes by guest. Bool return signals
> > + * whether emulation should be postponed.
> >   */
> >  bool hvm_monitor_cr(unsigned int index, unsigned long value,
> >                      unsigned long old);
> >  #define hvm_monitor_crX(cr, new, old) \
> >                          hvm_monitor_cr(VM_EVENT_X86_##cr, new, old)
> > -void hvm_monitor_msr(unsigned int msr, uint64_t value, uint64_t old_value);
> > +bool hvm_monitor_msr(unsigned int msr, uint64_t value, uint64_t old_value);
> >  void hvm_monitor_descriptor_access(uint64_t exit_info,
> >                                     uint64_t vmx_exit_qualification,
> >                                     uint8_t descriptor, bool is_write);
> > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> > index 1ad34c35eb..cbcd25f12c 100644
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -1025,6 +1025,7 @@ struct xen_domctl_psr_cmt_op {
> >  #define XEN_DOMCTL_MONITOR_OP_DISABLE           1
> >  #define XEN_DOMCTL_MONITOR_OP_GET_CAPABILITIES  2
> >  #define XEN_DOMCTL_MONITOR_OP_EMULATE_EACH_REP  3
> > +#define XEN_DOMCTL_MONITOR_OP_CONTROL_REGISTERS 4
>
> Could you please add a note that this is broken?

Sure.

Thanks,
Tamas



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.