
Re: [Xen-devel] [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed




> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Friday, May 27, 2016 9:43 PM
> To: Wu, Feng <feng.wu@xxxxxxxxx>
> Cc: andrew.cooper3@xxxxxxxxxx; dario.faggioli@xxxxxxxxxx;
> george.dunlap@xxxxxxxxxxxxx; Tian, Kevin <kevin.tian@xxxxxxxxx>; xen-
> devel@xxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx; keir@xxxxxxx
> Subject: Re: [PATCH v2 1/4] VMX: Properly handle pi when all the assigned
> devices are removed
> 
> >>> On 26.05.16 at 15:39, <feng.wu@xxxxxxxxx> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -113,7 +113,19 @@ static void vmx_vcpu_block(struct vcpu *v)
> >             &per_cpu(vmx_pi_blocking, v->processor).lock;
> >      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> >
> > -    spin_lock_irqsave(pi_blocking_list_lock, flags);
> > +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up) )
> > +    {
> > +        /*
> > +         * The vcpu is to be destroyed and it has already been removed
> > +         * from the per-CPU list if it is blocking, we shouldn't add
> > +         * new vCPU to the list.
> > +         */
> 
> This comment appears to be stale wrt the implementation further
> down. But see there for whether it's the comment or the code
> that needs to change.
> 
> > +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +        return;
> > +    }
> > +
> > +    spin_lock(pi_blocking_list_lock);
> 
> There doesn't appear to be an active problem with this, but
> taking a global lock inside a per-vCPU one feels bad. Both here
> and in vmx_pi_blocking_cleanup() I can't see why the global
> lock alone won't do - you acquire it anyway.

The purpose of 'v->arch.hvm_vmx.pi_hotplug_lock' is to make
sure things go right when vmx_pi_blocking_cleanup() and
vmx_vcpu_block() run concurrently. However, the lock
'pi_blocking_list_lock' in vmx_pi_blocking_cleanup() refers to
'v->arch.hvm_vmx.pi_blocking.lock', while 'pi_blocking_list_lock'
in vmx_vcpu_block() is '&per_cpu(vmx_pi_blocking, v->processor).lock'.
These two can be different; consider the following scenario:

vmx_pi_blocking_cleanup() gets called while the vCPU is in the
blocking state and 'v->arch.hvm_vmx.pi_blocking.lock' points to the
per-cpu lock of pCPU0. When this function tries to acquire that lock,
someone else wins (such as vmx_pi_do_resume() or pi_wakeup_interrupt()),
so we spin in vmx_pi_blocking_cleanup(). After the vCPU is woken up,
it runs on pCPU1. Some time later the vCPU blocks again and enters
vmx_vcpu_block() while the earlier vmx_pi_blocking_cleanup() still
hasn't finished its job. We then acquire a different lock in
vmx_vcpu_block() (pCPU1's this time) from the one held in
vmx_pi_blocking_cleanup(). This can break things, since we might
still put the vCPU on the per-cpu blocking list after the vCPU has
been destroyed or the last assigned device has been detached.
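(As an aside, the "cache the lock pointer, acquire it, then re-check"
pattern in vmx_pi_blocking_cleanup(), quoted further down, is what
guards against the lock pointer changing under our feet. A minimal
sketch of the idea, using the field names from the patch:

    spinlock_t *lock = v->arch.hvm_vmx.pi_blocking.lock; /* may be stale */

    if ( lock != NULL )
    {
        spin_lock(lock);
        /*
         * Only touch the list entry if the vCPU is still protected
         * by 'lock'; a concurrent wakeup may have cleared or changed
         * the pointer in the meantime.
         */
        if ( v->arch.hvm_vmx.pi_blocking.lock == lock )
        {
            list_del(&v->arch.hvm_vmx.pi_blocking.list);
            v->arch.hvm_vmx.pi_blocking.lock = NULL;
        }
        spin_unlock(lock);
    }

vmx_vcpu_block(), by contrast, uses the per-cpu lock of the pCPU the
vCPU currently runs on, which need not be the one a stale entry lives
under.)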

> 
> > +static void vmx_pi_blocking_cleanup(struct vcpu *v)
> > +{
> > +    unsigned long flags;
> > +    spinlock_t *pi_blocking_list_lock;
> > +
> > +    if ( !iommu_intpost )
> > +        return;
> 
> If the function is to remain to be called from just the body of a loop
> over all vCPU-s, please make that loop conditional upon iommu_intpost
> instead of checking it here (and bailing) for every vCPU.
> 
> > +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +    v->arch.hvm_vmx.pi_blocking_cleaned_up = 1;
> > +
> > +    pi_blocking_list_lock = v->arch.hvm_vmx.pi_blocking.lock;
> > +    if (pi_blocking_list_lock == NULL)
> 
> Coding style.
> 
> > +    {
> > +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +        return;
> > +    }
> > +
> > +    spin_lock(pi_blocking_list_lock);
> > +    if ( v->arch.hvm_vmx.pi_blocking.lock != NULL )
> > +    {
> > +        ASSERT(v->arch.hvm_vmx.pi_blocking.lock == pi_blocking_list_lock);
> > +        list_del(&v->arch.hvm_vmx.pi_blocking.list);
> > +        v->arch.hvm_vmx.pi_blocking.lock = NULL;
> > +    }
> > +    spin_unlock(pi_blocking_list_lock);
> > +    spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
> > +}
> > +
> >  /* This function is called when pcidevs_lock is held */
> >  void vmx_pi_hooks_assign(struct domain *d)
> >  {
> > +    struct vcpu *v;
> > +
> >      if ( !iommu_intpost || !has_hvm_container_domain(d) )
> >          return;
> >
> > -    ASSERT(!d->arch.hvm_domain.vmx.vcpu_block);
> > +    for_each_vcpu ( d, v )
> > +        v->arch.hvm_vmx.pi_blocking_cleaned_up = 0;
> >
> > -    d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
> > -    d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
> > -    d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
> > -    d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
> > +    if ( !d->arch.hvm_domain.vmx.vcpu_block )
> > +    {
> > +        d->arch.hvm_domain.vmx.vcpu_block = vmx_vcpu_block;
> > +        d->arch.hvm_domain.vmx.pi_switch_from = vmx_pi_switch_from;
> > +        d->arch.hvm_domain.vmx.pi_switch_to = vmx_pi_switch_to;
> > +        d->arch.hvm_domain.vmx.pi_do_resume = vmx_pi_do_resume;
> > +    }
> >  }
> >
> >  /* This function is called when pcidevs_lock is held */
> >  void vmx_pi_hooks_deassign(struct domain *d)
> >  {
> > +    struct vcpu *v;
> > +
> >      if ( !iommu_intpost || !has_hvm_container_domain(d) )
> >          return;
> >
> > -    ASSERT(d->arch.hvm_domain.vmx.vcpu_block);
> > -
> > -    d->arch.hvm_domain.vmx.vcpu_block = NULL;
> > -    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
> > -    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
> > -    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
> > +    for_each_vcpu ( d, v )
> > +        vmx_pi_blocking_cleanup(v);
> 
> If you keep the hooks in place, why is it relevant to do the cleanup
> here instead of just at domain death? As suggested before, if you
> did it there, you'd likely get away without adding the new per-vCPU
> flag.

I don't quite understand this. The cleanup here handles the corner
case where the last assigned device is detached from the domain.
Why do you think we don't need the per-vCPU flag? We need to set it
here to indicate that the vCPU has been cleaned up, so that
vmx_vcpu_block() doesn't put a vCPU with this flag set on the
per-cpu blocking list. Am I missing something?
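To make the interleaving concrete, here is a sketch of the race the
flag is meant to close (field names as in the patch):

    /*
     * CPU A: last device deassigned      CPU B: vCPU v about to block
     * -----------------------------      ----------------------------
     * vmx_pi_blocking_cleanup(v):
     *   lock pi_hotplug_lock
     *   pi_blocking_cleaned_up = 1
     *   remove v from per-cpu list
     *   unlock pi_hotplug_lock
     *                                    vmx_vcpu_block(v):
     *                                      lock pi_hotplug_lock
     *                                      pi_blocking_cleaned_up == 1
     *                                      -> bail out instead of putting
     *                                         v back on the blocking list
     */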

> 
> Furthermore, if things remain the way they are now, a per-domain
> flag would suffice.

vmx_pi_blocking_cleanup() is called from vmx_vcpu_destroy(), which can
be invoked either during domain destruction or from vcpu_initialise()
when the latter hits an error. In the latter case, using a per-domain
flag and setting it in vmx_pi_blocking_cleanup() would be problematic,
since it would affect the other vCPUs of the domain, which should keep
running normally.
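For example (a sketch; 'pi_cleaned_up' is a hypothetical per-domain
flag, not something in the patch):

    /*
     * vcpu_initialise(v_new) fails       v_other: another vCPU of d,
     *   -> vmx_vcpu_destroy(v_new)               running normally
     *     -> vmx_pi_blocking_cleanup(v_new)
     *          d->pi_cleaned_up = 1      <-- hypothetical per-domain flag
     *                                    vmx_vcpu_block(v_other):
     *                                      sees d->pi_cleaned_up == 1
     *                                      -> wrongly skips the per-cpu
     *                                         blocking list for v_other
     */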

> 
> And finally - do you really need to retain all four hooks? Since if not,
> one that gets zapped here could be used in place of such a per-
> domain flag.

Could you elaborate a bit more on this? Thanks a lot!

> 
> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> > @@ -231,6 +231,9 @@ struct arch_vmx_struct {
> >       * pCPU and wakeup the related vCPU.
> >       */
> >      struct pi_blocking_vcpu pi_blocking;
> > +
> > +    spinlock_t            pi_hotplug_lock;
> 
> Being through all of the patch, I have a problem seeing the hotplug
> nature of this.

This is related to assigned-device hotplug.

Thanks,
Feng

> 
> > +    bool_t                pi_blocking_cleaned_up;
> 
> If this flag is to remain, it should move into its designated structure
> (right above your addition).
> 
> Jan
