
Re: [Xen-devel] [PATCH v2 1/4] VMX: Properly handle pi when all the assigned devices are removed



>>> On 03.06.16 at 07:12, <feng.wu@xxxxxxxxx> wrote:
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: Tuesday, May 31, 2016 7:52 PM
>> >>> On 31.05.16 at 12:22, <feng.wu@xxxxxxxxx> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> >> Sent: Friday, May 27, 2016 9:43 PM
>> >> >>> On 26.05.16 at 15:39, <feng.wu@xxxxxxxxx> wrote:
>> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> >> > @@ -113,7 +113,19 @@ static void vmx_vcpu_block(struct vcpu *v)
>> >> >                 &per_cpu(vmx_pi_blocking, v->processor).lock;
>> >> >      struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
>> >> >
>> >> > -    spin_lock_irqsave(pi_blocking_list_lock, flags);
>> >> > +    spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
>> >> > +    if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up) )
>> >> > +    {
>> >> > +        /*
>> >> > +         * The vCPU is about to be destroyed and, if it was blocking,
>> >> > +         * has already been removed from the per-CPU blocking list, so
>> >> > +         * we shouldn't add it back to the list.
>> >> > +         */
>> >> > +        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
>> >> > +        return;
>> >> > +    }
>> >> > +
>> >> > +    spin_lock(pi_blocking_list_lock);
>> >>
>> >> There doesn't appear to be an active problem with this, but
>> >> taking a global lock inside a per-vCPU one feels bad. Both here
>> >> and in vmx_pi_blocking_cleanup() I can't see why the global
>> >> lock alone won't do - you acquire it anyway.
>> >
>> > The purpose of 'v->arch.hvm_vmx.pi_hotplug_lock' is to make
>> > sure things go right when vmx_pi_blocking_cleanup() and
>> > vmx_vcpu_block() run concurrently. However, the lock
>> > 'pi_blocking_list_lock' in vmx_pi_blocking_cleanup() refers to
>> > 'v->arch.hvm_vmx.pi_blocking.lock', while 'pi_blocking_list_lock'
>> > in vmx_vcpu_block() is '&per_cpu(vmx_pi_blocking, v->processor).lock'.
>> > These two can be different; please consider the following scenario:
>> >
>> > vmx_pi_blocking_cleanup() gets called while the vCPU is in the
>> > blocking state and 'v->arch.hvm_vmx.pi_blocking.lock' points to the
>> > per-CPU lock of pCPU0. When this function tries to acquire that
>> > lock, someone else wins (such as vmx_pi_do_resume() or
>> > pi_wakeup_interrupt()), so vmx_pi_blocking_cleanup() keeps spinning.
>> > Meanwhile the vCPU is woken up, runs on pCPU1, and some time later
>> > blocks again and enters vmx_vcpu_block() while the earlier
>> > vmx_pi_blocking_cleanup() still hasn't finished its job. We then
>> > acquire a different lock in vmx_vcpu_block() (pCPU1's this time)
>> > from the one in vmx_pi_blocking_cleanup(). This can break things,
>> > since we might still put the vCPU on a per-CPU blocking list after
>> > the vCPU is destroyed or the last assigned device is detached.
>> 
>> Makes sense. Then the next question is whether you really need
>> to hold both locks at the same time.
> 
> I think we need to hold both. The two spinlocks have different purposes:
> 'v->arch.hvm_vmx.pi_hotplug_lock' protects against races during device
> hotplug or while the domain is being brought down, while the other one
> normally protects accesses to the blocking list.

I understand they have different purposes, but that doesn't mean
they need to be held at the same time. In particular (looking back
at the full patch), the newly added lock frames not only the newly
added code, but also the code inside the other locked region. The
natural thing to expect would be that you need one lock for that
new code, and the other for the pre-existing one. I.e. the function
would still acquire both locks, but not one inside the other.
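
To make the shape of what I'm after a little more concrete - purely an
untested sketch, re-using the pi_hotplug_lock and pi_blocking_cleaned_up
fields your patch introduces, and quite possibly still racy in exactly
the way you describe:

    static void vmx_vcpu_block(struct vcpu *v)
    {
        unsigned long flags;
        spinlock_t *pi_blocking_list_lock =
                   &per_cpu(vmx_pi_blocking, v->processor).lock;

        /* New code: check the cleanup state under its own lock only. */
        spin_lock_irqsave(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
        if ( unlikely(v->arch.hvm_vmx.pi_blocking_cleaned_up) )
        {
            /* Cleanup already ran - don't put this vCPU on any list. */
            spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);
            return;
        }
        spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_hotplug_lock, flags);

        /* Only now take the per-CPU blocking list lock, on its own. */
        spin_lock_irqsave(pi_blocking_list_lock, flags);

        /* ... rest of vmx_vcpu_block() unchanged ... */
    }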

>> Please understand that I'd
>> really prefer for this to have a simpler solution - the original code
>> was already more complicated than one would really think it should
>> be, and now it's getting worse. While I can't immediately suggest
>> alternatives, it feels like the solution to the current problem may
>> rather be simplification instead of making things even more
>> complicated.
> 
> To be honest, comments like this make one frustrated. If you have
> any other, better solution, we can discuss it here. However, just
> saying the current solution is too complicated is not a good way
> to speed up the process.

I can understand your frustration. But please understand that I'm
also getting frustrated - by more and more difficult-to-maintain code
getting added. Addressing the immediate problem is only one aspect;
another is the long-term maintainability of whatever gets added for
that purpose. But see below.

It is a more general observation of mine (also, just fyi, in my own
code) that if complicated things need even more complicated fixes,
then quite often something is wrong with the original complicated
arrangements / design.

>> >> >  void vmx_pi_hooks_deassign(struct domain *d)
>> >> >  {
>> >> > +    struct vcpu *v;
>> >> > +
>> >> >      if ( !iommu_intpost || !has_hvm_container_domain(d) )
>> >> >          return;
>> >> >
>> >> > -    ASSERT(d->arch.hvm_domain.vmx.vcpu_block);
>> >> > -
>> >> > -    d->arch.hvm_domain.vmx.vcpu_block = NULL;
>> >> > -    d->arch.hvm_domain.vmx.pi_switch_from = NULL;
>> >> > -    d->arch.hvm_domain.vmx.pi_switch_to = NULL;
>> >> > -    d->arch.hvm_domain.vmx.pi_do_resume = NULL;
>> >> > +    for_each_vcpu ( d, v )
>> >> > +        vmx_pi_blocking_cleanup(v);
>> >>
>> >> If you keep the hooks in place, why is it relevant to do the cleanup
>> >> here instead of just at domain death? As suggested before, if you
>> >> did it there, you'd likely get away without adding the new per-vCPU
>> >> flag.
>> >
>> > I don't quite understand this. Adding the cleanup here is to handle
>> > the corner case when the last assigned device is detached from the
>> > domain.
>> 
>> There's nothing to be cleaned up really if that de-assign isn't a
>> result of the domain being brought down.
> 
> I mentioned it clearly in [0/4] of this series. We _need_ to clean up
> the blocking list when removing the device while the domain
> is running, since vmx_vcpu_block() might be running at the same
> time. If that is the case, there might be some stale vCPUs in the
> blocking list.

I have to admit that the description there is far from clear to me,
even more so in the specific regard being discussed here.

Anyway - if there are problems here, did you consider simply
broadcasting a PI wakeup interrupt after device removal (and
after having "manually" set the condition to satisfy the
pi_test_on() in pi_wakeup_interrupt())? It would seem to me that
this would leverage existing code without adding new complicated
logic. But of course I may be overlooking something.
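
Roughly something like this (untested, the helper name is made up, and
it would need to live in vmx.c so that pi_wakeup_vector is in scope):

    static void vmx_pi_kick_all(struct domain *d)
    {
        struct vcpu *v;

        /*
         * Make pi_wakeup_interrupt() consider each vCPU of this domain
         * as having a pending notification ...
         */
        for_each_vcpu ( d, v )
            pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc);

        /*
         * ... and let the existing handler take them off whatever
         * per-CPU blocking list they may still sit on.
         */
        send_IPI_mask(&cpu_online_map, pi_wakeup_vector);
    }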

Jan
