[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()



On Thu, 25 Feb 2021, Julien Grall wrote:
> On 23/02/2021 13:24, Ash Wilding wrote:
> > Hi Julien,
> 
> Hi Ash,
> 
> > Thanks for looking at this,
> > 
> > > vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the
> > > CPU to read any information about local events before the flag
> > > _VPF_blocked is set.
> > 
> > Reviewed-by: Ash Wilding <ash.j.wilding@xxxxxxxxx>
> 
> Thanks!
> 
> > 
> > 
> > As an aside,
> > 
> > > I couldn't convince myself whether the Arm implementation of
> > > local_events_need_delivery() contains enough barrier to prevent the
> > > re-ordering. However, I don't think we want to play with devil here
> > > as the function may be optimized in the future.
> > 
> > Agreed.
> > 
> > The vgic_vcpu_pending_irq() and vgic_evtchn_irq_pending() in the call
> > path of local_events_need_delivery() both call spin_lock_irqsave(),
> > which has an arch_lock_acquire_barrier() in its call path.
> > 
> > That just happens to map to a heavier smp_mb() on Arm right now, but
> > relying on this behaviour would be shaky; I can imagine a future update
> > to arch_lock_acquire_barrier() that relaxes it down to just acquire
> > semantics like its name implies (for example an LSE-based lock_acquire()
> > using LDUMAXA),in which case any code incorrectly relying on that full
> > barrier behaviour may break. I'm guessing this is what you meant by the
> > function may be optimized in future?
> 
> That's one of the optimization I had in mind. The other one is we may find a
> way to remove the spinlocks, so the barriers would disappear completely.
> 
> > 
> > Do we know whether there is an expectation for previous loads/stores
> > to have been observed before local_events_need_delivery()? I'm wondering
> > whether it would make sense to have an smb_mb() at the start of the
> > *_nomask() variant in local_events_need_delivery()'s call path.
> 
> That's a good question :). For Arm, there are 4 users of
> local_events_need_delivery():
>   1) do_poll()
>   2) vcpu_block()
>   3) hypercall_preempt_check()
>   4) general_preempt_check()
> 
> 3 and 4 are used for breaking down long running operations. I guess we would
> want to have an accurate view of the pending events and therefore we would
> need a memory barrier to prevent the loads happening too early.
> 
> In this case, I think the smp_mb() would want to be part of the
> hypercall_preempt_check() and general_preempt_check().
> 
> Therefore, I think we want to avoid the extra barrier in
> local_events_need_delivery(). Instead, we should require the caller to take
> care of the ordering if needed.
> 
> This would have benefits any new architecture as the common code would already
> contain the appropriate barriers.
> 
> @Stefano, what do you think?

I am thinking the same way as you. Also because it is cleaner if it is
the one who writes that also takes care of any barriers/flushes needed.

In this case it is vcpu_block that is writing _VPF_blocked and knows
that it has to be seen before local_events_need_delivery(). It is easier
to keep track of it if the barrier is in vcpu_block together with the
set_bit call.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.