[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down



On Fri, 16 Apr 2021, Julien Grall wrote:
> Hi Stefano,
> 
> On 13/04/2021 23:43, Stefano Stabellini wrote:
> > On Sat, 20 Mar 2021, Julien Grall wrote:
> > > On 20/03/2021 00:01, Stefano Stabellini wrote:
> > > > On Sat, 27 Feb 2021, Julien Grall wrote:
> > > > > (+ Dario and George)
> > > > > 
> > > > > Hi Stefano,
> > > > > 
> > > > > I have added Dario and George to get some inputs from the scheduling
> > > > > part.
> > > > > 
> > > > > On 27/02/2021 01:58, Stefano Stabellini wrote:
> > > > > > On Fri, 26 Feb 2021, Julien Grall wrote:
> > > > > > > From: Julien Grall <jgrall@xxxxxxxxxx>
> > > > > > > 
> > > > > > > A vCPU can get scheduled as soon as _VPF_down is cleared. As there
> > > > > > > is
> > > > > > > currently not ordering guarantee in arch_set_info_guest(), it may
> > > > > > > be
> > > > > > > possible that flag can be observed cleared before the new values
> > > > > > > of
> > > > > > > vCPU
> > > > > > > registers are observed.
> > > > > > > 
> > > > > > > Add an smp_mb() before the flag is cleared to prevent re-ordering.
> > > > > > > 
> > > > > > > Signed-off-by: Julien Grall <jgrall@xxxxxxxxxx>
> > > > > > > 
> > > > > > > ---
> > > > > > > 
> > > > > > > Barriers should work in pair. However, I am not entirely sure
> > > > > > > whether
> > > > > > > to
> > > > > > > put the other half. Maybe at the beginning of context_switch_to()?
> > > > > > 
> > > > > > It should be right after VGCF_online is set or cleared, right?
> > > > > 
> > > > > vcpu_guest_context_t is variable allocated on the heap just for the
> > > > > purpose of
> > > > > this call. So an ordering with VGFC_online is not going to do
> > > > > anything.
> > > > > 
> > > > > > So it
> > > > > > would be:
> > > > > > 
> > > > > > xen/arch/arm/domctl.c:arch_get_info_guest
> > > > > > xen/arch/arm/vpsci.c:do_common_cpu_on
> > > > > > 
> > > > > > But I think it is impossible that either of them get called at the
> > > > > > same
> > > > > > time as arch_set_info_guest, which makes me wonder if we actually
> > > > > > need
> > > > > > the barrier...
> > > > > arch_get_info_guest() is called without the domain lock held and I
> > > > > can't
> > > > > see
> > > > > any other lock that could prevent it to be called in // of
> > > > > arch_set_info_guest().
> > > > > 
> > > > > So you could technically get corrupted information from
> > > > > XEN_DOMCTL_getvcpucontext. For this case, we would want a smp_wmb()
> > > > > before
> > > > > writing to v->is_initialised. The corresponding read barrier would be
> > > > > in
> > > > > vcpu_pause() -> vcpu_sleep_sync() -> sync_vcpu_execstate().
> > > > > 
> > > > > But this is not the issue I was originally trying to solve. Currently,
> > > > > do_common_cpu_on() will roughly do:
> > > > > 
> > > > >    1) domain_lock(d)
> > > > > 
> > > > >    2) v->arch.sctlr = ...
> > > > >       v->arch.ttbr0 = ...
> > > > > 
> > > > >    3) clear_bit(_VFP_down, &v->pause_flags);
> > > > > 
> > > > >    4) domain_unlock(d)
> > > > > 
> > > > >    5) vcpu_wake(v);
> > > > > 
> > > > > If we had only one pCPU on the system, then we would only wake the
> > > > > vCPU in
> > > > > step 5. We would be fine in this situation. But that's not the
> > > > > interesting
> > > > > case.
> > > > > 
> > > > > If you add a second pCPU in the story, it may be possible to have
> > > > > vcpu_wake()
> > > > > happening in // (see more below). As there is no memory barrier, step
> > > > > 3
> > > > > may be
> > > > > observed before 2. So, assuming the vcpu is runnable, we could start
> > > > > to
> > > > > schedule a vCPU before any update to the registers (step 2) are
> > > > > observed.
> > > > > 
> > > > > This means that when context_switch_to() is called, we may end up to
> > > > > restore
> > > > > some old values.
> > > > > 
> > > > > Now the question is can vcpu_wake() be called in // from another pCPU?
> > > > > AFAICT,
> > > > > it would be only called if a given flag in v->pause_flags is cleared
> > > > > (e.g.
> > > > > _VFP_blocked). But can we rely on that?
> > > > > 
> > > > > Even if we can rely on it, v->pause_flags has other flags in it. I
> > > > > couldn't
> > > > > rule out that _VPF_down cannot be set at the same time as the other
> > > > > _VPF_*.
> > > > > 
> > > > > Therefore, I think a barrier is necessary to ensure the ordering.
> > > > > 
> > > > > Do you agree with this analysis?
> > > >    Yes, I think this makes sense. The corresponding barrier in the
> > > > scheduling code would have to be after reading _VPF_down and before
> > > > reading v->arch.sctlr, etc.
> > > > 
> > > > 
> > > > > > > The issues described here is also quite theoritical because there
> > > > > > > are
> > > > > > > hundreds of instructions executed between the time a vCPU is seen
> > > > > > > runnable and scheduled. But better be safe than sorry :).
> > > > > > > ---
> > > > > > >     xen/arch/arm/domain.c | 7 +++++++
> > > > > > >     1 file changed, 7 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > > > > > > index bdd3d3e5b5d5..2b705e66be81 100644
> > > > > > > --- a/xen/arch/arm/domain.c
> > > > > > > +++ b/xen/arch/arm/domain.c
> > > > > > > @@ -914,7 +914,14 @@ int arch_set_info_guest(
> > > > > > >         v->is_initialised = 1;
> > > > > > >           if ( ctxt->flags & VGCF_online )
> > > > > > > +    {
> > > > > > > +        /*
> > > > > > > +         * The vCPU can be scheduled as soon as _VPF_down is
> > > > > > > cleared.
> > > > > > > +         * So clear the bit *after* the context was loaded.
> > > > > > > +         */
> > > > > > > +        smp_mb();
> > > > > 
> > > > >   From the discussion above, I would move this barrier before
> > > > > v->is_initialised.
> > > > > So we also take care of the issue with arch_get_info_guest().
> > > > > 
> > > > > This barrier also can be reduced to a smp_wmb() as we only expect an
> > > > > ordering
> > > > > between writes.
> > > > > 
> > > > > The barrier would be paired with the barrier in:
> > > > >      - sync_vcpu_execstate() in the case of
> > > > > arch_get_vcpu_info_guest().
> > > > >      - context_switch_to() in the case of scheduling (The exact
> > > > > barrier is
> > > > > TDB).
> > > > 
> > > > OK, this makes sense, but why before:
> > > > 
> > > >     v->is_initialised = 1;
> > > > 
> > > > instead of right after it? It is just v->pause_flags we care about,
> > > > right?
> > > 
> > > The issue I originally tried to address was a race with v->pause_flags.
> > > But I
> > > also discovered one with v->initialised while answering to your previous
> > > e-mail. This was only briefly mentioned so let me expand it.
> > > 
> > > A toolstack can take a snapshot of the vCPU context using
> > > XEN_DOMCTL_get_vcpucontext. The helper will bail out if v->is_initialized
> > > is
> > > 0.
> > > 
> > > If v->is_initialized is 1, it will temporarily pause the vCPU and then
> > > call
> > > arch_get_info_guest().
> > > 
> > > AFAICT, arch_get_info_guest() and arch_set_info_guest() (called from PSCI
> > > CPU
> > > on) can run concurrently.
> > > 
> > > If you put the barrier right after v->is_initialised, then a
> > > processor/compiler is allowed to re-order the write with what comes
> > > before.
> > > Therefore, the new value of v->is_initialised may be observed before
> > > v->arch.{sctlr, ttbr0,...}.
> > > 
> > > Hence, we need a barrier before setting v->is_initialized so the new value
> > > is
> > > observed *after* the changes to v->arch.{sctlr, ttbr0, ...) have been
> > > observed.
> > > 
> > > A single smp_wmb() barrier before v->is_initialized should be sufficient
> > > to
> > > cover the two problems discussed as I don't think we need to observe
> > > v->is_initialized *before* v->pause_flags.
> > 
> > I think your explanation is correct. However, don't we need a smp_rmb()
> > barrier after reading v->is_initialised in xen/common/domctl.c:do_domctl
> > ? That would be the barrier that pairs with smp_wmb in regards to
> > v->is_initialised.
> 
> There is already a smp_mb() in sync_vcpu_exec_state() which is called from
> vcpu_pause() -> vcpu_sleep_sync().

But that's too late, isn't? The v->is_initialized check is done before
the vcpu_pause() call. We might end up taking the wrong code path:

https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/domctl.c#L587
https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/domctl.c#L598



> I don't think we can ever remove the memory barrier in sync_vcpu_exec_state()
> because the vCPU paused may have run (or initialized) on a different pCPU. So
> I would like to rely on the barrier rather than adding an extra one (even
> thought it is not a fast path).
> 
> I am thinking to add a comment on top of vcpu_pause() to clarify that after
> the call, the vCPU context will be observed without extra synchronization
> required.

Given that an if.. goto is involved, even if both code paths lead to
good results, I think it would be best to have the smp_rmb() barrier
above after the first v->is_initialised read to make sure we take the
right path. Instead of having to make sure that even the "wrong" path
behaves correctly.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.