
Re: [PATCH] xen/arm: Ensure the vCPU context is seen before clearing the _VPF_down



Hi,

On 20/03/2021 11:47, Julien Grall wrote:


On 20/03/2021 00:01, Stefano Stabellini wrote:
On Sat, 27 Feb 2021, Julien Grall wrote:
(+ Dario and George)

Hi Stefano,

I have added Dario and George to get some inputs from the scheduling part.

On 27/02/2021 01:58, Stefano Stabellini wrote:
On Fri, 26 Feb 2021, Julien Grall wrote:
From: Julien Grall <jgrall@xxxxxxxxxx>

A vCPU can get scheduled as soon as _VPF_down is cleared. As there is
currently no ordering guarantee in arch_set_info_guest(), the flag may
be observed cleared before the new values of the vCPU registers are
observed.

Add an smp_mb() before the flag is cleared to prevent re-ordering.

Signed-off-by: Julien Grall <jgrall@xxxxxxxxxx>

---

Barriers should work in pairs. However, I am not entirely sure where to
put the other half. Maybe at the beginning of context_switch_to()?

It should be right after VGCF_online is set or cleared, right?

vcpu_guest_context_t is a variable allocated on the heap just for the purpose
of this call. So an ordering with VGCF_online is not going to do anything.
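
For context, the setvcpucontext path is roughly shaped like this (a
simplified sketch, not the exact do_domctl code): the VGCF_online bit
lives in a transient heap copy that no other pCPU ever reads, so
ordering against it orders nothing observable.

    /* Sketch of the XEN_DOMCTL_setvcpucontext path (simplified): */
    vcpu_guest_context_u c;

    c.nat = alloc_vcpu_guest_context();      /* transient heap copy */
    copy_from_guest(c.nat, op->u.vcpucontext.ctxt, 1);
    ret = arch_set_info_guest(v, c);         /* reads c.nat->flags & VGCF_online */
    free_vcpu_guest_context(c.nat);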

So it
would be:

xen/arch/arm/domctl.c:arch_get_info_guest
xen/arch/arm/vpsci.c:do_common_cpu_on

But I think it is impossible that either of them get called at the same
time as arch_set_info_guest, which makes me wonder if we actually need
the barrier...
arch_get_info_guest() is called without the domain lock held and I can't see
any other lock that could prevent it from being called in parallel with
arch_set_info_guest().

So you could technically get corrupted information from
XEN_DOMCTL_getvcpucontext. For this case, we would want a smp_wmb() before
writing to v->is_initialised. The corresponding read barrier would be in
vcpu_pause() -> vcpu_sleep_sync() -> sync_vcpu_execstate().
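
For illustration, the pairing described above would look roughly like
this (a simplified sketch, not the actual code paths):

    /* Writer side (arch_set_info_guest(), sketch): */
    v->arch.sctlr = ...;
    v->arch.ttbr0 = ...;
    smp_wmb();          /* order the context before is_initialised */
    v->is_initialised = 1;

    /* Reader side (vcpu_pause() -> ... -> sync_vcpu_execstate(), sketch): */
    if ( v->is_initialised )
    {
        smp_rmb();      /* pairs with the smp_wmb() above */
        /* ... now safe to copy v->arch.* for the toolstack snapshot */
    }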

But this is not the issue I was originally trying to solve. Currently,
do_common_cpu_on() will roughly do:

  1) domain_lock(d)

  2) v->arch.sctlr = ...
     v->arch.ttbr0 = ...

  3) clear_bit(_VPF_down, &v->pause_flags);

  4) domain_unlock(d)

  5) vcpu_wake(v);

If we had only one pCPU on the system, then we would only wake the vCPU in
step 5. We would be fine in this situation. But that's not the interesting
case.

If you add a second pCPU to the story, it may be possible to have vcpu_wake()
happening in parallel (see more below). As there is no memory barrier, step 3
may be observed before step 2. So, assuming the vCPU is runnable, we could
start to schedule a vCPU before any update to the registers (step 2) is
observed.

This means that when context_switch_to() is called, we may end up restoring
some stale values.
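
To make the race concrete, a rough interleaving (illustrative only):

    pCPU0 (do_common_cpu_on)          pCPU1 (vcpu_wake() -> scheduler)
    ------------------------          --------------------------------
    v->arch.sctlr = ...;   /* 2 */
    clear_bit(_VPF_down,   /* 3 */
              &v->pause_flags);
                                      observes _VPF_down clear
                                      context_switch_to(v);
                                      /* may still load the stale
                                         v->arch.sctlr, v->arch.ttbr0 */

Without a barrier between step 2 and step 3, pCPU1 is allowed to observe
them in the opposite order.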

Now the question is: can vcpu_wake() be called in parallel from another pCPU?
AFAICT, it would only be called if a given flag in v->pause_flags is cleared
(e.g. _VPF_blocked). But can we rely on that?

Even if we can rely on it, v->pause_flags has other flags in it. I couldn't
rule out that _VPF_down may be set at the same time as the other _VPF_* flags.

Therefore, I think a barrier is necessary to ensure the ordering.

Do you agree with this analysis?
Yes, I think this makes sense. The corresponding barrier in the
scheduling code would have to be after reading _VPF_down and before
reading v->arch.sctlr, etc.
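
Concretely, the read side might look like this (an illustrative sketch;
the exact spot in the context-switch path is left TBD below):

    /* Scheduler side, simplified sketch: */
    if ( !test_bit(_VPF_down, &v->pause_flags) )  /* vCPU looks runnable */
    {
        smp_rmb();  /* pairs with the barrier in arch_set_info_guest() */
        /* ... only now load v->arch.sctlr, v->arch.ttbr0, ... */
    }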


The issue described here is also quite theoretical because there are
hundreds of instructions executed between the time a vCPU is seen
runnable and the time it is scheduled. But better safe than sorry :).
---
   xen/arch/arm/domain.c | 7 +++++++
   1 file changed, 7 insertions(+)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index bdd3d3e5b5d5..2b705e66be81 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -914,7 +914,14 @@ int arch_set_info_guest(
       v->is_initialised = 1;
         if ( ctxt->flags & VGCF_online )
+    {
+        /*
+         * The vCPU can be scheduled as soon as _VPF_down is cleared.
+         * So clear the bit *after* the context was loaded.
+         */
+        smp_mb();

From the discussion above, I would move this barrier before v->is_initialised.
So we also take care of the issue with arch_get_info_guest().

This barrier can also be reduced to a smp_wmb() as we only expect an ordering
between writes.

The barrier would be paired with the barrier in:
    - sync_vcpu_execstate() in the case of arch_get_info_guest().
    - context_switch_to() in the case of scheduling (the exact barrier is TBD).

OK, this makes sense, but why before:

   v->is_initialised = 1;

instead of right after it? It is just v->pause_flags we care about,
right?

The issue I originally tried to address was a race with v->pause_flags. But I
also discovered one with v->is_initialised while answering your previous
e-mail. It was only briefly mentioned, so let me expand on it.

A toolstack can take a snapshot of the vCPU context using
XEN_DOMCTL_getvcpucontext. The helper will bail out if v->is_initialised is 0.

If v->is_initialised is 1, it will temporarily pause the vCPU and then call
arch_get_info_guest().

AFAICT, arch_get_info_guest() and arch_set_info_guest() (called from PSCI
CPU_ON) can run concurrently.

If you put the barrier right after v->is_initialised, then a processor/compiler
is allowed to re-order the write with what comes before it. Therefore, the new
value of v->is_initialised may be observed before v->arch.{sctlr, ttbr0, ...}.

Hence, we need a barrier before setting v->is_initialised so the new value is
observed *after* the changes to v->arch.{sctlr, ttbr0, ...} have been observed.

A single smp_wmb() barrier before v->is_initialised should be sufficient to
cover the two problems discussed, as I don't think we need to observe
v->is_initialised *before* v->pause_flags.
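
Put together, the proposed ordering would be roughly (a sketch of the
idea, not the final patch):

    /* arch_set_info_guest(), sketch of the proposed ordering: */
    v->arch.sctlr = ...;
    v->arch.ttbr0 = ...;

    /*
     * Order the context writes before both v->is_initialised (seen by
     * XEN_DOMCTL_getvcpucontext) and the clearing of _VPF_down (seen
     * by the scheduler).
     */
    smp_wmb();
    v->is_initialised = 1;

    if ( ctxt->flags & VGCF_online )
        clear_bit(_VPF_down, &v->pause_flags);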

Gentle ping. Does this make sense to you?

Cheers,

--
Julien Grall
