[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/4] xen/x86: Drop erronious barriers



On 06/12/2016 20:32, Stefano Stabellini wrote:
> On Tue, 6 Dec 2016, Stefano Stabellini wrote:
>> On Tue, 6 Dec 2016, Andrew Cooper wrote:
>>> On 05/12/2016 19:17, Stefano Stabellini wrote:
>>>> On Mon, 5 Dec 2016, Andrew Cooper wrote:
>>>>> None of these barriers serve any purpose, as they are not synchronising 
>>>>> with
>>>>> any anything on remote CPUs.
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>> ---
>>>>> CC: Jan Beulich <JBeulich@xxxxxxxx>
>>>>> CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>>>>> CC: Julien Grall <julien.grall@xxxxxxx>
>>>>>
>>>>> Restricting to just the $ARCH maintainers, as this is a project-wide 
>>>>> sweep.
>>>>>
>>>>> Julien/Stefano: I think the ARM smpboot inhereted the erronious barrier 
>>>>> usage
>>>>> from x86, but I don't know whether further development has gained a 
>>>>> dependence
>>>>> on them.
>>>> We turned them into smp_wmb already (kudos to IanC).
>>> Right, but the entire point I am trying to argue is that they are not
>>> needed in the first place.
> Just to be clear, on ARM the barriers are unneeded only if it is
> unimportant that "init stuff" (which correspond to all the
> initialization done in start_secondary up to smp_wmb) below is completed
> before "write cpu_online_map". But it looks like we do want to complete
> mmu, irq, timer initializations and set the current vcpu before marking
> the cpu as online, right?

No.  I am sorry, but this question suggests that you still don't appear
to understand barriers.

>
>
>> This is the current code:
>>
>>     CPU 1                                  CPU 0
>>     -----                                  -----
>>
>>     init stuff                             read cpu_online_map
>>
>>     write barrier                          
>>
>>     write cpu_online_map                   do more initialization
>>
>>     write barrier
>>
>>     init more stuff
>>
>>
>> I agree that it's wrong, because the second write barrier in
>> start_secondary is useless and in addition we are missing a read barrier
>> in __cpu_up, corresponding to the first write barrier in
>> start_secondary.
>>
>> I think it should look like:
>>
>>
>>     CPU 1                                  CPU 0
>>     -----                                  -----
>>
>>     init stuff                             read cpu_online_map
>>
>>     write barrier                          read barrier 
>>
>>     write cpu_online_map                   do more initialization
>>
>>     init more stuff

The barriers here serve no purpose, because you have missed an important
blocking while loop on CPU 0.

Recall, that the read/write barrier example is:

foo = a;
smp_rmb();
bar = b;

and

a = baz;
smp_wmb();
b = fromble;

This is specifically relevant *only* to the shared variables a and b,
where for correctness an update to a must be observed remotely before
the update to b.

If you do not have the explicitly same a and b on either side of the
smp_rmb/smp_wmb, then your code is wrong (and most likely, you shouldn't
be using the barriers).


The init sequence is a different scenario.

Processor 0 spins waiting to observe an update to cpu_online_map.

Processor 1 performs its init sequence, and mid way through, sets its
own bit in the cpu_online_map.  It then continues further init actions.

It does not matter whether processor 0 observes the update to
cpu_online_map slightly early or late compared to the local-state
updates from the other parts of processor 1's init sequence (because
processor 0 had better not be looking at the local-state changes).  Once
the bit is set in cpu_online_map, processor 1 is ready to be IPI'd to
give it work to do, etc.  Even if processor 0 hasn't observed all of the
local-state updates of processor 1, once it has seen the cpu_online_map
update, it knows that processor 1 is available to be IPI'd, and IPIing
processor 1 will cause execution with expected program order being
observed (from the point of view of processor 1).


The only consideration is whether processor 1 can make an action which
will prioritise the update to cpu_online_map becoming visible to the
rest of the system.  If there is such an action available, then an
argument can be made that making the update visible more quickly will
allow processor 0 to continue earlier.  However, such an action would be
entirely local to processor 1 and not related to anything happening on
processor 0.

I am not aware of any such action on x86, and my gut feeling is that no
other architecture would have one either.  The ability to fast forward
one specific update to the shared cache-coherency layer would only
complicate an already complicated area of silicon, and would only be
useful to micro-optimise a corner case; I can't see any system designers
considering it a useful feature to add.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.