[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] mm: fix LLVM code-generation issue



On Thu, Nov 22, 2018 at 05:46:19PM +0000, Julien Grall wrote:
> 
> 
> On 11/22/18 5:04 PM, George Dunlap wrote:
> > On 11/22/18 4:45 PM, Julien Grall wrote:
> > > Hi Roger,
> > > 
> > > On 11/22/18 4:39 PM, Roger Pau Monné wrote:
> > > > On Thu, Nov 22, 2018 at 04:22:34PM +0000, Andrew Cooper wrote:
> > > > > On 22/11/2018 16:07, Roger Pau Monné wrote:
> > > > > > On Thu, Nov 22, 2018 at 03:23:41PM +0000, Andrew Cooper wrote:
> > > > > > > On 22/11/2018 15:20, Roger Pau Monné wrote:
> > > > > > > > On Thu, Nov 22, 2018 at 02:03:55PM +0000, Julien Grall wrote:
> > > > > > > > > Hi Jan,
> > > > > > > > > 
> > > > > > > > > On 11/22/18 1:36 PM, Jan Beulich wrote:
> > > > > > > > > > > > > On 22.11.18 at 14:31, <andrew.cooper3@xxxxxxxxxx> 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > I think Julien's point is that without explicitly 
> > > > > > > > > > > barriers, CPU0's
> > > > > > > > > > > update to system_state may not be visible on CPU1, even 
> > > > > > > > > > > though the
> > > > > > > > > > > mappings have been shot down.
> > > > > > > > > > > 
> > > > > > > > > > > Therefore, from the processors point of view, it did 
> > > > > > > > > > > everything
> > > > > > > > > > > correctly, and hit a real pagefault.
> > > > > > > > > > Boot time updates of system_state should be of no interest 
> > > > > > > > > > here,
> > > > > > > > > > as at that time the APs are all idling.
> > > > > > > > > That's probably true today. But this code looks rather 
> > > > > > > > > fragile as
> > > > > > > > > you don't
> > > > > > > > > know how this is going to be used in the future.
> > > > > > > > > 
> > > > > > > > > If you decide to gate init code with system_state, then you 
> > > > > > > > > need
> > > > > > > > > a barrier
> > > > > > > > > to ensure the code is future proof.
> > > > > > > > Wouldn't it be enough to declare system_state as volatile?
> > > > > > > No.  volatility (or lack thereof) is a compiler level construct.
> > > > > > > 
> > > > > > > ARM has weaker cache coherency than x86, so a write which has
> > > > > > > completed
> > > > > > > on one CPU0 in the past may legitimately not be visible on CPU1 
> > > > > > > yet.
> > > > > > > 
> > > > > > > If you need guarantees about the visibility of updated, you must 
> > > > > > > use
> > > > > > > appropriate barriers.
> > > > > > Right. There's some differences between ARM and x86, ARM sets
> > > > > > SYS_STATE_active and continues to make use of init functions. In any
> > > > > > case I have the following diff:
> > > > > > 
> > > > > > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> > > > > > index e83221ab79..cf50d05620 100644
> > > > > > --- a/xen/arch/arm/setup.c
> > > > > > +++ b/xen/arch/arm/setup.c
> > > > > > @@ -966,6 +966,7 @@ void __init start_xen(unsigned long
> > > > > > boot_phys_offset,
> > > > > >        serial_endboot();
> > > > > >          system_state = SYS_STATE_active;
> > > > > > +    smp_wmb();
> > > > > >          create_domUs();
> > > > > >    diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> > > > > > index 9cbff22fb3..41044c0b6f 100644
> > > > > > --- a/xen/arch/x86/setup.c
> > > > > > +++ b/xen/arch/x86/setup.c
> > > > > > @@ -593,6 +593,7 @@ static void noinline init_done(void)
> > > > > >        unsigned long start, end;
> > > > > >          system_state = SYS_STATE_active;
> > > > > > +    smp_wmb();
> > > > > >          domain_unpause_by_systemcontroller(dom0);
> > > > > 
> > > > > I'm afraid that that won't do anything to help at all.
> > > > > 
> > > > > smp_{wmb,rmb}() must be in matched pairs, and mb() must be matched 
> > > > > with
> > > > > itself.
> > > > 
> > > > Then I'm not sure about whether our previous plan still stands, are we
> > > > OK with using ACCESS_ONCE here and forgetting about the memory
> > > > barriers given the current usage?
> > > 
> > > The problem is not the current usage but how it could be used. Debugging
> > > memory ordering is quite a pain so I would prefer this to be fixed
> > > correctly.
> > 
> > But in this case it wouldn't be a pain, because the only possible
> > failure mode is if the processor faults trying to read opt_bootscrub, right?
> 
> Possibly. But I don't see any reason to defer the fix until someone comes up
> with unreliable crash.

If we have to go down that route, shouldn't we also protect
system_state with a lock so that it cannot be modified by a CPU while
it's being read from another?

Image an AP reads system_state < SYS_STATE_active, then BSP sets
system_state = SYS_STATE_active and clears the init mappings, then
when the AP tries to read an init variable it would get a page fault.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.