[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] mm: fix LLVM code-generation issue



On Fri, Nov 23, 2018 at 11:36:48AM +0000, Julien Grall wrote:
> 
> 
> On 23/11/2018 11:23, Roger Pau Monné wrote:
> > On Thu, Nov 22, 2018 at 05:46:19PM +0000, Julien Grall wrote:
> > > 
> > > 
> > > On 11/22/18 5:04 PM, George Dunlap wrote:
> > > > On 11/22/18 4:45 PM, Julien Grall wrote:
> > > > > Hi Roger,
> > > > > 
> > > > > On 11/22/18 4:39 PM, Roger Pau Monné wrote:
> > > > > > On Thu, Nov 22, 2018 at 04:22:34PM +0000, Andrew Cooper wrote:
> > > > > > > On 22/11/2018 16:07, Roger Pau Monné wrote:
> > > > > > > > On Thu, Nov 22, 2018 at 03:23:41PM +0000, Andrew Cooper wrote:
> > > > > > > > > On 22/11/2018 15:20, Roger Pau Monné wrote:
> > > > > > > > > > On Thu, Nov 22, 2018 at 02:03:55PM +0000, Julien Grall 
> > > > > > > > > > wrote:
> > > > > > > > > > > Hi Jan,
> > > > > > > > > > > 
> > > > > > > > > > > On 11/22/18 1:36 PM, Jan Beulich wrote:
> > > > > > > > > > > > > > > On 22.11.18 at 14:31, <andrew.cooper3@xxxxxxxxxx> 
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > I think Julien's point is that without explicitly 
> > > > > > > > > > > > > barriers, CPU0's
> > > > > > > > > > > > > update to system_state may not be visible on CPU1, 
> > > > > > > > > > > > > even though the
> > > > > > > > > > > > > mappings have been shot down.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Therefore, from the processors point of view, it did 
> > > > > > > > > > > > > everything
> > > > > > > > > > > > > correctly, and hit a real pagefault.
> > > > > > > > > > > > Boot time updates of system_state should be of no 
> > > > > > > > > > > > interest here,
> > > > > > > > > > > > as at that time the APs are all idling.
> > > > > > > > > > > That's probably true today. But this code looks rather 
> > > > > > > > > > > fragile as
> > > > > > > > > > > you don't
> > > > > > > > > > > know how this is going to be used in the future.
> > > > > > > > > > > 
> > > > > > > > > > > If you decide to gate init code with system_state, then 
> > > > > > > > > > > you need
> > > > > > > > > > > a barrier
> > > > > > > > > > > to ensure the code is future proof.
> > > > > > > > > > Wouldn't it be enough to declare system_state as volatile?
> > > > > > > > > No.  volatility (or lack thereof) is a compiler level 
> > > > > > > > > construct.
> > > > > > > > > 
> > > > > > > > > ARM has weaker cache coherency than x86, so a write which has
> > > > > > > > > completed
> > > > > > > > > on one CPU0 in the past may legitimately not be visible on 
> > > > > > > > > CPU1 yet.
> > > > > > > > > 
> > > > > > > > > If you need guarantees about the visibility of updated, you 
> > > > > > > > > must use
> > > > > > > > > appropriate barriers.
> > > > > > > > Right. There's some differences between ARM and x86, ARM sets
> > > > > > > > SYS_STATE_active and continues to make use of init functions. 
> > > > > > > > In any
> > > > > > > > case I have the following diff:
> > > > > > > > 
> > > > > > > > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> > > > > > > > index e83221ab79..cf50d05620 100644
> > > > > > > > --- a/xen/arch/arm/setup.c
> > > > > > > > +++ b/xen/arch/arm/setup.c
> > > > > > > > @@ -966,6 +966,7 @@ void __init start_xen(unsigned long
> > > > > > > > boot_phys_offset,
> > > > > > > >         serial_endboot();
> > > > > > > >           system_state = SYS_STATE_active;
> > > > > > > > +    smp_wmb();
> > > > > > > >           create_domUs();
> > > > > > > >     diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> > > > > > > > index 9cbff22fb3..41044c0b6f 100644
> > > > > > > > --- a/xen/arch/x86/setup.c
> > > > > > > > +++ b/xen/arch/x86/setup.c
> > > > > > > > @@ -593,6 +593,7 @@ static void noinline init_done(void)
> > > > > > > >         unsigned long start, end;
> > > > > > > >           system_state = SYS_STATE_active;
> > > > > > > > +    smp_wmb();
> > > > > > > >           domain_unpause_by_systemcontroller(dom0);
> > > > > > > 
> > > > > > > I'm afraid that that won't do anything to help at all.
> > > > > > > 
> > > > > > > smp_{wmb,rmb}() must be in matched pairs, and mb() must be 
> > > > > > > matched with
> > > > > > > itself.
> > > > > > 
> > > > > > Then I'm not sure about whether our previous plan still stands, are 
> > > > > > we
> > > > > > OK with using ACCESS_ONCE here and forgetting about the memory
> > > > > > barriers given the current usage?
> > > > > 
> > > > > The problem is not the current usage but how it could be used. 
> > > > > Debugging
> > > > > memory ordering is quite a pain so I would prefer this to be fixed
> > > > > correctly.
> > > > 
> > > > But in this case it wouldn't be a pain, because the only possible
> > > > failure mode is if the processor faults trying to read opt_bootscrub, 
> > > > right?
> > > 
> > > Possibly. But I don't see any reason to defer the fix until someone comes 
> > > up
> > > with unreliable crash.
> > 
> > If we have to go down that route, shouldn't we also protect
> > system_state with a lock so that it cannot be modified by a CPU while
> > it's being read from another?
> 
> The locking might be a bit too much. Modifying the system_state should not
> be an issue if you put the correct barrier in place.

What about the following scenario?

BSP               write;wmb();    remove init mapping;
AP   rmb();read                                          read init var;
    -----------time------>

Yes matching barriers are in place, but the result is still wrong. Can
this happen?  Even if we make opt_bootscrub non-init to avoid the fault,
we just defer the error to a later point.

This isn't really about coherency. Maybe we should put reading state
under heap lock?

Wei.

> 
> Cheers,
> 
> -- 
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.