Re: [Xen-devel] [PATCH] mm: fix LLVM code-generation issue
On Fri, Nov 23, 2018 at 11:36:48AM +0000, Julien Grall wrote:
>
>
> On 23/11/2018 11:23, Roger Pau Monné wrote:
> > On Thu, Nov 22, 2018 at 05:46:19PM +0000, Julien Grall wrote:
> > >
> > >
> > > On 11/22/18 5:04 PM, George Dunlap wrote:
> > > > On 11/22/18 4:45 PM, Julien Grall wrote:
> > > > > Hi Roger,
> > > > >
> > > > > On 11/22/18 4:39 PM, Roger Pau Monné wrote:
> > > > > > On Thu, Nov 22, 2018 at 04:22:34PM +0000, Andrew Cooper wrote:
> > > > > > > On 22/11/2018 16:07, Roger Pau Monné wrote:
> > > > > > > > On Thu, Nov 22, 2018 at 03:23:41PM +0000, Andrew Cooper wrote:
> > > > > > > > > On 22/11/2018 15:20, Roger Pau Monné wrote:
> > > > > > > > > > On Thu, Nov 22, 2018 at 02:03:55PM +0000, Julien Grall wrote:
> > > > > > > > > > > Hi Jan,
> > > > > > > > > > >
> > > > > > > > > > > On 11/22/18 1:36 PM, Jan Beulich wrote:
> > > > > > > > > > > > > > > On 22.11.18 at 14:31, <andrew.cooper3@xxxxxxxxxx> wrote:
> > > > > > > > > > > > > I think Julien's point is that without explicit
> > > > > > > > > > > > > barriers, CPU0's update to system_state may not be
> > > > > > > > > > > > > visible on CPU1, even though the mappings have been
> > > > > > > > > > > > > shot down.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Therefore, from the processor's point of view, it did
> > > > > > > > > > > > > everything correctly, and hit a real pagefault.
> > > > > > > > > > > > Boot time updates of system_state should be of no
> > > > > > > > > > > > interest here,
> > > > > > > > > > > > as at that time the APs are all idling.
> > > > > > > > > > > That's probably true today. But this code looks rather
> > > > > > > > > > > fragile as you don't know how this is going to be used in
> > > > > > > > > > > the future.
> > > > > > > > > > >
> > > > > > > > > > > If you decide to gate init code with system_state, then
> > > > > > > > > > > you need a barrier to ensure the code is future-proof.
> > > > > > > > > > Wouldn't it be enough to declare system_state as volatile?
> > > > > > > > > No. Volatility (or lack thereof) is a compiler-level
> > > > > > > > > construct.
> > > > > > > > >
> > > > > > > > > ARM has weaker cache coherency than x86, so a write which has
> > > > > > > > > completed on CPU0 in the past may legitimately not be visible
> > > > > > > > > on CPU1 yet.
> > > > > > > > >
> > > > > > > > > If you need guarantees about the visibility of updates, you
> > > > > > > > > must use appropriate barriers.
> > > > > > > > Right. There are some differences between ARM and x86: ARM
> > > > > > > > sets SYS_STATE_active and continues to make use of init
> > > > > > > > functions. In any case I have the following diff:
> > > > > > > >
> > > > > > > > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> > > > > > > > index e83221ab79..cf50d05620 100644
> > > > > > > > --- a/xen/arch/arm/setup.c
> > > > > > > > +++ b/xen/arch/arm/setup.c
> > > > > > > > @@ -966,6 +966,7 @@ void __init start_xen(unsigned long boot_phys_offset,
> > > > > > > >      serial_endboot();
> > > > > > > >      system_state = SYS_STATE_active;
> > > > > > > > +    smp_wmb();
> > > > > > > >      create_domUs();
> > > > > > > > diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> > > > > > > > index 9cbff22fb3..41044c0b6f 100644
> > > > > > > > --- a/xen/arch/x86/setup.c
> > > > > > > > +++ b/xen/arch/x86/setup.c
> > > > > > > > @@ -593,6 +593,7 @@ static void noinline init_done(void)
> > > > > > > >      unsigned long start, end;
> > > > > > > >      system_state = SYS_STATE_active;
> > > > > > > > +    smp_wmb();
> > > > > > > >      domain_unpause_by_systemcontroller(dom0);
> > > > > > >
> > > > > > > I'm afraid that that won't do anything to help at all.
> > > > > > >
> > > > > > > smp_{wmb,rmb}() must be in matched pairs, and mb() must be
> > > > > > > matched with itself.
> > > > > >
> > > > > > Then I'm not sure whether our previous plan still stands: are we
> > > > > > OK with using ACCESS_ONCE here and forgetting about the memory
> > > > > > barriers, given the current usage?
> > > > >
> > > > > The problem is not the current usage but how it could be used.
> > > > > Debugging memory ordering is quite a pain, so I would prefer this to
> > > > > be fixed correctly.
> > > >
> > > > But in this case it wouldn't be a pain, because the only possible
> > > > failure mode is if the processor faults trying to read opt_bootscrub,
> > > > right?
> > >
> > > Possibly. But I don't see any reason to defer the fix until someone
> > > comes up with an unreliable crash.
> >
> > If we have to go down that route, shouldn't we also protect
> > system_state with a lock so that it cannot be modified by a CPU while
> > it's being read by another?
>
> The locking might be a bit too much. Modifying system_state should not be
> an issue if you put the correct barrier in place.
What about the following scenario?

  BSP:  write; wmb(); remove init mapping;
  AP:                                        rmb(); read init var;
        -----------time------>
Yes, matching barriers are in place, but the result is still wrong. Can
this happen? Even if we make opt_bootscrub non-init to avoid the fault,
we just defer the error to a later point.
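
Spelled out as code, this is roughly the window I have in mind. A
compile-able sketch only: everything except system_state, opt_bootscrub
and the barrier names is a placeholder I made up, not the actual
setup.c / page_alloc.c code.

/* Illustrative sketch of the race only -- placeholder stubs, not Xen code. */
#include <stdbool.h>

enum { SYS_STATE_boot, SYS_STATE_active };
static int system_state = SYS_STATE_boot;
static bool opt_bootscrub = true;           /* __initdata in real Xen        */

static void smp_wmb(void) { }               /* stand-ins for the real        */
static void smp_rmb(void) { }               /* barriers                      */
static void unmap_init_sections(void) { }   /* stand-in for the .init unmap  */
static void scrub(bool on) { (void)on; }    /* stand-in for the scrub path   */

/* BSP, end of boot: */
static void bsp_path(void)
{
    system_state = SYS_STATE_active;        /* (1) store                     */
    smp_wmb();                              /* (2) orders (1) before (3)     */
    unmap_init_sections();                  /* (3) .init goes away           */
}

/* AP, boot-scrub path: */
static void ap_path(void)
{
    if ( system_state < SYS_STATE_active )  /* (a) may still see the old     */
    {                                       /*     value                     */
        smp_rmb();                          /* (b) pairs with (2), but       */
                                            /*     nothing stops (c) from    */
                                            /*     running after (3)         */
        scrub(opt_bootscrub);               /* (c) in real Xen this read     */
                                            /*     faults once (3) has run   */
    }
}

The wmb/rmb pair only guarantees that an AP which observes
SYS_STATE_active also observes everything written before the wmb(); it
cannot force (a) and (c) to complete before (3).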
This isn't really about coherency. Maybe we should put reading the state
under the heap lock?
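
Something along these lines, reusing the placeholders from the sketch
above. Making the BSP take heap_lock around the state change before it
tears down .init is my assumption about how it could be wired up, not
something the current code does.

/* BSP: publish the state change under the lock; unmap only afterwards.      */
static void bsp_path_locked(void)
{
    spin_lock(&heap_lock);
    system_state = SYS_STATE_active;
    spin_unlock(&heap_lock);
    unmap_init_sections();        /* any AP critical section from here on    */
                                  /* sees SYS_STATE_active and skips .init   */
}

/* AP: check the state and read the __initdata variable inside one critical  */
/* section, so the check and the read cannot be split by the unmap.          */
static void ap_path_locked(void)
{
    spin_lock(&heap_lock);
    if ( system_state < SYS_STATE_active )
        scrub(opt_bootscrub);     /* .init is still mapped here: the BSP      */
                                  /* only unmaps after its own unlock         */
    spin_unlock(&heap_lock);
}

That turns the check plus the read into a single step with respect to
the unmap, which is what the barriers alone cannot give us.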
Wei.
>
> Cheers,
>
> --
> Julien Grall