
RE: [Xen-devel] [PATCH] xen mmu: fix a race window causing leave_mm BUG()



> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> Sent: Wednesday, May 11, 2011 4:27 AM
> 
> On Fri, Apr 29, 2011 at 12:10:57PM +0800, Tian, Kevin wrote:
> >     xen mmu: fix a race window causing leave_mm BUG()
> 
> I've this in mailbox and I am wondering whether this still an issue with the
> 2.6.39 type kernels?
> How do you reproduce the failure? When using LVM?

This issue was reported by Xiaoyun, who hit it occasionally during extensive
testing, after dozens of hours of running. From the symptoms and the info
Xiaoyun provided, I found this potential race window, and Xiaoyun has
verified that this patch solves his stability issue.

the original thread is at:
http://lists.xensource.com/archives/html/xen-devel/2011-04/msg01186.html

His kernel is based on 2.6.38. I also checked the latest 2.6.39 from the repo
you maintain, and the same issue still exists there.

BTW, I didn't reproduce it myself, and I'm not sure whether Xiaoyun uses LVM.
But I think it has nothing to do with the storage type; it is purely an mmu
design issue.
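
To make the window concrete, it can be replayed in a small userspace C model.
This is only a sketch based on the patch description, not kernel code: the
model_* names, the reduced per-cpu struct and the enum values are stand-ins
invented for illustration, and the BUG() in leave_mm is represented by a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's TLB states; the values are illustrative only. */
enum model_tlbstate { MODEL_TLBSTATE_OK = 1, MODEL_TLBSTATE_LAZY = 2 };

struct model_mm { int id; };

/* Per-cpu state, reduced to the two fields the patch looks at. */
struct model_cpu {
    struct model_mm *active_mm;
    enum model_tlbstate state;
    bool hit_bug;   /* set where leave_mm would BUG() */
    bool left_mm;   /* set where leave_mm would drop the mm */
};

/* Models leave_mm: it must only be called on a cpu in lazy TLB mode. */
static void model_leave_mm(struct model_cpu *cpu)
{
    if (cpu->state == MODEL_TLBSTATE_OK) {
        cpu->hit_bug = true;
        return;
    }
    cpu->left_mm = true;
}

/* Models drop_other_mm_ref; 'patched' selects the check added by the patch. */
static void model_drop_other_mm_ref(struct model_cpu *cpu,
                                    struct model_mm *mm, bool patched)
{
    if (cpu->active_mm != mm)
        return;
    if (patched && cpu->state == MODEL_TLBSTATE_OK)
        return;     /* cpu is no longer lazy, so leave the mm alone */
    model_leave_mm(cpu);
}

/* The race: the remote cpu has left lazy mode before the request runs. */
static bool model_race_hits_bug(bool patched)
{
    struct model_mm mm = { 1 };
    struct model_cpu remote = { &mm, MODEL_TLBSTATE_OK, false, false };

    model_drop_other_mm_ref(&remote, &mm, patched);
    return remote.hit_bug;
}

/* The normal case: a cpu that is still lazy must still drop the mm. */
static bool model_lazy_cpu_leaves_mm(bool patched)
{
    struct model_mm mm = { 1 };
    struct model_cpu remote = { &mm, MODEL_TLBSTATE_LAZY, false, false };

    model_drop_other_mm_ref(&remote, &mm, patched);
    return remote.left_mm && !remote.hit_bug;
}

/* Call this from main() to check all four combinations. */
static void model_selfcheck(void)
{
    assert(model_race_hits_bug(false));      /* old check trips the BUG flag */
    assert(!model_race_hits_bug(true));      /* patched check skips leave_mm */
    assert(model_lazy_cpu_leaves_mm(false)); /* lazy path works before */
    assert(model_lazy_cpu_leaves_mm(true));  /* and after the patch */
}
```

The model is single-threaded on purpose: it only replays the state the remote
cpu is in when the drop request finally runs, which is the part the extra
check guards against.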

Thanks
Kevin

> >
> >     there's a race window in xen_drop_mm_ref, where remote cpu may exit
> >     dirty bitmap between the check on this cpu and the point where remote
> >     cpu handles drop request. So in drop_other_mm_ref we need check
> >     whether TLB state is still lazy before calling into leave_mm. This
> >     bug is rarely observed in earlier kernel, but exaggerated by the
> >     commit 831d52bc153971b70e64eccfbed2b232394f22f8 which clears bitmap
> >     after changing the TLB state.
> >
> >     thanks for Maxiaoyun<tinnycloud@xxxxxxxxxxx> to verify it.
> >
> >     Signed-off-by: Kevin Tian <kevin.tian@xxxxxxxxx>
> >
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index 4e5a611..74c6e4a 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -1260,7 +1260,7 @@ static void drop_other_mm_ref(void *info)
> >
> >     active_mm = percpu_read(cpu_tlbstate.active_mm);
> >
> > -   if (active_mm == mm)
> > +   if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
> >             leave_mm(smp_processor_id());
> >
> >     /* If this cpu still has a stale cr3 reference, then make sure
> 
> 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-devel

