[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [Xen-ia64-devel] PATCH: slightly improve stability
Excellent! I agree you have found a very difficult bug! I am now up to 167 linux compiles with no segfaults! Congratulations Anthony! One minor suggestion: I think the new added store can be in the same cycle as the previous load (no stop bit needed). I didn't look at the bundling... perhaps it doesn't matter. Dan > -----Original Message----- > From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] > Sent: Saturday, April 29, 2006 11:44 PM > To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold; > xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux > Kernel Dev) > Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > > >From: Magenheimer, Dan > >Sent: 2006å4æ29æ 21:58 > >To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; > >Williamson, Alex (Linux Kernel Dev) > >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > > > >Hi Anthony -- > > > >With both Tristan's stability patch and your earlier patch, > >I have completed 103 linux compiles now with no segfaults > >yet. Did you see your segfault with Tristan's patch > >included? > > > >I'll continue running over the weekend with the bits I > >have but if I see a segfault I will add in the additional > >store in Xen entry (minstate.h) from your newer patch. > > > > --- a/linux-2.6-xen-sparse/arch/ia64/xen/xenminstate.h > Thu Apr 27 02:55:42 2006 > +++ b/linux-2.6-xen-sparse/arch/ia64/xen/xenminstate.h > Sat Apr 29 13:14:58 2006 > @@ -155,6 +155,8 @@ > ;; > \ > ld4 r30=[r8]; > \ > ;; > \ > + /* set XSI_INCOMPL_REGFR 0 */ > \ > + st4 [r8]=r0; > \ > cmp.eq p6,p7=r30,r0; > \ > ;; /* not sure if this stop bit is necessary */ > \ > (p6) adds r8=XSI_PRECOVER_IFS-XSI_INCOMPL_REGFR,r8; > > The additional store is necessary. > > In theory, after Guest executes "cover", incomplete frame > changes to complete > frame. So Guest should set INCOMPL to 0 just after "cover". > At least before guest > psr.ic and psr.i are turned on. > > Previously, only when Guest executes "rfi", INCOMPL is set to > 0. The window > between "cover" and "rfi" causes trouble in below scenario. > > 1. Application A calls system call. > > 2. In OS breaks handler entry, INCOMPL is 0. Due to its system call, > Linux kernel doesn't execute "cover". > > 3. Before returning to Application A, schedule happens, > Application B begins > to run. > > 4. A TLB miss happens on the context of B, this may make > INCOMPL 1, before > Returning to B, (that means "rfi" is not executed, and > INCOMPL is still 1) > schedule happens again. A resumes to run with INCOMPL 1 > (this is incorrect now). > > 5. As mentioned before, this is system call, "cover" is executed in > ia64_leave_kernel path. Because INCOMPL is 1, this > "cover" is not actually > executed, but this "cover" should be executed. > > 5. Thus application A's frame is destroyed. Issue appears. > > > I did catch this scenario. > > Thanks, > Anthony > > > >Dan > > > >> -----Original Message----- > >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] > >> Sent: Saturday, April 29, 2006 12:03 AM > >> To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold; > >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux > >> Kernel Dev) > >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > >> > >> Hi Dan, > >> > >> Yes, we also got a segmentation fault in 1 run out of 30. > >> > >> Could you please try this new patch? > >> > >> Thanks, > >> -Anthony > >> > >> >-----Original Message----- > >> >From: Magenheimer, Dan (HP Labs Fort Collins) > >> [mailto:dan.magenheimer@xxxxxx] > >> >Sent: 2006å4æ28æ 22:49 > >> >To: Xu, Anthony; Tristan Gingold; > xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; > >> >Williamson, Alex (Linux Kernel Dev) > >> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > >> > > >> >Hi Anthony -- > >> > > >> >I tried your patch overnight and still got a segmentation > >> >fault in 1 run out of 50. I didn't try Tristan's patch yet, > >> >so will try both at the same time next... perhaps there > >> >are two different problems that show up as the segmentation > >> >fault. > >> > > >> >Dan > >> > > >> >> -----Original Message----- > >> >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] > >> >> Sent: Thursday, April 27, 2006 9:19 PM > >> >> To: Xu, Anthony; Tristan Gingold; > >> >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs > >> >> Fort Collins); Williamson, Alex (Linux Kernel Dev) > >> >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > >> >> > >> >> Hi Tristan, > >> >> Could you please check whether this patch address RSE issue? > >> >> > >> >> Yes, Intel QA team is doing the test in the meantime. > >> >> > >> >> > >> >> Thanks, > >> >> -Anthony > >> >> > >> >> >-----Original Message----- > >> >> >From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx > >> >> >[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On > >> >> Behalf Of Xu, Anthony > >> >> >Sent: 2006?4?28? 9:48 > >> >> >To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; > >> >> Magenheimer, Dan (HP > >> >> >Labs Fort Collins); Alex Williamson > >> >> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > >> >> > > >> >> >>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx > >> >> >>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On > >> >> Behalf Of Tristan > >> >> >>Gingold > >> >> >>Sent: 2006?4?27? 23:14 > >> >> >>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan > >> >> (HP Labs Fort > >> >> >>Collins); Alex Williamson > >> >> >>Subject: [Xen-ia64-devel] PATCH: slightly improve stability > >> >> >> > >> >> >>Hi, > >> >> >> > >> >> >>as reported earlier, this patch seems to improve stability: > >> >> crashes are at > >> >> >>least more coherent and maybe less frequent. > >> >> >> > >> >> >>RSE handling seems to have a bug: crahes are now due to > >> >> either a bad value in > >> >> >>a stacked register or a use of an invalid stacked register > >> >> (although cfm > >> >> >>seems correct in gdb!) > >> >> > > >> >> >I'm looking at this too, > >> >> >Yes there is a bug about handle_lazy_cover. > >> >> > > >> >> >void ia64_do_page_fault (unsigned long address, unsigned > >> >> long isr, struct > >> >> >pt_regs *regs, unsigned long itir) > >> >> >{ > >> >> > unsigned long iip = regs->cr_iip, iha; > >> >> > // FIXME should validate address here > >> >> > unsigned long pteval; > >> >> > unsigned long is_data = !((isr >> > IA64_ISR_X_BIT) & 1UL); > >> >> > IA64FAULT fault; > >> >> > > >> >> > if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, > >> >> isr, regs)) return; > >> >> > > >> >> >This code sequence is intended to handle following scenario. > >> >> > > >> >> >1. Guest executes br.ret, this may cause mandatory RSE load, > >> >> and this load may > >> >> >cause TLB miss. > >> >> >2. VMM gets control, but VMM can't handle this TLB miss > >> >> itself, then VMM injects > >> >> >TLB miss to Guest TLB miss handler, when VMM executing "rfi" > >> >> to jump to Guest > >> >> >TLB miss handler, this TLB miss happens again. > >> >> >3. At this time, interrupt_collection_enabled is 0, so > >> >> handle_lazy_cover > >> >> >executes "cover" on behalf of Guest, and return to Guest TLB > >> >> miss handler again, > >> >> >this time there is no TLB miss. > >> >> > > >> >> > > >> >> >Following code sequence is in ia64_leave_kernel path with > >> >> psr.ic and psr.i off. > >> >> >When br.ret.dptk.many b0 is executed, there may be a > >> >> mandatory load, thus > >> >> >There may be a tlb miss, according to above description > >> >> handle_lazy_cover > >> >> >executes "cover" on behalf of Guest and return to Guest, > >> >> this is no correct > >> >> >in this scenario. > >> >> > > >> >> >I didn't find an easy way to fix this bug. > >> >> > > >> >> > > >> >> > mov loc6=0 > >> >> > mov loc7=0 > >> >> >(pRecurse) br.call.dptk.few b0=rse_clear_invalid > >> >> > ;; > >> >> > mov loc8=0 > >> >> > mov loc9=0 > >> >> > cmp.ne pReturn,p0=r0,in1 // if recursion count > >> >> != 0, we need to do a > >> >> >br.ret > >> >> > mov loc10=0 > >> >> > mov loc11=0 > >> >> >(pReturn) br.ret.dptk.many b0 > >> >> >#endif /* !CONFIG_ITANIUM */ > >> >> ># undef pRecurse > >> >> ># undef pReturn > >> >> > ;; > >> >> > alloc r17=ar.pfs,0,0,0,0 // drop current > register frame > >> >> > ;; > >> >> > loadrs > >> >> > > >> >> >Thanks, > >> >> >Anthony > >> >> > > >> >> > > >> >> >> > >> >> >>Tested by doing many linux kernel compilation in SMP > >> domU (> 100). > >> >> >> > >> >> >>Tristan. > >> >> > > >> >> >_______________________________________________ > >> >> >Xen-ia64-devel mailing list > >> >> >Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx > >> >> >http://lists.xensource.com/xen-ia64-devel > >> >> > >> > _______________________________________________ Xen-ia64-devel mailing list Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-ia64-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |