
RE: [Xen-ia64-devel] PATCH: slightly improve stability



Hi Tristan,
Could you please check whether this patch addresses the RSE issue?

The Intel QA team is testing it in the meantime.


Thanks,
-Anthony 

>-----Original Message-----
>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Xu, Anthony
>Sent: 2006-04-28 9:48
>To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP
>Labs Fort Collins); Alex Williamson
>Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>
>>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Tristan
>>Gingold
>>Sent: 2006-04-27 23:14
>>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs Fort
>>Collins); Alex Williamson
>>Subject: [Xen-ia64-devel] PATCH: slightly improve stability
>>
>>Hi,
>>
>>as reported earlier, this patch seems to improve stability: crashes are at
>>least more coherent and maybe less frequent.
>>
>>RSE handling seems to have a bug: crashes are now due to either a bad value in
>>a stacked register or the use of an invalid stacked register (although cfm
>>seems correct in gdb!)
>
>I'm looking at this too.
>Yes, there is a bug in how handle_lazy_cover is used.
>
>void ia64_do_page_fault (unsigned long address, unsigned long isr, struct
>pt_regs *regs, unsigned long itir)
>{
>       unsigned long iip = regs->cr_iip, iha;
>       // FIXME should validate address here
>       unsigned long pteval;
>       unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
>       IA64FAULT fault;
>
>       if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, isr, regs))
>               return;
>
>This code sequence is intended to handle the following scenario.
>
>1. The guest executes br.ret. This may cause a mandatory RSE load, and this
>load may cause a TLB miss.
>2. The VMM gets control, but it can't handle this TLB miss itself, so it
>injects the TLB miss into the guest's TLB miss handler. When the VMM executes
>"rfi" to jump to the guest TLB miss handler, the TLB miss happens again.
>3. At this time, interrupt_collection_enabled is 0, so handle_lazy_cover
>executes "cover" on behalf of the guest and returns to the guest TLB miss
>handler again. This time there is no TLB miss.
>
>
>The following code sequence is in the ia64_leave_kernel path, with psr.ic and
>psr.i off. When br.ret.dptk.many b0 is executed, there may be a mandatory
>load, and thus a TLB miss. According to the description above,
>handle_lazy_cover executes "cover" on behalf of the guest and returns to the
>guest, which is not correct in this scenario.
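
To make the ambiguity concrete, here is a minimal C model of the check
described above. It is only a sketch, not the Xen source: every model_* name,
the struct layout and the MODEL_ISR_IR bit position are assumptions made for
illustration, and the "cover" emulation is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the ISR.ir bit; the position is an assumption. */
#define MODEL_ISR_IR (UINT64_C(1) << 38)

/* Hypothetical per-vcpu state: only what the check looks at. */
struct model_vcpu {
    bool interrupt_collection_enabled; /* guest's virtual psr.ic */
    bool cover_emulated;               /* set when "cover" is done for the guest */
};

/* Models the described behaviour: with interrupt collection off,
 * emulate "cover" and ask the caller to retry the faulting instruction. */
static bool model_handle_lazy_cover(struct model_vcpu *v)
{
    if (!v->interrupt_collection_enabled) {
        v->cover_emulated = true;
        return true;
    }
    return false;
}

/* Models the early-return test at the top of the page fault handler. */
static bool model_page_fault_retries(struct model_vcpu *v, uint64_t isr)
{
    return (isr & MODEL_ISR_IR) && model_handle_lazy_cover(v);
}

/* Both scenarios present the same inputs, so the check cannot tell them apart. */
static void model_self_check(void)
{
    /* Scenario A: rfi into the guest TLB miss handler (steps 1-3). */
    struct model_vcpu handler_entry = { false, false };
    /* Scenario B: br.ret in the leave-kernel path with psr.ic off. */
    struct model_vcpu leave_kernel = { false, false };
    /* Ordinary fault with interrupt collection on. */
    struct model_vcpu normal = { true, false };

    assert(model_page_fault_retries(&handler_entry, MODEL_ISR_IR));
    assert(model_page_fault_retries(&leave_kernel, MODEL_ISR_IR));
    assert(leave_kernel.cover_emulated); /* the unwanted case */
    assert(!model_page_fault_retries(&normal, MODEL_ISR_IR));
    assert(!normal.cover_emulated);
}
```

Calling model_self_check() passes, which is the point: in this model the
handler-entry case and the leave-kernel case are indistinguishable, so any fix
would need some extra piece of state to separate them.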
>
>I didn't find an easy way to fix this bug.
>
>
>       mov loc6=0
>       mov loc7=0
>(pRecurse) br.call.dptk.few b0=rse_clear_invalid
>       ;;
>       mov loc8=0
>       mov loc9=0
>       cmp.ne pReturn,p0=r0,in1        // if recursion count != 0, we need to do a br.ret
>       mov loc10=0
>       mov loc11=0
>(pReturn) br.ret.dptk.many b0
>#endif /* !CONFIG_ITANIUM */
>#      undef pRecurse
>#      undef pReturn
>       ;;
>       alloc r17=ar.pfs,0,0,0,0        // drop current register frame
>       ;;
>       loadrs
>
>Thanks,
>Anthony
>
>
>>
>>Tested by running many Linux kernel compilations in an SMP domU (> 100).
>>
>>Tristan.
>
>_______________________________________________
>Xen-ia64-devel mailing list
>Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>http://lists.xensource.com/xen-ia64-devel

Attachment: rse.patch
Description: rse.patch
