[Xen-ia64-devel] RE: [PATCH] Patch to make latest hg multi-domain back to work
It appears that the patch below has created some instability in domain0. I regularly see a crash now in domain0 when compiling linux. I changed back to the old code and the crash seems to go away. Since it is unpredictable, I changed back to the new code AND added printfs around the new code in vcpu_translate, and domain0 fails immediately after the printf (but ONLY when it is called from ia64_do_page_fault... it's OK when called from vcpu_tpa).

The attached patch returns stability to the system. It is definitely not a final patch (for example it's not SMP-safe), but I thought I would post it in case anybody is trying to get some work done and domain0 keeps crashing intermittently.

Kevin, John, I still haven't successfully reproduced your multi-domain success, so please try this patch with the second domain.

Thanks,
Dan

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
> Sent: Friday, September 02, 2005 8:18 AM
> To: Magenheimer, Dan (HP Labs Fort Collins); Byrne, John (HP Labs)
> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: [PATCH] Patch to make latest hg multi-domain back to work
>
> I saw some intermittent/weird behavior on latest xen-ia64-unstable.hg
> (Rev 6461), where sometimes I can log in to the xenU shell, sometimes
> it hangs after "Mounting root fs...", and sometimes the whole system
> even breaks as follows:
>
> (XEN) ia64_fault: General Exception: IA-64 Reserved Register/Field fault (data access): reflecting
> (XEN) $$$$$ PANIC in domain 1 (k6=f000000007fd8000): psr.ic off, delivering fault=5300,ipsr=0000121208026010,iip=a00000010000cd00,ifa=f000000007fdfd60,isr=00000a0c00000004,PSCB.iip*** ADD REGISTER DUMP HERE FOR DEBUGGING
> (XEN) BUG at domain.c:311
> (XEN) priv_emulate: priv_handle_op fails, isr=0000000000000000
> (XEN)
>
> Finally I found that the root cause is that match_dtlb should return
> the guest pte instead of the machine pte, because translate_machine_pte
> is always invoked after vcpu_translate.
> translate_machine_pte expects a guest pte and walks the 3-level tables
> to get the machine frame number. Why does it happen so rarely?
> - For xen0, guest pfn == machine pfn, so nothing happens
> - For xenU, currently there's only one vtlb entry to cache the
> latest inserted TC entry. Say the current vtlb entry for VA1 has been
> inserted into the machine TLB. Normally there'll be many itc issued
> before the machine TC for VA1 is purged. Those insertions will change
> the single vtlb entry. So in 99.99% of cases, once the guest va is
> purged out of the machine TLB/vhpt and triggers a TLB miss again,
> match_tlb will fail.
>
> But there's also a corner case where the vtlb entry has not been
> updated but the machine TC entry for VA1 has been purged. In this case,
> if VA1 is accessed immediately, match_dtlb will return true and then
> the problematic code becomes the culprit.
>
> For example, sometimes I saw:
> (XEN) translate_domain_pte: bad mpa=000000007f170080 (> 0000000010004000),vadr=5fffff0000000080,pteval=000000007f170561,itir=0000000000000038
> (XEN) lookup_domain_mpa: bad mpa 000000007f170080 (> 0000000010004000
> The above access happens when vcpu_translate tries to access the guest
> SVHPT. You can see 0x7f170080 is actually a machine pfn. When
> 0x7f170080 is passed into translate_machine_pte, a warning shows and
> it's finally mapped to machine pfn 0. (Maybe we can change such an
> error condition to a panic, instead of returning an incorrect pfn.)
>
> Then things all went weird:
> (XEN) translate_domain_pte: bad mpa=0000eef3f000e738 (> 0000000010004000),vadr=4000000000042738,pteval=f000eef3f000eef3,itir=0000000000026238
> (XEN) lookup_domain_mpa: bad mpa 0000eef3f000e738 (> 0000000010004000
>
> And finally a GP fault happens. This bug has actually been hiding for
> a long time, but was seldom triggered.
>
> John, please run a test on your side with all the patches I sent out
> today (including the max_page one). I believe we can call it an end now.
> ;-)
>
> BTW, Dan, there are two heads on current xen-ia64-unstable.hg.
> Please do a merge.
>
> Signed-off-by: Kevin Tian <Kevin.tian@xxxxxxxxx>
>
> diff -r 68d8a0a1aeb7 xen/arch/ia64/xen/vcpu.c
> --- a/xen/arch/ia64/xen/vcpu.c Thu Sep 1 21:51:57 2005
> +++ b/xen/arch/ia64/xen/vcpu.c Fri Sep 2 21:30:01 2005
> @@ -1315,7 +1315,8 @@
>      /* check 1-entry TLB */
>      if ((trp = match_dtlb(vcpu,address))) {
>          dtlb_translate_count++;
> -        *pteval = trp->page_flags;
> +        //*pteval = trp->page_flags;
> +        *pteval = vcpu->arch.dtlb_pte;
>          *itir = trp->itir;
>          return IA64_NO_FAULT;
>      }
>
> Thanks,
> Kevin
>

Attachment: match_dtlb_take2

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
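
For readers following the thread, the double-translation problem Kevin describes can be illustrated with a toy model. This is only a sketch and is not the Xen code: every name in it (toy_dtlb, toy_p2m, toy_translate, toy_handle_miss, TOY_BAD_PFN) is hypothetical, the 16KB page size and the 4-page guest are assumptions, and a bad address is reported with a sentinel rather than being mapped to machine pfn 0 as the message says Xen did at the time. It shows a 1-entry dTLB that remembers both the guest pfn and the machine pfn, and a miss handler that always runs the guest-to-machine step after the lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  14           /* 16KB pages: an assumption for this sketch */
#define TOY_GUEST_PAGES 4            /* the toy guest owns guest pfns 0..3 */
#define TOY_BAD_PFN     UINT64_MAX   /* sentinel standing in for "bad mpa" */

/* Hypothetical guest-pfn -> machine-pfn table for a xenU-like domain.
 * For a xen0-like domain this would be the identity, hiding the bug. */
static const uint64_t toy_p2m[TOY_GUEST_PAGES] = {
    0x7f10, 0x7f11, 0x7f17, 0x7f18
};

/* Toy 1-entry dTLB: remembers the last inserted translation only. */
struct toy_dtlb {
    int      valid;
    uint64_t vpn;          /* virtual page number of the cached entry */
    uint64_t guest_pfn;    /* what the guest asked to insert */
    uint64_t machine_pfn;  /* what would go into the hardware TLB */
};

/* Guest pfn -> machine pfn; rejects anything outside the guest's range. */
uint64_t toy_guest_to_machine(uint64_t guest_pfn)
{
    if (guest_pfn >= TOY_GUEST_PAGES)
        return TOY_BAD_PFN;
    return toy_p2m[guest_pfn];
}

/* Model of a guest TLB insert: overwrite the single cached entry. */
void toy_insert(struct toy_dtlb *t, uint64_t vaddr, uint64_t guest_pfn)
{
    assert(t != NULL);
    t->vpn = vaddr >> TOY_PAGE_SHIFT;
    t->guest_pfn = guest_pfn;
    t->machine_pfn = toy_guest_to_machine(guest_pfn);
    t->valid = 1;
}

/* Lookup in the 1-entry dTLB. Returns 1 on a hit, 0 on a miss.
 * return_guest != 0 models the patched behaviour (hand back the guest pfn);
 * return_guest == 0 models the old behaviour (hand back the machine pfn). */
int toy_translate(const struct toy_dtlb *t, uint64_t vaddr,
                  int return_guest, uint64_t *pfn_out)
{
    if (!t->valid || t->vpn != (vaddr >> TOY_PAGE_SHIFT))
        return 0;
    *pfn_out = return_guest ? t->guest_pfn : t->machine_pfn;
    return 1;
}

/* Miss handler: the guest-to-machine step always runs after the lookup,
 * so the lookup must return a guest pfn for the result to be valid. */
uint64_t toy_handle_miss(const struct toy_dtlb *t, uint64_t vaddr,
                         int return_guest)
{
    uint64_t pfn;

    if (!toy_translate(t, vaddr, return_guest, &pfn))
        return TOY_BAD_PFN;
    return toy_guest_to_machine(pfn);
}
```

In this model, after toy_insert() caches guest pfn 2, toy_handle_miss() with return_guest set yields machine pfn 0x7f17, while with return_guest cleared the already-translated machine pfn is fed through the table a second time and rejected, which is the analogue of the "bad mpa" warnings quoted above.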