[Xen-ia64-devel] RE: [PATCH] Patch to make latest hg multi-domain back to work
Still works for me.

Thanks,
Kevin

>-----Original Message-----
>From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>Sent: September 8, 2005 4:57
>To: Tian, Kevin; Byrne, John (HP Labs)
>Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>Subject: RE: [PATCH] Patch to make latest hg multi-domain back to work
>
>It appears that the patch below has created some instability
>in domain0. I regularly see a crash in domain0 when
>compiling Linux. I changed back to the old code and the
>crash seems to go away. Since it is unpredictable, I
>changed back to the new code AND added printfs around
>the new code in vcpu_translate; domain0 fails immediately after
>the printf (but ONLY when it is called from ia64_do_page_fault...
>it's OK when called from vcpu_tpa).
>
>The attached patch returns stability to the system. It
>is definitely not a final patch (for example, it's not
>SMP-safe), but I thought I would post it in case anybody
>is trying to get some work done and domain0 keeps
>crashing intermittently.
>
>Kevin, John, I still haven't successfully reproduced your
>multi-domain success, so please try this patch with
>the second domain.
>
>Thanks,
>Dan
>
>> -----Original Message-----
>> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
>> Sent: Friday, September 02, 2005 8:18 AM
>> To: Magenheimer, Dan (HP Labs Fort Collins); Byrne, John (HP Labs)
>> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: [PATCH] Patch to make latest hg multi-domain back to work
>>
>> I saw some intermittent/weird behavior on the latest xen-ia64-unstable.hg
>> (Rev 6461): sometimes I can log into the xenU shell, sometimes it hangs
>> after "Mounting root fs...", and sometimes the whole system
>> is broken as follows:
>>
>> (XEN) ia64_fault: General Exception: IA-64 Reserved Register/Field fault
>> (data access): reflecting
>> (XEN) $$$$$ PANIC in domain 1 (k6=f000000007fd8000): psr.ic off, delivering
>> fault=5300,ipsr=0000121208026010,iip=a00000010000cd00,ifa=f000000007fdfd60,isr=00000a0c00000004,PSCB.iip*** ADD REGISTER DUMP HERE FOR DEBUGGING
>> (XEN) BUG at domain.c:311
>> (XEN) priv_emulate: priv_handle_op fails, isr=0000000000000000
>> (XEN)
>>
>> Finally I found the root cause: match_dtlb should return the guest
>> pte instead of the machine pte, because translate_domain_pte is always
>> invoked after vcpu_translate. Translate_domain_pte expects a guest pte
>> and walks the 3-level tables to get the machine frame
>> number. Why does the failure look so erratic?
>> - For xen0, guest pfn == machine pfn, so nothing happens.
>> - For xenU, there is currently only one vtlb entry, caching the most
>> recently inserted TC entry. Say the current vtlb entry for VA1 has been
>> inserted into the machine TLB. Normally many itc instructions will be
>> issued before the machine TC entry for VA1 is purged, and each insertion
>> overwrites the single vtlb entry. So in 99.99% of cases, once a guest va
>> is purged out of the machine TLB/vhpt and triggers a TLB miss again,
>> match_dtlb will fail.
>>
>> But there is also a corner case where the vtlb entry has not been
>> updated yet but the machine TC entry for VA1 has already been purged.
In this case, if VA1 is
>> accessed again immediately, match_dtlb will return true, and then the
>> problematic code becomes the murderer.
>>
>> For example, sometimes I saw:
>> (XEN) translate_domain_pte: bad mpa=000000007f170080 (>
>> 0000000010004000),vadr=5fffff0000000080,pteval=000000007f170561,itir=0000000000000038
>> (XEN) lookup_domain_mpa: bad mpa 000000007f170080 (> 0000000010004000
>> The above access happens when vcpu_translate tries to access the guest SVHPT.
>> You can see that 0x7f170080 is actually a machine pfn. When 0x7f170080 is
>> passed into translate_domain_pte, the warning shows and it is finally
>> mapped to machine pfn 0. (Maybe we should change such an error condition
>> to a panic instead of returning an incorrect pfn.)
>>
>> Then things all went weird:
>> (XEN) translate_domain_pte: bad mpa=0000eef3f000e738 (>
>> 0000000010004000),vadr=4000000000042738,pteval=f000eef3f000eef3,itir=0000000000026238
>> (XEN) lookup_domain_mpa: bad mpa 0000eef3f000e738 (> 0000000010004000
>>
>> And finally the GP fault happens. This error has actually been hidden
>> for a long time, but was seldom triggered.
>>
>> John, please run a test on your side with all the patches I sent out
>> today (including the max_page one). I believe we can call it an end now.
>> ;-)
>>
>> BTW, Dan, there are two heads on the current xen-ia64-unstable.hg.
>> Please do a merge.
>>
>> Signed-off-by: Kevin Tian <Kevin.tian@xxxxxxxxx>
>>
>> diff -r 68d8a0a1aeb7 xen/arch/ia64/xen/vcpu.c
>> --- a/xen/arch/ia64/xen/vcpu.c Thu Sep 1 21:51:57 2005
>> +++ b/xen/arch/ia64/xen/vcpu.c Fri Sep 2 21:30:01 2005
>> @@ -1315,7 +1315,8 @@
>> /* check 1-entry TLB */
>> if ((trp = match_dtlb(vcpu,address))) {
>> dtlb_translate_count++;
>> - *pteval = trp->page_flags;
>> + //*pteval = trp->page_flags;
>> + *pteval = vcpu->arch.dtlb_pte;
>> *itir = trp->itir;
>> return IA64_NO_FAULT;
>> }
>>
>> Thanks,
>> Kevin
>> _______________________________________________
>> Xen-ia64-devel mailing list
>> Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-ia64-devel