
[Xen-ia64-devel] RE: [PATCH] Patch to make latest hg multi-domain back to work


  • To: "Tian, Kevin" <kevin.tian@xxxxxxxxx>, "Byrne, John (HP Labs)" <john.l.byrne@xxxxxx>
  • From: "Magenheimer, Dan (HP Labs Fort Collins)" <dan.magenheimer@xxxxxx>
  • Date: Wed, 7 Sep 2005 13:56:48 -0700
  • Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 07 Sep 2005 20:54:35 +0000
  • List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
  • Thread-index: AcWvySFLppJ+ILkHRKOQH/MX6thvfgEH6v3w
  • Thread-topic: [PATCH] Patch to make latest hg multi-domain back to work

It appears that the patch below has created some instability
in domain0.  I regularly see a crash now in domain0 when
compiling linux.  I changed back to the old code and the
crash seems to go away.  Since the crash is unpredictable, I
changed back to the new code AND added printfs around
the new code in vcpu_translate; domain0 fails immediately after
the printf (but ONLY when it is called from ia64_do_page_fault...
it's OK when called from vcpu_tpa).
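
For reference, the instrumentation was nothing more elaborate than a
pair of printfs around the new line in vcpu_translate, roughly like
this (illustrative only, not the exact lines I used):

	/* check 1-entry TLB */
	if ((trp = match_dtlb(vcpu,address))) {
		dtlb_translate_count++;
		printf("vcpu_translate: match_dtlb hit, va=%lx\n", address);
		*pteval = vcpu->arch.dtlb_pte;
		printf("vcpu_translate: pteval=%lx, itir=%lx\n",
			*pteval, trp->itir);
		*itir = trp->itir;
		return IA64_NO_FAULT;
	}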

The attached patch returns stability to the system.  It
is definitely not a final patch (for example, it's not
SMP-safe), but I thought I would post it in case anybody
is trying to get some work done while domain0 keeps
crashing intermittently.

Kevin, John, I still haven't successfully reproduced your
multi-domain success, so please try this patch with
the second domain.

Thanks,
Dan

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx] 
> Sent: Friday, September 02, 2005 8:18 AM
> To: Magenheimer, Dan (HP Labs Fort Collins); Byrne, John (HP Labs)
> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: [PATCH] Patch to make latest hg multi-domain back to work
> 
> I saw some intermittent/weird behavior on the latest
> xen-ia64-unstable.hg (Rev 6461): sometimes I can log into the xenU
> shell, sometimes it hangs after "Mounting root fs...", and sometimes
> the whole system breaks as follows:
> 
> (XEN) ia64_fault: General Exception: IA-64 Reserved Register/Field fault (data access): reflecting
> (XEN) $$$$$ PANIC in domain 1 (k6=f000000007fd8000): psr.ic off, delivering fault=5300,ipsr=0000121208026010,iip=a00000010000cd00,ifa=f000000007fdfd60,isr=00000a0c00000004,PSCB.iip*** ADD REGISTER DUMP HERE FOR DEBUGGING
> (XEN) BUG at domain.c:311
> (XEN) priv_emulate: priv_handle_op fails, isr=0000000000000000
> (XEN)
> 
> Finally I found that the root cause is that match_dtlb should return
> the guest pte instead of the machine pte, because translate_domain_pte
> is always invoked after vcpu_translate. translate_domain_pte expects a
> guest pte and walks the 3-level page tables to get the machine frame
> number. Why does the failure show up so rarely?
> 	- For xen0, guest pfn == machine pfn, so nothing bad happens.
> 	- For xenU, there is currently only one vtlb entry, caching the
> latest inserted TC entry. Say the current vtlb entry for VA1 has been
> inserted into the machine TLB. Normally many itc insertions are issued
> before the machine TC entry for VA1 is purged, and each insertion
> overwrites the single vtlb entry. So in 99.99% of cases, once the
> guest va has been purged out of the machine TLB/vhpt and triggers a
> TLB miss again, match_dtlb will fail (see the sketch right after this
> list).
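> 
> To make the pte confusion concrete, here is a tiny standalone C model
> (hypothetical names and numbers, NOT the Xen code) of why feeding the
> already-translated machine pte back into translate_domain_pte has to
> go wrong:
> 
> #include <stdio.h>
> 
> #define PAGE_SHIFT   14                  /* 16KB pages, as on ia64 */
> #define PTE_PFN(p)   ((p) >> PAGE_SHIFT)
> #define GUEST_MAXPFN 0x1000UL            /* pretend guest frame count */
> 
> /* pretend metaphysical->machine mapping: identity plus an offset */
> static unsigned long gpfn_to_mpfn(unsigned long gpfn)
> {
> 	if (gpfn >= GUEST_MAXPFN) {
> 		printf("lookup_domain_mpa: bad mpa %lx\n",
> 		       gpfn << PAGE_SHIFT);
> 		return 0;	/* today's silent fallback to pfn 0 */
> 	}
> 	return gpfn + 0x8000;
> }
> 
> /* re-translate the pfn inside a *guest* pte into a machine pte */
> static unsigned long translate_domain_pte(unsigned long pte)
> {
> 	unsigned long mpfn = gpfn_to_mpfn(PTE_PFN(pte));
> 	return (pte & ((1UL << PAGE_SHIFT) - 1)) | (mpfn << PAGE_SHIFT);
> }
> 
> int main(void)
> {
> 	unsigned long guest_pte = (0x7f0UL << PAGE_SHIFT) | 0x561;
> 	unsigned long machine_pte = translate_domain_pte(guest_pte);
> 
> 	/* correct: vcpu_translate hands back the guest pte again */
> 	printf("good: %016lx\n", translate_domain_pte(guest_pte));
> 	/* bug: old match_dtlb handed back the machine pte, whose pfn
> 	 * lies outside the guest range -> warning, mapped to pfn 0 */
> 	printf("bad:  %016lx\n", translate_domain_pte(machine_pte));
> 	return 0;
> }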
> 
> But there is also a corner case where the vtlb entry has not yet been
> overwritten but the machine TC entry for VA1 has already been purged.
> In that case, if VA1 is accessed again immediately, match_dtlb
> succeeds and the problematic code becomes the murderer.
> 
> For example, sometimes I saw:
> (XEN) translate_domain_pte: bad mpa=000000007f170080 (> 0000000010004000),vadr=5fffff0000000080,pteval=000000007f170561,itir=0000000000000038
> (XEN) lookup_domain_mpa: bad mpa 000000007f170080 (> 0000000010004000
> The access above happens when vcpu_translate tries to access the guest
> SVHPT. You can see that 0x7f170080 is actually a machine pfn. When
> 0x7f170080 is passed into translate_domain_pte, the warning shows and
> the address is finally mapped to machine pfn 0. (Maybe we should
> change such an error condition into a panic instead of returning an
> incorrect pfn; see the sketch below.)
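> 
> Something along these lines inside lookup_domain_mpa would do it (a
> sketch only; the max_mpaddr bound and the panic_domain signature here
> are my approximations, not the exact Xen ia64 code):
> 
> 	/* where lookup_domain_mpa prints its warning today */
> 	if (mpaddr >= max_mpaddr) {
> 		printf("lookup_domain_mpa: bad mpa %lx (> %lx)\n",
> 		       mpaddr, max_mpaddr);
> 		/* instead of falling through and handing back pfn 0,
> 		 * kill the guest right where the bug is visible */
> 		panic_domain(NULL, "lookup_domain_mpa: bad mpa %lx\n",
> 			     mpaddr);
> 	}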
> 
> Then things all went weird:
> (XEN) translate_domain_pte: bad mpa=0000eef3f000e738 (> 0000000010004000),vadr=4000000000042738,pteval=f000eef3f000eef3,itir=0000000000026238
> (XEN) lookup_domain_mpa: bad mpa 0000eef3f000e738 (> 0000000010004000
> 
> And finally the GP fault happens. This error has actually been hiding
> for a long time, but was seldom triggered.
> 
> John, please run a test on your side with all the patches I sent out
> today (including the max_page one). I believe we can call it done now.
> ;-)
> 
> BTW, Dan, there are two heads on the current xen-ia64-unstable.hg.
> Please do a merge.
> 
> Signed-off-by: Kevin Tian <Kevin.tian@xxxxxxxxx>
> 
> diff -r 68d8a0a1aeb7 xen/arch/ia64/xen/vcpu.c
> --- a/xen/arch/ia64/xen/vcpu.c        Thu Sep  1 21:51:57 2005
> +++ b/xen/arch/ia64/xen/vcpu.c        Fri Sep  2 21:30:01 2005
> @@ -1315,7 +1315,8 @@
>       /* check 1-entry TLB */
>       if ((trp = match_dtlb(vcpu,address))) {
>               dtlb_translate_count++;
> -             *pteval = trp->page_flags;
> +             //*pteval = trp->page_flags;
> +             *pteval = vcpu->arch.dtlb_pte;
>               *itir = trp->itir;
>               return IA64_NO_FAULT;
>       }
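> 
> (The point of the one-line change: the one-entry vtlb keeps both forms
> of the mapping. trp->page_flags holds the pte as inserted into the
> machine TLB, while vcpu->arch.dtlb_pte preserves the original guest
> pte; since the caller always re-runs translate_domain_pte on whatever
> vcpu_translate returns, the guest pte is the right one to hand back.)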
> 
> Thanks,
> Kevin
> 

Attachment: match_dtlb_take2
Description: match_dtlb_take2

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel

 

