[Xen-ia64-devel] RE: [PATCH] Patch to make latest hg multi-domain back to work


  • To: "Magenheimer, Dan \(HP Labs Fort Collins\)" <dan.magenheimer@xxxxxx>, "Byrne, John \(HP Labs\)" <john.l.byrne@xxxxxx>
  • From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
  • Date: Thu, 8 Sep 2005 17:15:50 +0800
  • Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Thu, 08 Sep 2005 09:13:38 +0000
  • List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
  • Thread-index: AcWvySFLppJ+ILkHRKOQH/MX6thvfgEH6v3wABtDOHA=
  • Thread-topic: [PATCH] Patch to make latest hg multi-domain back to work

Still works for me.

Thanks,
Kevin

>-----Original Message-----
>From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>Sent: September 8, 2005 4:57
>To: Tian, Kevin; Byrne, John (HP Labs)
>Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>Subject: RE: [PATCH] Patch to make latest hg multi-domain back to work
>
>It appears that the patch below has introduced some instability
>in domain0.  I now regularly see a crash in domain0 when
>compiling Linux.  I changed back to the old code and the
>crash seems to go away.  Since it is unpredictable, I
>changed back to the new code AND added printfs around
>the new code in vcpu_translate; domain0 fails immediately after
>the printf (but ONLY when it is called from ia64_do_page_fault;
>it's OK when called from vcpu_tpa).
>
>The attached patch restores stability to the system.  It
>is definitely not a final patch (for example, it is not
>SMP-safe), but I thought I would post it in case anybody
>is trying to get some work done and domain0 keeps crashing
>intermittently.
>
>Kevin, John, I still haven't successfully reproduced your
>multi-domain success, so please try this patch with
>the second domain.
>
>Thanks,
>Dan
>
>> -----Original Message-----
>> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx]
>> Sent: Friday, September 02, 2005 8:18 AM
>> To: Magenheimer, Dan (HP Labs Fort Collins); Byrne, John (HP Labs)
>> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: [PATCH] Patch to make latest hg multi-domain back to work
>>
>> I saw some intermittent/weird behavior on the latest
>> xen-ia64-unstable.hg (rev 6461): sometimes I can log into the xenU
>> shell, sometimes it hangs after "Mounting root fs...", and sometimes
>> the whole system breaks down as follows:
>>
>> (XEN) ia64_fault: General Exception: IA-64 Reserved Register/Field fault (data access): reflecting
>> (XEN) $$$$$ PANIC in domain 1 (k6=f000000007fd8000): psr.ic off, delivering fault=5300,ipsr=0000121208026010,iip=a00000010000cd00,ifa=f000000007fdfd60,isr=00000a0c00000004,PSCB.iip*** ADD REGISTER DUMP HERE FOR DEBUGGING
>> (XEN) BUG at domain.c:311
>> (XEN) priv_emulate: priv_handle_op fails, isr=0000000000000000
>> (XEN)
>>
>> Finally I found the root cause: match_dtlb should return the guest
>> pte instead of the machine pte, because translate_domain_pte is
>> always invoked after vcpu_translate.  translate_domain_pte expects
>> a guest pte and walks the 3-level page table to get the machine
>> frame number.  Why does this trigger so rarely?
>>      - For xen0, guest pfn == machine pfn, so nothing happens.
>>      - For xenU, there is currently only one vtlb entry, caching the
>> latest inserted TC entry.  Say the current vtlb entry for VA1 has
>> been inserted into the machine TLB.  Normally many itc's are issued
>> before the machine TC for VA1 is purged, and each insertion
>> overwrites the single vtlb entry.  So in 99.99% of cases, once a
>> guest va is purged out of the machine TLB/vhpt and triggers a TLB
>> miss again, match_dtlb will fail.
>>
>> But there is also a corner case where the vtlb entry has not been
>> updated yet while the machine TC entry for VA1 has already been
>> purged.  In this case, if VA1 is accessed again immediately,
>> match_dtlb returns true and the problematic code becomes the
>> murderer.
>>
>> For example, sometimes I saw:
>> (XEN) translate_domain_pte: bad mpa=000000007f170080 (> 0000000010004000),vadr=5fffff0000000080,pteval=000000007f170561,itir=0000000000000038
>> (XEN) lookup_domain_mpa: bad mpa 000000007f170080 (> 0000000010004000
>> The access above happens when vcpu_translate tries to access the
>> guest SVHPT.  You can see that 0x7f170080 is actually a machine pfn.
>> When 0x7f170080 is passed into translate_domain_pte, the warning
>> shows and it finally gets mapped to machine pfn 0.  (Maybe we should
>> change such an error condition to a panic instead of returning an
>> incorrect pfn.)
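>> Something along these lines, perhaps (an illustrative fragment only;
>> max_mpaddr is a stand-in for the domain's limit, and it assumes the
>> check sits in lookup_domain_mpa):
>>
>>      if (mpaddr >= max_mpaddr) {
>>              /* Fail loudly instead of silently mapping the
>>               * bad address to machine pfn 0. */
>>              panic("lookup_domain_mpa: bad mpa %lx (> %lx)\n",
>>                    mpaddr, max_mpaddr);
>>      }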
>>
>> Then things all went weird:
>> (XEN) translate_domain_pte: bad mpa=0000eef3f000e738 (> 0000000010004000),vadr=4000000000042738,pteval=f000eef3f000eef3,itir=0000000000026238
>> (XEN) lookup_domain_mpa: bad mpa 0000eef3f000e738 (> 0000000010004000
>>
>> And finally the GP fault happens.  This error has actually been
>> hiding for a long time, but was seldom triggered.
>>
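>> To make the gpfn/mpfn confusion concrete, here is a toy standalone
>> sketch (hypothetical names only, not the real Xen code): translate()
>> stands in for the translate_domain_pte/lookup_domain_mpa path, which
>> expects a GUEST pfn.  Feeding it a machine pfn, as the old
>> match_dtlb path effectively did, trips the max_page check exactly
>> like the "bad mpa" warnings above:
>>
>> #include <stdio.h>
>>
>> #define MAX_GPFN 4UL                    /* toy domU: 4 guest pages */
>>
>> /* gpfn -> mpfn, standing in for the 3-level p2m walk */
>> static unsigned long p2m[MAX_GPFN] = { 7, 8, 9, 10 };
>>
>> static unsigned long translate(unsigned long gpfn)
>> {
>>         if (gpfn >= MAX_GPFN) {
>>                 fprintf(stderr, "bad mpa: pfn %lx > max %lx\n",
>>                         gpfn, MAX_GPFN);
>>                 return 0;  /* the real code warns, then maps to pfn 0 */
>>         }
>>         return p2m[gpfn];
>> }
>>
>> int main(void)
>> {
>>         unsigned long gpfn = 2;          /* from the guest pte */
>>         unsigned long mpfn = p2m[gpfn];  /* from the machine pte */
>>
>>         printf("guest pte path:   mpfn %lx\n", translate(gpfn)); /* ok: 9 */
>>         printf("machine pte path: mpfn %lx\n", translate(mpfn)); /* bad mpa */
>>         return 0;
>> }
>>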
>> John, please run a test on your side with all the patches I sent out
>> today (including the max_page one).  I believe we can call it done
>> now.
>> ;-)
>>
>> BTW, Dan, there are two heads on the current xen-ia64-unstable.hg.
>> Please do a merge.
>>
>> Signed-off-by: Kevin Tian <Kevin.tian@xxxxxxxxx>
>>
>> diff -r 68d8a0a1aeb7 xen/arch/ia64/xen/vcpu.c
>> --- a/xen/arch/ia64/xen/vcpu.c       Thu Sep  1 21:51:57 2005
>> +++ b/xen/arch/ia64/xen/vcpu.c       Fri Sep  2 21:30:01 2005
>> @@ -1315,7 +1315,8 @@
>>      /* check 1-entry TLB */
>>      if ((trp = match_dtlb(vcpu,address))) {
>>              dtlb_translate_count++;
>> -            *pteval = trp->page_flags;
>> +            //*pteval = trp->page_flags;
>> +            *pteval = vcpu->arch.dtlb_pte;
>>              *itir = trp->itir;
>>              return IA64_NO_FAULT;
>>      }
>>
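>> (The point of the fix, as I read it: vcpu->arch.dtlb_pte holds the
>> guest pte for the cached one-entry TLB, while trp->page_flags is the
>> machine pte that was actually inserted; returning the former hands
>> translate_domain_pte the guest pte it expects.)
>>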
>> Thanks,
>> Kevin
>>

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel