
Re: [Xen-devel] xenpaging crashes xen in is_iomem_page()



On 10 August 2010 10:19, Olaf Hering <olaf@xxxxxxxxx> wrote:
> On Mon, Aug 09, Patrick Colp wrote:
>
>> > I tried to move the initial evict_victim() calls into the while(1) loop.
>> > If there is no event from xc_wait_for_event_or_timeout(), fill &victims
>> > one by one.
>> >
>> > My attempt looks basically like what is shown below.
>> > Unfortunately, it crashes Xen itself in odd ways. I will pursue this
>> > route further tomorrow.
>>
>> It's not immediately clear to me why your change wouldn't work.
>
> Patrick,
>
> there is something weird going on.
> Today I was able to boot the client successfully with my change. Still, I
> got a few 'grant_table.c:583:d0 Iomem mapping not permitted ffffffffff (domain 1)'
> lines.

This sounds like it's trying to grant pages which have been paged out
(since paged-out pages have their p2m entry's MFN set to INVALID_MFN,
which shows up in the 40-bit EPT MFN field as 0xffffffffff).


> After some tries I found that /usr/bin/free in the client gave an I/O error
> when I tried to run it. The same happened with cat /usr/bin/free > /dev/null.
> While doing that, I saw the Iomem error above. The gfn happened to be
> 3aba9. I searched for it in my xenpaging debug output: there was a
> page-out of gfn 3aba9, but no page-in request.
>
> So it seems that gfn lost its "state" somehow.

I think this means there's a fault path that never reaches xenpaging as a
page-in request (again, my guess would be the grant table code).


> Another thing:
> Now that xenpaging does the page-out process slowly, it will take
> a lot more time to finish 65K pages. I did an 'init 0' while it was still
> in the middle of filling &victims. This shutdown crashed Xen itself.
> (The ept_get_entry lines come from my own debug printk, just there
> to check where the 0xffffffffff is coming from.)
>
> --- xen-unstable.hg-4.1.21925.orig/xen/arch/x86/mm/hap/p2m-ept.c
> +++ xen-unstable.hg-4.1.21925/xen/arch/x86/mm/hap/p2m-ept.c
> @@ -488,8 +488,11 @@ static mfn_t ept_get_entry(struct domain
>
>     if ( ept_entry->avail1 != p2m_invalid )
>     {
> +        ept_entry_t **__p = (ept_entry_t **)ept_entry;
>          *t = ept_entry->avail1;
>          mfn = _mfn(ept_entry->mfn);
> +        if ((mfn_x(mfn) & 0xffffffffffUL) == 0xffffffffffUL)
> +            printk("%s:%s(%u) %lx %p mp %lx gfn %lx\n",__FILE__,__func__,__LINE__,mfn_x(mfn), *__p, max_page, gfn);
>          if ( i )
>          {
>              /*
>
>
> (XEN) p2m-ept.c:ept_get_entry(495) ffffffffff 000ffffffffffc00 mp 140000 gfn 135a
> (XEN) mem_event.c:195:d0 Ignoring memory paging op on dying domain 1
> (XEN) p2m-ept.c:ept_get_entry(495) ffffffffff 000ffffffffffa00 mp 140000 gfn a7c2
> (XEN) p2m-ept.c:ept_get_entry(495) ffffffffff 000ffffffffffa00 mp 140000 gfn a7c2
> (XEN) Assertion '(((lport) >= 0) && ((lport) < ((((ld)->arch.has_32bit_shinfo) ? 32 : 64) * (((ld)->arch.has_32bit_shinfo) ? 32 : 64))) && (((ld)->evtchn[(lport)/128]) != ((void*)0)))' failed at event_channel.c:1033
> (XEN) Debugging connection not set up.
> (XEN) ----[ Xen-4.1.21925-20100810.075543  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    3
> (XEN) RIP:    e008:[<ffff82c480105fed>] notify_via_xen_event_channel+0x43/0xfb
> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 0000000000000007   rcx: 0000000000000000
> (XEN) rdx: 0000000000000040   rsi: 0000000000000007   rdi: ffff830138370194
> (XEN) rbp: ffff83013febfc88   rsp: ffff83013febfc68   r8:  0000000000000000
> (XEN) r9:  ffff82c48020aee0   r10: 00000000fffffff9   r11: 0000000000000004
> (XEN) r12: ffff830138370000   r13: ffff830138370190   r14: 000000000000a7c2
> (XEN) r15: 000000000012f977   cr0: 0000000080050033   cr4: 00000000000026f0
> (XEN) cr3: 000000012fb44000   cr2: ffff8800e948fe98
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff83013febfc68:
> (XEN)    0000000000000282 ffff830138370000 ffff83013febfcd8 ffff830138371548
> (XEN)    ffff83013febfcb8 ffff82c4801cef11 ffff830138370000 ffff83013febff18
> (XEN)    ffff830138370000 ffff8300bf752000 ffff83013febfd18 ffff82c4801cd070
> (XEN)    000000000000a7c2 0000000a00000003 000000000000a7c2 0000000000000000
> (XEN)    000000030000000a 0000000000000000 000000000000a7c2 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffff83013febfef8 ffff82c48016c18b
> (XEN)    ffff82c480153f82 ffff83013febfd70 ffff82c480151176 ffff83013febff18
> (XEN)    ffff83013febff18 ffff83013febff18 ffff83013febff18 ffff83013febff18
> (XEN)    ffff83013febff18 ffff83013febff18 ffff83013febff18 ffff83013febff18
> (XEN)    ffff83013febff18 ffff83013febfde0 0000000000000286 ffff83013febfe00
> (XEN)    00000195a8185d6b 0000000000000286 ffff8300bf752030 0000000000000000
> (XEN)    0000000000000000 0000000100000001 0000000000000000 ffff83013febfe10
> (XEN)    00000000bf752000 ffff83012f977e98 ffff82f6025f2ee0 ffff83013cf50000
> (XEN)    ffff830138370000 ffff8300bf752000 ffff8800f271d000 ffff83013febfe40
> (XEN)    00000195a7fb7184 ffff82c480122617 0000000000000000 800000000a7c2627
> (XEN)    ffff83013febfe68 ffff82c48014bcc4 ffff83013febfe68 ffff82c4801615d2
> (XEN)    ffff83013febff18 ffff8300bf752000 0000000000000001 0000000000000000
> (XEN)    ffff83013febfef8 ffff82c4802033c0 00007f20d9bd3000 0000000000000206
> (XEN)    0000000a800073f0 0000000000000001 000000012f977e98 800000000a7c2627
> (XEN)    ffff83013febfed8 ffff8300bf752000 8000000000000427 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82c480105fed>] notify_via_xen_event_channel+0x43/0xfb
> (XEN)    [<ffff82c4801cef11>] mem_event_put_request+0x99/0xa7
> (XEN)    [<ffff82c4801cd070>] p2m_mem_paging_populate+0x230/0x242
> (XEN)    [<ffff82c48016c18b>] do_mmu_update+0x696/0x1839
> (XEN)    [<ffff82c4801fe1e2>] syscall_enter+0xf2/0x14c
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) Assertion '(((lport) >= 0) && ((lport) < ((((ld)->arch.has_32bit_shinfo) ? 32 : 64) * (((ld)->arch.has_32bit_shinfo) ? 32 : 64)****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN) Debugging connection not set up.
> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

This crash is caused by something in dom0 touching the guest's memory:
the call trace shows do_mmu_update() calling p2m_mem_paging_populate(),
which then fails when mem_event_put_request() tries to notify the pager
over the guest's event channel. My guess here is that the guest has shut
down far enough to destroy its event channels. I'm not entirely sure who
the culprit is. It seems like the xenpaging daemon tried to page something
in at some point, but was denied by Xen since the guest was shutting down.
So I would hazard that the PV drivers are again the culprit (as I've not
encountered this error before either). I suppose it could also be a result
of evicting slowly instead of up front. I'll need to get my hands on SLES
or the PV drivers so I can fix the grant table handling (I had it working
before, but that was before the new grant table v2 code).


Patrick

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

