
Re: [Xen-devel] xm pause causing lockup


  • To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
  • From: Kip Macy <kip.macy@xxxxxxxxx>
  • Date: Thu, 14 Apr 2005 21:36:12 -0700
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 15 Apr 2005 04:36:13 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

To check this further, I added the following to HYPERVISOR_mmuext_op,
and something similar to mmu_update:

    printk("%s %d %d %d %d %d\n", __FUNCTION__, op->cmd, op->mfn, count,
           success_count, domid);


HYPERVISOR_mmu_update 0xc0200ba0 1 0 32752
HYPERVISOR_mmu_update 0xc0200ba0 1 0 32752
HYPERVISOR_mmuext_op 7 -1069543424 1 0 32752
HYPERVISOR_mmu_update 0xc0200ba0 1 0 32752
HYPERVISOR_mmu_update 0xc0200ba0 1 0 32752
HYPERVISOR_mmuext_op 7 -955666432 1 0 32752
HYPERVISOR_mmuext_op 1 25359 1 0 32752
<lockup>

I'm not sure where I could add printks to
get_page_and_type_from_pagenr without making DOM0 take forever to
boot. Suggestions are welcome. Alternatively you could do me a favor
and just run my FreeBSD binary locally.


On 4/14/05, Kip Macy <kip.macy@xxxxxxxxx> wrote:
> I think there may be a bug in your page pinning validation logic - the
> lockup occurs when stepping through xen_pgd_pin. I don't know if I'm
> really passing in 0, as register locals can quickly get overwritten,
> but it is certainly worth checking.
> 
> Breakpoint 15, pmap_pinit (pmap=0xc06900c0) at
> ../../../i386-xen/i386-xen/pmap.c:1206
> 1206                    xen_pgd_pin(ma);
> (gdb)
> Continuing.
> 
> Breakpoint 8, xen_pgd_pin (ma=0x0) at
> ../../../i386-xen/i386-xen/xen_machdep.c:490
> 490         op.cmd = MMUEXT_PIN_L2_TABLE;
> (gdb) s
> 491         op.mfn = ma >> PAGE_SHIFT;
> (gdb)
> 492         xen_flush_queue();
> (gdb)
> 
> Breakpoint 4, xen_flush_queue () at 
> ../../../i386-xen/i386-xen/xen_machdep.c:431
> 431         if (XPQ_IDX != 0) _xen_flush_queue();
> (gdb)
> 432     }
> (gdb)
> xen_pgd_pin (ma=0x630f) at hypervisor.h:72
> 72      {
> (gdb)
> 76          __asm__ __volatile__ (
> (gdb)
> 
> 
> On 4/14/05, Kip Macy <kip.macy@xxxxxxxxx> wrote:
> > I haven't tracked down the problem yet, but I thought the following
> > was sufficiently interesting to post:
> >
> > kmacy@curly while (1)
> > while? xm list
> > while? sleep 5
> > while? end
> > Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> > Domain-0           0      507    0  r----     67.9
> > xen-vm2            1      128    1  r----      4.0    9601
> > Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> > Domain-0           0      507    0  r----     68.1
> > xen-vm2            1      128    1  r----      4.0    9601
> > Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> > Domain-0           0      507    0  r----     68.3
> > xen-vm2            1      128    1  r----      4.0    9601
> > Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> > Domain-0           0      507    0  r----     68.5
> > xen-vm2            1      128    1  r----      4.0    9601
> > Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> > Domain-0           0      507    0  r----     68.7
> > xen-vm2            1      128    1  r----      4.0    9601
> > Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> > Domain-0           0      507    0  r----     68.9
> > xen-vm2            1      128    1  r----      4.0    9601
> >
> > xen-vm2 is always shown as running, but its time is not increasing.
> >
> >                -Kip
> >
> >
> > On 4/13/05, Kip Macy <kip.macy@xxxxxxxxx> wrote:
> > > On 4/13/05, Keir Fraser <Keir.Fraser@xxxxxxxxxxxx> wrote:
> > > > Probably easiest way to trace this is with printk's in Xen. The guts of
> > > > the work is done by domain_pause_by_systemcontroller() in xen/sched.h.
> > > > This in turn calls domain_sleep() in common/schedule.c.
> > >
> > > I traced through that code a while back when trying to decide what to
> > > call from the int3 handler.
> > >
> > > A particularly
> > > > interesting place to look will be the synchronous spin loop at the end
> > > > of domain_sleep -- if the paused domain isn't descheduled for some
> > > > weird reason then the spin loop would never exit and domain0 would
> > > > hang.
> > >
> > > Good point. It will be interesting to see.
> > >
> > > I sometimes wonder if I should keep some of the buggy versions of
> > > FreeBSD around for regression testing as they trigger some interesting
> > > behaviours in xen and xend.
> > >
> > >            -Kip
> > >
> >
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

