
Re: [Xen-users] Xen 3.0.0 32bit-pae (testing changeset 8270) crashes(pgtable.c:284, kernel bug?)



On Thursday, 2 February 2006 12:52, Ian Pratt wrote:
> > 3.0.1 seems to fix the bug I saw on my two machines, but now
> > there is another
> > (but somehow related) problem for me in 3.0.1-pae. I don't
> > know if it's still
> > related to the 3ware controller, but at least it only appears
> > for domains
> > that have memory above the 32-bit address space again, so the
> > first started
> > domUs run fine. The big difference is that I don't have any
> > complete freezes
> > of the xen machine anymore, just domUs are crashing this time.
>
> Interesting. It looks like xen is running out of memory below 4GB, and
> can't service the domain's request for a new L3 PGD, causing the domain
> to bug out.
>
> Are you using dom0_mem= on the xen command line to constrain dom0's
> memory usage or are relying on dom0 releasing memory automatically as
> you start other domains? If the latter, I expect dom0 is hogging all the
> pages below 4GB. [Grrr, PAE is such a crock...]
>
> Given that your 3ware controller is already putting pressure on the
> bottom 4GB you'd be better off setting your initial dom0 memory at boot
> time.

You're right: with dom0_mem set to 196MB I can start 20 domains using all 
available RAM without any problems. No crashes of a domU or of the whole 
system. It really seems to work now... Great! Thanks a lot!

> Please let me know how you get on. BTW: can you get a serial line on the
> machine? It might be interesting to see some of xen's memory usage
> diagnostics.

Before I rebooted with dom0_mem set to 196MB I took some information from the 
half-crashed domU (which was still running, but not usable because more or 
less all commands were crashing):

(XEN) General information for domain 52:
(XEN)     flags=0 refcnt=3 nr_pages=32763 xenheap_pages=5 dirty_cpus={}
(XEN)     handle=e4bc6beb-2398-402e-9956-6c2975f74fea
(XEN) Rangesets belonging to domain 52:
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN)     I/O Ports  { }
(XEN) Memory pages belonging to domain 52:
(XEN)     DomPage list too long to display
(XEN)     XenPage 00b70000: mfn=00000b70, caf=80000002, taf=f0000002
(XEN)     XenPage 00b71000: mfn=00000b71, caf=80000002, taf=f0000002
(XEN)     XenPage 00b72000: mfn=00000b72, caf=80000002, taf=f0000002
(XEN)     XenPage 00b73000: mfn=00000b73, caf=80000002, taf=f0000002
(XEN)     XenPage 00b63000: mfn=00000b63, caf=80000002, taf=f0000002
(XEN) VCPU information and callbacks for domain 52:
(XEN)     VCPU0: CPU3 [has=F] flags=15 upcall_pend = 00, upcall_mask = 00 dirty_cpus={3} cpu_affinity={0-31}
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
(XEN)     VCPU1: CPU0 [has=F] flags=15 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
(XEN)     VCPU2: CPU1 [has=F] flags=15 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
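
For anyone reading the dump above, here is a rough Python sketch of the arithmetic behind it. It assumes 4 KiB pages (the usual x86 PAE page size) and that an MFN is simply the machine address divided by the page size; the helper names are made up for illustration. Note that the dump only lists the Xen heap pages ("DomPage list too long to display"), so this cannot show where the domain's own pages live.

```python
import re

PAGE_SIZE = 4096      # assumption: 4 KiB pages on x86 PAE
FOUR_GIB = 1 << 32    # the 32-bit address boundary discussed in this thread

# Three lines copied from the domain 52 dump above.
dump = """\
(XEN)     flags=0 refcnt=3 nr_pages=32763 xenheap_pages=5 dirty_cpus={}
(XEN)     XenPage 00b70000: mfn=00000b70, caf=80000002, taf=f0000002
(XEN)     XenPage 00b63000: mfn=00000b63, caf=80000002, taf=f0000002
"""

def domain_mib(text):
    """Memory owned by the domain in MiB, derived from nr_pages."""
    nr_pages = int(re.search(r"nr_pages=(\d+)", text).group(1))
    return nr_pages * PAGE_SIZE / (1 << 20)

def xen_pages_below_4g(text):
    """Map each listed MFN to True if its machine address is below 4 GiB."""
    mfns = [int(m, 16) for m in re.findall(r"mfn=([0-9a-fA-F]+)", text)]
    return {mfn: mfn * PAGE_SIZE < FOUR_GIB for mfn in mfns}

if __name__ == "__main__":
    print(round(domain_mib(dump), 2))   # a bit under 128 MiB
    print(xen_pages_below_4g(dump))
```

So the domain holds just under 128 MiB, and the Xen heap pages shown sit far below the 4 GiB mark.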

(I hope I got the right information for you)

I also saw "xen_net: Memory squeeze in netback driver." in dom0, and a lot of 
"Timer ISR/0: Time went backwards: delta=-3312252171 cpu_delta=107747829 
shadow=379343963229 ..." messages when I tried to get information with SysRq 
via serial.
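
To get a feeling for the size of that time jump, the fields of the "Time went backwards" message can be converted to seconds. This is only a sketch: it assumes the values are nanoseconds, which the message itself does not state, and the function name is invented for this example.

```python
import re

NS_PER_S = 1_000_000_000  # assumption: the logged values are nanoseconds

# Message text copied from above (the trailing "..." part is left out).
msg = ("Timer ISR/0: Time went backwards: delta=-3312252171 "
       "cpu_delta=107747829 shadow=379343963229")

def fields_in_seconds(line):
    """Return every key=value integer field of the line, scaled to seconds."""
    return {key: int(value) / NS_PER_S
            for key, value in re.findall(r"(\w+)=(-?\d+)", line)}

if __name__ == "__main__":
    for key, seconds in fields_in_seconds(msg).items():
        print(f"{key}: {seconds:.3f} s")
```

Under that assumption, delta corresponds to a backwards step of roughly 3.3 seconds.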

Even though I have a solution now (using dom0_mem and 3.0.1), I would like to 
help solve that problem. So if you need more information, please let me know. I 
think I will have this testing machine for about a week from now. After 
that it will be used as a production system as well, and then no more 
debugging can be done.
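
One more detail that may fit the out-of-memory explanation: in the oops quoted further down, eax is fffffff4 at the BUG in pgd_ctor. The sketch below reads that value as a signed 32-bit number and looks it up as a Linux errno. It assumes eax still holds the return value of the call made just before the BUG (the code bytes show a call followed by a test of eax, but that is my reading, not something confirmed in this thread).

```python
import errno

def as_signed32(value):
    """Interpret a 32-bit register value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

eax = 0xFFFFFFF4                      # "eax: fffffff4" from the quoted oops
code = as_signed32(eax)               # negative values are kernel-style -errno
name = errno.errorcode.get(-code)     # symbolic errno name, if one exists

if __name__ == "__main__":
    print(code, name)
```

If the assumption holds, the failed call returned -ENOMEM, which would match the picture of Xen having no memory left below 4GB.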

> Ian
>
> > the domU doesn't always crash at the very same place,
> > sometimes at the
> > beginning of the init process, sometimes when it loads
> > modules, sometimes
> > when services get started... Sometimes this crash happens
> > more than once
> > before the domU panics.
> >
> > here is what I see in the domU console:
> >
> > ------------[ cut here ]------------
> > kernel BUG at <bad filename>:63723!
> > invalid operand: 0000 [#1]
> > SMP
> > Modules linked in: 8250 reiserfs efs isofs vfat fat ext3 jbd
> > evdev pci_hotplug
> > dm_mod sd_mod 3w_xxxx e1000 jedec_probe cfi_probe gen_probe
> > chipreg mtdcore
> > map_funcs i2c_i801 i2c_core parport_pc parport serial_core
> > usbhid pcmcia
> > yenta_socket rsrc_nonstatic pcmcia_core processor genrtc sbp2
> > ohci1394
> > ieee1394 usb_storage ohci_hcd uhci_hcd 3w_9xxx scsi_mod unix
> > CPU:    0
> > EIP:    0061:[<c01182b6>]    Not tainted VLI
> > EFLAGS: 00010282   (2.6.12.6-xen)
> > EIP is at pgd_ctor+0x26/0x30
> > eax: fffffff4   ebx: 00000001   ecx: f577e000   edx: 00000000
> > esi: c118fd80   edi: c12bd258   ebp: c12bd240   esp: c864dd38
> > ds: 007b   es: 007b   ss: 0069
> > Process rcS (pid: 1041, threadinfo=c864c000 task=c06f8a40)
> > Stack: c77ae000 00000000 00000020 c014dd51 c77ae000 c118fd80
> > 00000001 c12bd240
> >        c77ae000 c118fd80 00000000 c014decd c118fd80 c12bd240
> > 00000001 000000d0
> >        c118fde0 00000001 000000d0 c119d980 0000000c 000000d0
> > 00000000 c014e0db
> > Call Trace:
> >  [<c014dd51>] cache_init_objs+0x71/0x80
> >  [<c014decd>] cache_grow+0x10d/0x1a0
> >  [<c014e0db>] cache_alloc_refill+0x17b/0x220
> >  [<c014e39f>] kmem_cache_alloc+0x7f/0x90
> >  [<c011833d>] pgd_alloc+0x1d/0x310
> >  [<c01216fe>] mm_init+0xce/0x100
> >  [<c0121a14>] copy_mm+0xd4/0x3d0
> >  [<c0121fdf>] copy_files+0x1af/0x320
> >  [<c03f9d00>] parse_header+0xb0/0xe0
> >  [<c03f9d04>] parse_header+0xb4/0xe0
> >  [<c01225af>] copy_process+0x3df/0xd00
> >  [<c0166f4f>] fd_install+0x2f/0x60
> >  [<c0122fc9>] do_fork+0x69/0x18f
> >  [<c0130e4a>] sys_rt_sigprocmask+0xaa/0x110
> >  [<c0108f91>] sys_fork+0x31/0x40
> >  [<c010a65d>] syscall_call+0x7/0xb
> > Code: 00 f3 ab 5f c3 83 ec 0c b8 20 00 00 00 89 44 24 08 31
> > c0 89 44 24 04 8b
> > 44 24 10 89 04 24 e8 d2 2b 00 00 85 c0 75 04 83 c4 0c c3 <0f>
> > 0b eb f8 8d b6
> > 00 00 00 00 83 ec 08 b8 f8 e3 36 c0 89 5c 24
> >  /etc/init.d/rcS: line 57:  1041 Segmentation fault      (
> > trap - INT QUIT
> > TSTP; set start; . $i )
> >
> > Is there something I can do to help resolve that?
> >
> > thx & regards,
> > -- Ralph
> >
> > > Ian
>
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-users
