[Xen-devel] Re: paging request failures under load (was: Re: Null pointer deference)
They're baaaack!  ;-}

While upgrading a virtual domain to debian testing so I could get a
gcc > 3.0.4, it issued a couple of these oopses and hung, resulting in
a broken upgrade.  This is different hardware, same image; a machine I
was totally unable to duplicate these on yesterday.  Xen 02 Feb 1.2
build, gcc 3.0.4.

Also this morning, my first production customer sent me mail saying one
of his guests was "slow".  I looked, and, sure enough, he had had these
same oopses -- paging failure followed by null pointer dereference.
This was a third node, same image.  Telltale symptom from his end was
that 'top' hangs -- I've noticed this in all cases.

This plus Keir's message about the clock skew (which I'm also seeing on
these guests) makes me suspect gcc 3.0.4.

So here's what I'm doing:

- running under native Linux, upgrade an unmounted NFS root filesystem
  to debian testing in a chroot
- still in the chroot, build today's 1.2 xen/xenolinux with gcc 3.3.2
- deploy the resulting xen and xenolinux on one or more nodes
- install ksymoops/System.map on those nodes so that we can get
  meaningful oops output if it does happen again (per earlier mail from
  Ian and Bin)
- test, test, test

I'll let you know how it goes.

The reason I'm doing this in a chroot is that I'm thinking of setting up
an automated Xen regression test environment under Xen, daily pulls,
that sort of thing.  This NFS root would be a build server for that
environment.  Is anyone already working on something like this?

Steve

On Mon, Feb 09, 2004 at 10:34:34PM -0800, wrote:
> Before anyone burns too much time on this, hang on -- I wasn't able to
> duplicate the problem on another cluster node (both nodes were built
> from the same SystemImager image). I'm looking for the reason why, and
> will let you know as soon as I do.
>
> Steve
>
> On Mon, Feb 09, 2004 at 08:13:22PM -0800, wrote:
> > Okay, the problem still exists when I bump the memory up to 256Mb, and
> > never swap. I.E. I've found no workaround. Hasn't anyone else hit
> > anything like this?
> >
> > Steve
> >
> >
> > DOM3: xen_console_init
> > DOM3: Linux version 2.4.24-xeno (stevegt@pathfinder) (gcc version
> > 3.0.4) #16 Mon Feb 2 17:46:41 PST 2004
> > DOM3: On node 0 totalpages: 65536
> > DOM3: zone(0): 4096 pages.
> > DOM3: zone(1): 61440 pages.
> > DOM3: zone(2): 0 pages.
> > DOM3: Kernel command line:
> > ip=64.71.149.20:10.27.2.50:64.71.149.1:255.255.255.0::eth0:off
> > root=/dev/nfs nfsroot=/export//xen/fs/stevegt/tcx/root 4 DOMID=20
> > DOM3: Initializing CPU#0
> > DOM3: Xen reported: 398.780 MHz processor.
> > DOM3: Calibrating delay loop... 1592.52 BogoMIPS
> > DOM3: Memory: 257132k/262144k available (1078k kernel code, 5012k
> > reserved, 308k data, 52k init, 0k highmem)
> > DOM3: Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
> > DOM3: Inode cache hash table entries: 16384 (order: 5, 131072 bytes)
> > DOM3: Mount cache hash table entries: 512 (order: 0, 4096 bytes)
> > DOM3: Buffer cache hash table entries: 16384 (order: 4, 65536 bytes)
> > DOM3: Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
> > DOM3: CPU: L1 I cache: 16K, L1 D cache: 16K
> > DOM3: CPU: L2 cache: 512K
> > DOM3: CPU: Intel Pentium II (Deschutes) stepping 01
> > DOM3: POSIX conformance testing by UNIFIX
> > DOM3: Linux NET4.0 for Linux 2.4
> > DOM3: Based upon Swansea University Computer Society NET3.039
> > DOM3: Initializing RT netlink socket
> > DOM3: Starting kswapd
> > DOM3: Journalled Block Device driver loaded
> > DOM3: Installing knfsd (copyright (C) 1996 okir@xxxxxxxxxxxx).
> > DOM3: Xeno console successfully installed
> > DOM3: Starting Xeno Balloon driver
> > DOM3: pty: 256 Unix98 ptys configured
> > DOM3: RAMDISK driver initialized: 16 RAM disks of 4096K size 1024
> > blocksize
> > DOM3: loop: loaded (max 8 devices)
> > DOM3: NET4: Linux TCP/IP 1.0 for NET4.0
> > DOM3: IP Protocols: ICMP, UDP, TCP
> > DOM3: IP: routing cache hash table of 2048 buckets, 16Kbytes
> > DOM3: TCP: Hash tables configured (established 16384 bind 16384)
> > DOM3: IP-Config: Complete:
> > DOM3: device=eth0, addr=64.71.149.20, mask=255.255.255.0,
> > gw=64.71.149.1,
> > DOM3: host=64.71.149.20, domain=, nis-domain=(none),
> > DOM3: bootserver=10.27.2.50, rootserver=10.27.2.50, rootpath=
> > DOM3: ip_conntrack version 2.1 (2048 buckets, 16384 max) - 292 bytes
> > per conntrack
> > DOM3: ip_tables: (C) 2000-2002 Netfilter core team
> > DOM3: NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
> > DOM3: Looking up port of RPC 100003/2 on 10.27.2.50
> > DOM3: Looking up port of RPC 100005/1 on 10.27.2.50
> > DOM3: VFS: Mounted root (nfs filesystem).
> > DOM3: Freeing unused kernel memory: 52k freed
> > DOM3: INIT: version 2.84 booting
> > DOM3: Activating swap.
> > DOM3: Adding Swap: 262136k swap-space (priority -1)
> > DOM3: Checking root file system...
> > DOM3: fsck 1.27 (8-Mar-2002)
> > DOM3: 10.27.2.50:/export/xen/fs/stevegt/tcx: NFS file system.
> > DOM3: System time was Tue Feb 10 02:14:36 UTC 2004.
> > DOM3: Setting the System Clock using the Hardware Clock as
> > reference...
> > DOM3: modprobe: modprobe: Can't locate module char-major-10-135
> > DOM3: modprobe: modprobe: Can't locate module char-major-4
> > DOM3: hwclock is unable to get I/O port access: the iopl(3) call
> > failed.
> > DOM3: modprobe: modprobe: Can't locate module char-major-10-135
> > DOM3: modprobe: modprobe: Can't locate module char-major-4
> > DOM3: System Clock set. System local time is now Tue Feb 10 02:14:36
> > UTC 2004.
> > DOM3: Calculating module dependencies... depmod: cannot read ELF
> > header from /lib/modules/2.4.24-xeno/modules.dep
> > DOM3: depmod: cannot read ELF header from
> > /lib/modules/2.4.24-xeno/modules.generic_string
> > DOM3: depmod: /lib/modules/2.4.24-xeno/modules.ieee1394map is not an
> > ELF file
> > DOM3: depmod: /lib/modules/2.4.24-xeno/modules.isapnpmap is not an ELF
> > file
> > DOM3: depmod: cannot read ELF header from
> > /lib/modules/2.4.24-xeno/modules.parportmap
> > DOM3: depmod: /lib/modules/2.4.24-xeno/modules.pcimap is not an ELF
> > file
> > DOM3: depmod: cannot read ELF header from
> > /lib/modules/2.4.24-xeno/modules.pnpbiosmap
> > DOM3: depmod: /lib/modules/2.4.24-xeno/modules.usbmap is not an ELF
> > file
> > DOM3: done.
> > DOM3: Loading modules:
> > DOM3: Checking all file systems...
> > DOM3: fsck 1.27 (8-Mar-2002)
> > DOM3: Setting kernel variables.
> > DOM3: Loading the saved-state of the serial devices...
> > DOM3: Mounting local filesystems...
> > DOM3: nothing was mounted
> > DOM3: Running 0dns-down to make sure resolv.conf is ok...done.
> > DOM3: Cleaning: /etc/network/ifstate.
> > DOM3: Setting up IP spoofing protection: rp_filter.
> > DOM3: Configuring network interfaces: done.
> > DOM3: Mounting remote filesystems...
> > DOM3:
> > DOM3: Setting the System Clock using the Hardware Clock as
> > reference...
> > DOM3: System Clock set. Local time: Tue Feb 10 02:14:37 UTC 2004
> > DOM3:
> > DOM3: Cleaning: /tmp /var/lock /var/run.
> > DOM3: Initializing random number generator... done.
> > DOM3: Recovering nvi editor sessions... done.
> > DOM3: INIT: Entering runlevel: 4
> > DOM3: Starting system log daemon: syslogd.
> > DOM3: Starting kernel log daemon: klogd.
> > DOM3: Starting internet superserver: inetd.
> > DOM3: Starting PCMCIA services: module directory
> > /lib/modules/2.4.24-xeno/pcmcia not found.
> > DOM3: Starting OpenBSD Secure Shell server: sshd.
> > DOM3: Starting deferred execution scheduler: atd.
> > DOM3: Starting periodic command scheduler: cron.
> > DOM3: INIT: no more processes left in this runlevel
> > DOM3: Unable to handle kernel paging request at virtual address
> > 20000001
> > DOM3: printing eip:
> > DOM3: c0007743
> > DOM3: *pde=00000000(00000000)
> > DOM3: Oops: 0000
> > DOM3: CPU: 0
> > DOM3: EIP: 0819:[<c0007743>] Not tainted
> > DOM3: EFLAGS: 00010202
> > DOM3: eax: 00000001 ebx: 20000001 ecx: c3ebde6c edx: c3ebde6c
> > DOM3: esi: c3ebc000 edi: c0114254 ebp: c46c1060 esp: c3ebdce4
> > DOM3: ds: 0821 es: 0821 ss: 0821
> > DOM3: Process cc (pid: 13793, stackpage=c3ebd000)<1>
> > DOM3: Stack: 20000001 c3ebc000 c002c728 c3ebc000 c3ebddb0 c0114254
> > ffffffb0 c3ebc000
> > DOM3: c003e789 c3ebde6c c014b5ac c003e314 c3ebde6c 00000000
> > c0114250 00000000
> > DOM3: 00000000 00000000 01082003 8d588810 00000000 00000000
> > 00000000 00000000
> > DOM3: Call Trace: [<c002c728>] [<c003e789>] [<c003e314>] [<c002ccc7>]
> > [<c002cf4a>]
> > DOM3: [<c002cf61>] [<c0090033>] [<c00914bf>]
> > DOM3:
> > DOM3: <1>Unable to handle kernel paging request at virtual address
> > 20000001
> > DOM3: printing eip:
> > DOM3: c000af0f
> > DOM3: *pde=00000000(00000000)
> > DOM3: Oops: 0002
> > DOM3: CPU: 0
> > DOM3: EIP: 0819:[<c000af0f>] Not tainted
> > DOM3: EFLAGS: 00010282
> > DOM3: eax: 20000001 ebx: c485c0a0 ecx: c3ebc264 edx: c3ebc264
> > DOM3: esi: 00000000 edi: 20000001 ebp: 0000000b esp: c3ebdbb4
> > DOM3: ds: 0821 es: 0821 ss: 0821
> > DOM3: Process cc (pid: 13793, stackpage=c3ebd000)<1>
> > DOM3: Stack: c485c0a0 00000000 c3ebc000 0000000b 0000000b c000b55f
> > 20000001 0000001f
> > DOM3: 00000000 cf4227e0 20000001 c0091a87 0000000b 00000000
> > c485c0bc c0096305
> > DOM3: c0129928 c3ebdcb0 00000000 c3ebc000 00000000 20000001
> > c46c1060 00000000
> > DOM3: Call Trace: [<c000b55f>] [<c0091a87>] [<c0096305>] [<c002eb19>]
> > [<c0018a25>]
> > DOM3: [<c0018c46>] [<c0018feb>] [<c0018ed4>] [<c006e759>]
> > [<c0091768>] [<c0007743>]
> > DOM3: [<c002c728>] [<c003e789>] [<c003e314>] [<c002ccc7>]
> > [<c002cf4a>] [<c002cf61>]
> > DOM3: [<c0090033>] [<c00914bf>]
> > DOM3:
> > DOM3: <1>Unable to handle kernel NULL pointer dereference at virtual
> > address 00000001
> > DOM3: printing eip:
> > DOM3: c000b623
> > DOM3: *pde=00000000(00000000)
> > DOM3: Oops: 0002
> > DOM3: CPU: 0
> > DOM3: EIP: 0819:[<c000b623>] Not tainted
> > DOM3: EFLAGS: 00010202
> > DOM3: eax: 00000000 ebx: 00000001 ecx: c3ebc264 edx: c3ebc264
> > DOM3: esi: 00000002 edi: c3ebc000 ebp: 0000000b esp: c3ebdaa0
> > DOM3: ds: 0821 es: 0821 ss: 0821
> > DOM3: Process cc (pid: 13793, stackpage=c3ebd000)<1>
> > DOM3: Stack: 0000001f 00000002 20000001 20000001 c0091a87 0000000b
> > 00000000 00000002
> > DOM3: c0096305 c0129928 c3ebdb80 00000002 c3ebc000 00000002
> > 20000001 0000000b
> > DOM3: 63303039 ffffffff c3ebc000 00000002 38383130 38643538
> > 00030001 64303030
> > DOM3: Call Trace: [<c0091a87>] [<c0096305>] [<c0008996>] [<c000f797>]
> > [<c000f991>]
> > DOM3: [<c0091768>] [<c000af0f>] [<c000b55f>] [<c0091a87>]
> > [<c0096305>] [<c002eb19>]
> > DOM3: [<c0018a25>] [<c0018c46>] [<c0018feb>] [<c0018ed4>]
> > [<c006e759>] [<c0091768>]
> > DOM3: [<c0007743>] [<c002c728>] [<c003e789>] [<c003e314>]
> > [<c002ccc7>] [<c002cf4a>]
> > DOM3: [<c002cf61>] [<c0090033>] [<c00914bf>]
> > DOM3:
> >
> >
> >
> >
> > On Mon, Feb 09, 2004 at 05:25:00PM -0800, wrote:
> > > Hi All,
> > >
> > > I seem to be able to reproduce a null pointer dereference and paging
> > > request errors in 1.2. Can anyone give me any pointers on tracking down
> > > what is causing it?
> > >
> > > This is with a 32Mb virtual domain, running debian woody, NFS root,
> > > 256Mb swap in a local VD, while running a process which builds openldap,
> > > python2.2.3, and related packages. I'm not sure which package, if any
> > > in particular, is causing this; could be just anything that causes a
> > > similar workload. This particular set of messages appeared before the
> > > virtual domain locked up during the openldap build...
> > >
> > > Steve
> > >
> > >
> > > DOM26: Unable to handle kernel paging request at virtual address
> > > 20000001
> > > DOM26: printing eip:
> > > DOM26: c0007743
> > > DOM26: *pde=00000000(00000000)
> > > DOM26: Oops: 0000
> > > DOM26: CPU: 0
> > > DOM26: EIP: 0819:[<c0007743>] Not tainted
> > > DOM26: EFLAGS: 00010202
> > > DOM26: eax: 00000001 ebx: 20000001 ecx: c0a79e6c edx: c0a79e6c
> > > DOM26: esi: c0a78000 edi: c0114254 ebp: c1e5f580 esp: c0a79ce4
> > > DOM26: ds: 0821 es: 0821 ss: 0821
> > > DOM26: Process sh (pid: 10086, stackpage=c0a79000)<1>
> > > DOM26: Stack: 20000001 c0a78000 c002c728 c0a78000 c0a79db0 c0114254
> > > ffffffb0 c0a78000
> > > DOM26: c003e789 c0a79e6c c014b5ac c003e314 c0a79e6c 00000000
> > > c0114250 c0a79de8
> > > DOM26: c1419640 80000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000
> > > DOM26: Call Trace: [<c002c728>] [<c003e789>] [<c003e314>] [<c002ccc7>]
> > > [<c002cf4a>]
> > > DOM26: [<c002cf61>] [<c0090033>] [<c00914bf>]
> > > DOM26:
> > > DOM26: <1>Unable to handle kernel paging request at virtual address
> > > 20000001
> > > DOM26: printing eip:
> > > DOM26: c000af0f
> > > DOM26: *pde=00000000(00000000)
> > > DOM26: Oops: 0002
> > > DOM26: CPU: 0
> > > DOM26: EIP: 0819:[<c000af0f>] Not tainted
> > > DOM26: EFLAGS: 00010282
> > > DOM26: eax: 20000001 ebx: c1ed5b20 ecx: c0a78264 edx: c0a78264
> > > DOM26: esi: 00000000 edi: 20000001 ebp: 0000000b esp: c0a79bb4
> > > DOM26: ds: 0821 es: 0821 ss: 0821
> > > DOM26: Process sh (pid: 10086, stackpage=c0a79000)<1>
> > > DOM26: Stack: c1ed5b20 00000000 c0a78000 0000000b 0000000b c000b55f
> > > 20000001 0000001f
> > > DOM26: 00000000 c140d6c0 20000001 c0091a87 0000000b 00000000
> > > c1ed5b3c c0096305
> > > DOM26: c0129928 c0a79cb0 00000000 c0a78000 00000000 20000001
> > > c1e5f580 00000000
> > > DOM26: Call Trace: [<c000b55f>] [<c0091a87>] [<c0096305>] [<c002eb19>]
> > > [<c0018a25>]
> > > DOM26: [<c0018c46>] [<c0018feb>] [<c0018ed4>] [<c006e759>]
> > > [<c0091768>] [<c0007743>]
> > > DOM26: [<c002c728>] [<c003e789>] [<c003e314>] [<c002ccc7>]
> > > [<c002cf4a>] [<c002cf61>]
> > > DOM26: [<c0090033>] [<c00914bf>]
> > > DOM26:
> > > DOM26: <1>Unable to handle kernel NULL pointer dereference at virtual
> > > address 00000001
> > > DOM26: printing eip:
> > > DOM26: c000b623
> > > DOM26: *pde=00000000(00000000)
> > > DOM26: Oops: 0002
> > > DOM26: CPU: 0
> > > DOM26: EIP: 0819:[<c000b623>] Not tainted
> > > DOM26: EFLAGS: 00010202
> > > DOM26: eax: 00000000 ebx: 00000001 ecx: c0a78264 edx: c0a78264
> > > DOM26: esi: 00000002 edi: c0a78000 ebp: 0000000b esp: c0a79aa0
> > > DOM26: ds: 0821 es: 0821 ss: 0821
> > > DOM26: Process sh (pid: 10086, stackpage=c0a79000)<1>
> > > DOM26: Stack: 0000001f 00000002 20000001 20000001 c0091a87 0000000b
> > > 00000000 00000002
> > > DOM26: c0096305 c0129928 c0a79b80 00000002 c0a78000 00000002
> > > 20000001 0000000b
> > > DOM26: 63303039 c101fc58 c0a78000 00000002 c101fc58 ffffffff
> > > 00030001 c001e621
> > > DOM26: Call Trace: [<c0091a87>] [<c0096305>] [<c001e621>] [<c001f6c0>]
> > > [<c0008996>]
> > > DOM26: [<c00200d7>] [<c00204e1>] [<c001464d>] [<c0014c92>]
> > > [<c0091768>] [<c000af0f>]
> > > DOM26: [<c000b55f>] [<c0091a87>] [<c0096305>] [<c002eb19>]
> > > [<c0018a25>] [<c0018c46>]
> > > DOM26: [<c0018feb>] [<c0018ed4>] [<c006e759>] [<c0091768>]
> > > [<c0007743>] [<c002c728>]
> > > DOM26: [<c003e789>] [<c003e314>] [<c002ccc7>] [<c002cf4a>]
> > > [<c002cf61>] [<c0090033>]
> > > DOM26: [<c00914bf>]
> > > DOM26:
> > >
> > >
> > >
> > > --
> > > Stephen G. Traugott (KG6HDQ)
> > > UNIX/Linux Infrastructure Architect, TerraLuna LLC
> > > stevegt@xxxxxxxxxxxxx
> > > http://www.stevegt.com -- http://Infrastructures.Org
> >
> > --
> > Stephen G. Traugott (KG6HDQ)
> > UNIX/Linux Infrastructure Architect, TerraLuna LLC
> > stevegt@xxxxxxxxxxxxx
> > http://www.stevegt.com -- http://Infrastructures.Org
>
> --
> Stephen G. Traugott (KG6HDQ)
> UNIX/Linux Infrastructure Architect, TerraLuna LLC
> stevegt@xxxxxxxxxxxxx
> http://www.stevegt.com -- http://Infrastructures.Org

--
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@xxxxxxxxxxxxx
http://www.stevegt.com -- http://Infrastructures.Org

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/xen-devel
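
Addendum on reading the traces above: the bracketed values in each "Call Trace" are raw kernel text addresses, which is why the plan mentions installing ksymoops and System.map. The sketch below is not ksymoops; it is a simplified stand-in, assuming a System.map in the usual "address type name" layout, that maps each address to the nearest symbol at or below it. The map excerpt and every symbol name in it are hypothetical placeholders, since the real System.map for this 2.4.24-xeno build is not part of this thread. Only the trace addresses are taken from the oops output.

```python
import bisect
import re


def load_system_map(text):
    """Parse System.map-style text into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+0xoffset' for the closest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[i]
    return f"{name}+0x{addr - base:x}"


def trace_addresses(oops_text):
    """Extract every [<xxxxxxxx>] address from oops output, in order."""
    return [int(m, 16) for m in re.findall(r"\[<([0-9a-f]{8})>\]", oops_text)]


# HYPOTHETICAL map excerpt: these symbol names and base addresses are
# invented for illustration and do not come from the real kernel build.
SAMPLE_MAP = """\
c0007700 T hypothetical_func_a
c002c700 T hypothetical_func_b
c003e300 T hypothetical_func_c
"""

# Addresses copied from the first DOM3 oops (EIP plus start of call trace).
SAMPLE_TRACE = (
    "EIP: 0819:[<c0007743>] Not tainted "
    "Call Trace: [<c002c728>] [<c003e789>] [<c003e314>]"
)

symbols = load_system_map(SAMPLE_MAP)
resolved = [resolve(symbols, a) for a in trace_addresses(SAMPLE_TRACE)]
for entry in resolved:
    print(entry)
```

With the real System.map from the build that produced the oops, the same lookup would give actual function names. ksymoops does considerably more than this (module symbols, code disassembly, consistency checks), so treat this only as a quick first pass.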