Re: [Xen-devel] issues getting more than 16M ram to be used without oopsing. 1.2 and 1.3-unstable
Bad memory is what I also thought the first time it failed to run. It certainly LOOKS like a case of bad RAM. However, the "unknown APIC" boot message has given me memory faults on other mainboards in the past year or so when using Athlon XP 1900+ and 2000+ CPUs. That, and the hardware's history of use, makes me think that the RAM itself is OK.

I'll go ahead and run memtest on it for a few hours to see if it finds anything, and report back once I have completed the hardware testing. I will also try swapping the memory with that of my new workstation to see if that makes any difference.

Workstation RAM: 2 pcs. Micron PC2100 512MB, in use right now under 2.6.4 (Linux enigma 2.6.4-1-k7 #1 Sat Mar 13 22:44:25 EST 2004 i686 GNU/Linux), Debian unstable.
Xen server: 1 pc. 512MB PC2100 Micron, 2 pcs. 256MB Micron PC2100.

I hope it is as simple as the system not liking the memory that is in it. I have attempted booting from IDE, and from nfsroot with the IDE turned off.

There is one item during the xen-1.2 bootup with the Debian xen package that I did notice. Note the "unknown APIC" line (the log prints it as "An unexpected IO-APIC was found") where it attempts to initialize the APIC hardware. I don't know if this can be a source of the issue. This message has given me grief in other systems, with effects similar to this, until I custom compiled a newer 2.6 kernel that supported the APIC natively. The stock Debian 2.4 kernel also seems to have issues with the IO-APIC, which is why I compiled custom 2.6 kernels; those found the APIC and used it OK.

This is also what was leading me to believe that the VIA chipset may be behaving in a way that Xen isn't expecting and that the Linux 2.4 and 2.6 kernels avoid triggering. Maybe Xen pokes around in different ways/areas than the Linux kernel did and has found some bad RAM and/or an APIC flaw that I just never ran into with Linux by sheer chance.

The following is a full dump of the system booting up under xen-1.2 with 18M for domain0, as compiled by Adam (doogie) for Debian unstable.
It loads GRUB, Xen, and XenoLinux from hda1. The root filesystem is mounted from the nfsroot fileserver. For some reason dom0_mem=18000 works, as does anything below 16384, but memory sizes between 16M and 18M fail, as does anything over 18M.

BTW, just smack me if I provide too much and/or the wrong info for what y'all need. :)

------- bootup dump of Debian xen.deb, 1.2 version. -----
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel /xen.gz dom0_mem=18000 ser_baud=115200 noht watchdog
   [Multiboot-elf, <0x100000:0x11a138:0x0>, <0x21c000:0x1f4f8:0x29528>, shtab=0x265280, entry=0x100000]
module /xenolinux.gz root=/dev/nfs nfsroot=10.10.10.161:/xen/dom0 rw ip=10.10.10.160::10.10.10.1:255.255.255.0:vhost1:eth0:off console=xencons0
   [Multiboot-module @ 0x286000, 0x12f8cc bytes]

 __  __            _   ____
 \ \/ /___ _ __   / | |___ \
  \  // _ \ '_ \  | |   __) |
  /  \  __/ | | | | |_ / __/
 /_/\_\___|_| |_| |_(_)_____|

 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory

 Xen version 1.2 (root@xxxxxxxxxxxxxxxxxxxxx) (gcc version 3.3.3 (Debian)) Thu Mar 4 12:56:20 CST 2004

Initialised all memory on a 1022MB machine
Reading BIOS drive-info tables at 0xf95f0 and 0xfe819
CPU0: Before vendor init, caps: 0383fbff c1c3fbff 00000000, vendor = 2
CPU caps: 0383fbff c1c3fbff 00000000 00000000
found SMP MP-table at 000f60c0
Memory Reservation 0xf60c0, 4096 bytes
Memory Reservation 0xf0c00, 4096 bytes
ACPI: Searched entire block, no RSDP was found.
ACPI: RSDP located at physical address fc4f7ac0
RSD PTR v0 [KT600 ]
__va_range(0x3fef3000, 0x68): idx=8 mapped at ffff6000
ACPI table found: RSDT v1 [KT600 AWRDACPI 16944.11825]
__va_range(0x3fef3040, 0x24): idx=8 mapped at ffff6000
__va_range(0x3fef3040, 0x74): idx=8 mapped at ffff6000
ACPI table found: FACP v1 [KT600 AWRDACPI 16944.11825]
__va_range(0x3fef7a00, 0x24): idx=8 mapped at ffff6000
__va_range(0x3fef7a00, 0x5a): idx=8 mapped at ffff6000
ACPI table found: APIC v1 [KT600 AWRDACPI 16944.11825]
__va_range(0x3fef7a00, 0x5a): idx=8 mapped at ffff6000
LAPIC (acpi_id[0x0000] id[0x0] enabled[1])
CPU 0 (0x0000) enabledProcessor #0 Pentium(tm) Pro APIC version 16
IOAPIC (id[0x2] address[0xfec00000] global_irq_base[0x0])
INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0])
INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x0] trigger[0x0])
LAPIC_NMI (acpi_id[0x0000] polarity[0x1] trigger[0x1] lint[0x1])
1 CPUs total
Local APIC address fee00000
Enabling the CPU's according to the ACPI table
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
Processor #0 Pentium(tm) Pro APIC version 17
I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode: Flat.Using 1 I/O APICs
Processors: 2
Initialising domains
Initialising schedulers
Initializing CPU#0
Detected 1852.069 MHz processor.
CPU0: Before vendor init, caps: 0383fbff c1c3fbff 00000000, vendor = 2
CPU caps: 0383fbff c1c3fbff 00000000 00000000
CPU0 booted
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Error: only one processor found.
ENABLING IO-APIC IRQs
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
init IO_APIC IRQs
 IO-APIC (apicid-pin) 2-0, 2-10, 2-11, 2-16, 2-18, 2-19, 2-20, 2-21, 2-22 not connected.
..TIMER: vector=0x41 pin1=2 pin2=0
number of MP IRQ sources: 16.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................

IO APIC #2......
.... register #00: 02000000
.......    : physical APIC id: 02
.... register #01: 00178003
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0003
 An unexpected IO-APIC was found. If this kernel release is less than
 three months old please report this to linux-smp@xxxxxxxxxxxxxxx
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 000 00  1    0    0   0   0    0    0    00
 01 0FF 0F  0    0    0   0   0    1    1    49
 02 001 01  0    0    0   0   0    1    1    41
 03 0FF 0F  0    0    0   0   0    1    1    51
 04 0FF 0F  0    0    0   0   0    1    1    59
 05 0FF 0F  0    0    0   0   0    1    1    61
 06 0FF 0F  0    0    0   0   0    1    1    69
 07 0FF 0F  0    0    0   0   0    1    1    71
 08 0FF 0F  0    0    0   0   0    1    1    79
 09 0FF 0F  0    0    0   0   0    1    1    81
 0a 000 00  1    0    0   0   0    0    0    00
 0b 000 00  1    0    0   0   0    0    0    00
 0c 0FF 0F  0    0    0   0   0    1    1    89
 0d 0FF 0F  0    0    0   0   0    1    1    91
 0e 0FF 0F  0    0    0   0   0    1    1    99
 0f 0FF 0F  0    0    0   0   0    1    1    A1
 10 000 00  1    0    0   0   0    0    0    00
 11 0FF 0F  1    1    0   1   0    1    1    A9
 12 000 00  1    0    0   0   0    0    0    00
 13 000 00  1    0    0   0   0    0    0    00
 14 000 00  1    0    0   0   0    0    0    00
 15 000 00  1    0    0   0   0    0    0    00
 16 000 00  1    0    0   0   0    0    0    00
 17 0FF 0F  1    1    0   1   0    1    1    B1
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ17 -> 0:17
IRQ23 -> 0:23
.................................... done.
Using local APIC timer interrupts.
Calibrating APIC timer for CPU0...
..... CPU speed is 1852.1073 MHz.
..... Bus speed is 336.7467 MHz.
..... bus_scale = 0x000158E5
ACT: Initialising Accurate timers
Time init:
.... System Time: 11615885ns
.... cpu_freq: 00000000:6E645690
.... scale: 00000001:14728FC3
.... Wall Clock: 1081028483s 0us
Start schedulers
Testing NMI watchdog --- CPU#0 okay.
PCI: PCI BIOS revision 2.10 entry at 0xfb8d0, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router VIA [1106/3177] at 00:11.0
PCI->APIC IRQ transform: (B0,I9,P0) -> 17
PCI->APIC IRQ transform: (B0,I18,P0) -> 23
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
00:09.0: 3Com PCI 3c595 Vortex 100baseTx at 0xd000. Vers LK1.1.16
00:09.0: Overriding PCI latency timer (CFLT) setting of 32, new value is 248.
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 89
VP_IDE: detected chipset, but driver not compiled in!
PCI: No IRQ known for interrupt pin A of device 00:11.1. Probably buggy MP table.
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xe000-0xe007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xe008-0xe00f, BIOS settings: hdc:pio, hdd:pio
hda: ST360021A, ATA DISK drive
hdb: TOSHIBA CD-ROM XM-5602B, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdb: ATAPI 8X CD-ROM drive, 256kB Cache
Uniform CD-ROM driver Revision: 3.12
hda: 117231408 sectors (60022 MB) w/2048KiB Cache, CHS=7297/255/63 PIO (slow!)
SCSI subsystem driver Revision: 1.00
Red Hat/Adaptec aacraid driver (1.1.2 Mar 4 2004 12:55:28)
Device eth0 opened and ready for use.
DOM0: Guest OS virtual load address is c0000000
DOM0: xen_console_init
DOM0: Linux version 2.4.25-xeno-p2 (adam@gradall) (gcc version 3.3.3 (Debian)) #1 Thu Mar 4 12:49:07 CST 2004
DOM0: On node 0 totalpages: 4500
DOM0: zone(0): 4096 pages.
DOM0: zone(1): 404 pages.
DOM0: zone(2): 0 pages.
DOM0: Kernel command line: /xenolinux.gz root=/dev/nfs nfsroot=10.10.10.161:/xen/dom0 rw ip=10.10.10.160::10.10.10.1:255.255.255.0:vhost1:eth0:off console=xencons0
DOM0: Initializing CPU#0
DOM0: Xen reported: 1852.069 MHz processor.
DOM0: Console: colour VGA+ 80x25
DOM0: Calibrating delay loop... 14837.35 BogoMIPS
DOM0: Memory: 16348k/18000k available (978k kernel code, 1652k reserved, 166k data, 52k init, 0k highmem)
DOM0: Dentry cache hash table entries: 4096 (order: 3, 32768 bytes)
DOM0: Inode cache hash table entries: 2048 (order: 2, 16384 bytes)
DOM0: Mount cache hash table entries: 512 (order: 0, 4096 bytes)
DOM0: Buffer cache hash table entries: 1024 (order: 0, 4096 bytes)
DOM0: Page-cache hash table entries: 8192 (order: 3, 32768 bytes)
DOM0: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
DOM0: CPU: L2 Cache: 512K (64 bytes/line)
DOM0: CPU: AMD Athlon(tm) XP 2500+ stepping 00
DOM0: POSIX conformance testing by UNIFIX
DOM0: Linux NET4.0 for Linux 2.4
DOM0: Based upon Swansea University Computer Society NET3.039
DOM0: Initializing RT netlink socket
DOM0: Starting kswapd
DOM0: VFS: Disk quotas vdquot_6.5.1
DOM0: register_swap_method: method blkdev
DOM0: register_swap_method: method blkdev file
DOM0: register_swap_method: method nfs file
DOM0: Xeno console successfully installed
DOM0: Successfully installed virtual firewall/router interface
DOM0: Starting Xeno Balloon driver
DOM0: pty: 256 Unix98 ptys configured
DOM0: Serial driver version 5.05c (2001-07-08) with no serial options enabled
DOM0: ttyS00 at 0x03f8 (irq = 4) is a 16550A
DOM0: Real Time Clock Driver v1.10f
DOM0: Could not allocate block update interrupt
DOM0: Initializing Cryptographic API
DOM0: NET4: Linux TCP/IP 1.0 for NET4.0
DOM0: IP Protocols: ICMP, UDP, TCP, IGMP
DOM0: IP: routing cache hash table of 512 buckets, 4Kbytes
DOM0: TCP: Hash tables configured (established 1024 bind 2048)
DOM0: IP-Config: Complete:
DOM0: device=eth0, addr=10.10.10.160, mask=255.255.255.0, gw=10.10.10.1,
DOM0: host=vhost1, domain=, nis-domain=(none),
DOM0: bootserver=255.255.255.255, rootserver=10.10.10.161, rootpath=
DOM0: NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
DOM0: Looking up port of RPC 100003/2 on 10.10.10.161
DOM0: Looking up port of RPC 100005/1 on 10.10.10.161
DOM0: VFS: Mounted root (nfs filesystem).
DOM0: Freeing unused kernel memory: 52k freed
DOM0: INIT: version 2.85 booting
DOM0: Hello World!
DOM0: mount: can't find / in /etc/fstab or /etc/mtab
DOM0: Hello World.
DOM0: Loading /etc/console/boottime.kmap.gz
DOM0: Activating swap.
DOM0: Calculating module dependencies... done.
DOM0: Loading modules: via-rhine modprobe: Can't locate module via-rhine
DOM0: 3c59x modprobe: Can't locate module 3c59x
DOM0:
DOM0: Checking all file systems...
DOM0: fsck 1.35 (28-Feb-2004)
DOM0: Setting kernel variables..
DOM0: Mounting local filesystems...
DOM0: /var/run /var/lock.
DOM0: Running 0dns-down to make sure resolv.conf is ok...done.
DOM0: Cleaning: /etc/network/ifstate.
DOM0: Setting up IP spoofing protection: rp_filter.
DOM0: Configuring network interfaces...SIOCADDRT: File exists
DOM0: done.
DOM0: /etc/rcS.d/S41hostname.dhcp: line 1: host: command not found
DOM0: Starting portmap daemon: portmap.
DOM0: Loading the saved-state of the serial devices...
DOM0: /dev/ttyS0 at 0x03f8 (irq = 4) is a 16550A
DOM0: Initializing random number generator...done.
DOM0: Recovering nvi editor sessions... done.
DOM0: /etc/init.d/rcS: line 54: /etc/rcS.d/S70xfree86-common: Permission denied
DOM0: INIT: Entering runlevel: 2
DOM0: Starting system log daemon: syslogd.
DOM0: Starting kernel log daemon: klogd.
DOM0: Starting portmap daemon: portmap.
DOM0: Starting MTA: 2004-04-03 15:41:35 Failed to open configuration file /etc/exim/exim.conf
DOM0: Starting internet superserver: inetd.
DOM0: Starting OpenBSD Secure Shell server: sshd.
DOM0: Starting NFS common utilities: statd.
DOM0: Starting deferred execution scheduler: atd.
DOM0: Starting periodic command scheduler: cron.
DOM0: INIT: no more processes left in this runlevel
DOM0: INIT: Switching to runlevel: 6
DOM0: Stopping periodic command scheduler: cron.
DOM0: Stopping MTA: No /usr/lib/exim/exim3 found running; none killed.
DOM0: exim.
DOM0: Stopping internet superserver: inetd.
DOM0: Stopping OpenBSD Secure Shell server: sshd.
DOM0: Stopping NFS common utilities: statd.
DOM0: Stopping deferred execution scheduler: atd.
DOM0: Stopping kernel log daemon: klogd.
DOM0: Stopping system log daemon: syslogd.
DOM0: Sending all processes the TERM signal...done.
DOM0: Sending all processes the KILL signal...done.
DOM0: Saving random seed...done.
DOM0: Unmounting remote and non-toplevel virtual filesystems...done.
DOM0: NOT deconfiguring network interfaces: / is an NFS mount
DOM0: Deactivating swap...done.
DOM0: Unmounting local filesystems...done.
DOM0: Rebooting... Restarting system.
Domain 0 killed: rebooting machine!
----------------------

Here is some of the machine's work history, which will explain why I'm so confident that the hardware is OK.

Prior to retasking this machine as a Xen server it was my workstation. I have run a variety of kernels on it in the past and even pushed the system memory consumption quite hard (it used to have a triple-head Xinerama setup on it). I've run 2.4.21, 22, 23 and 24, 2.6.0-test9, and 2.6.0, 1, 2, 3 and 4 on it (all custom compiles; the 2.6 compiles used highmem 2G) when it was my workstation. I have pushed the memory consumption to the point of causing heavy swapping (700M+ of swap used, < 100M cache used, < 16M free memory).

Prior to emailing the Xen list I swapped the RAM sticks with those from another server that has run flawlessly for the past 12 months as a production mail server (50,000 email accounts, 400+ simultaneous receiving sendmail processes).

When the mainboard, CPU, disk, and power supply were my workstation I did MANY intensive compiling sessions (the best method to find bad RAM besides badram and memtest), sometimes several compiles in parallel.

-- 
Brian Wolfe        | Phone 1-(214)-764-1204
President,         | Email brianw@xxxxxxxxxxxx
TerraBox.com Inc.  |

pub  1024D/73C5A2DF 2003-03-18 Brian Wolfe <brianw@xxxxxxxxxxxx>
     Key fingerprint = 050E 5E3C CF65 4C1E A183 F48F E3E3 5B22 73C5 A2DF
sub  1024g/BB87A3DD 2003-03-18

Keir Fraser said:
>
>> Now, this machine has been used for approx. 5 months now without any
>> glitches or oopses. So I'm 99.9999% certain that the hardware is good.
>>
>> I'm using an NFS root since the IDE is only in PIO mode (and to
>> eliminate its use other than to boot the kernels).
>>
>> Any insights?
>>
>> If necessary for debugging, I can provide access to the hardware via
>> serial console. :)
>>
>> Thanks for any help y'all can give!
>
> The crashes look quite random -- I don't think this is a bug in the
> core of Xen. The two most likely possibilities are that you have duff
> memory or that a misconfigured device is trashing memory. I definitely
> wouldn't discount the former, even though native x86 Linux has been
> running okay -- crashes can be very sensitive to memory layout.
>
> It might be worth running a few rounds of memtest on the machine, or
> swapping the memory, or trying to boot Xen on another identical box.
>
> If that doesn't cure it then try swapping out or disabling
> hardware. For example, boot off local disc and disable networking
> ('ifname=dummy'). Since the cause is most likely hardware-related, the
> best approach is to isolate the problem hardware.
>
> -- Keir
>
> PS. If you build your own Xen/Xenolinux then keep the build trees
> around (or at least, for Xenolinux, the 'vmlinux' file). I can't find
> suitable image files for the tarballs on the Xen website, and without
> them it is very difficult to determine anything from crash dumps.
>
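A quick arithmetic check on the dom0_mem figures from the boot dump, offered as a rough sketch and not as a diagnosis. It assumes dom0_mem is given in kilobytes (the "16348k/18000k" line suggests so), the usual 4 KiB x86 page size, and that zone(0) is the first 16 MiB of the domain's memory. Under those assumptions 18000 comes to 4500 pages, split 4096 + 404, which matches the "totalpages" and "zone" lines in the log, and 16384 is exactly the size at which zone(1) stops being empty. Whether that boundary has anything to do with the failures is unknown. The names zone_split, PAGE_KB and DMA_ZONE_PAGES are made up for illustration.

```python
PAGE_KB = 4            # x86 page size in KiB (assumption stated above)
DMA_ZONE_PAGES = 4096  # first 16 MiB, reported as zone(0) in the boot log


def zone_split(dom0_mem_kb):
    """Return (total_pages, zone0_pages, zone1_pages) for a dom0_mem value in KiB."""
    total = dom0_mem_kb // PAGE_KB
    zone0 = min(total, DMA_ZONE_PAGES)
    return total, zone0, total - zone0


if __name__ == "__main__":
    # Sizes on either side of the 16M boundary mentioned in the message.
    for kb in (16000, 16384, 17000, 18000):
        total, zone0, zone1 = zone_split(kb)
        print(f"dom0_mem={kb}: totalpages={total} zone(0)={zone0} zone(1)={zone1}")
```

Running it for the sizes that fail (between 16384 and 18000) next to the ones that work only shows how many pages would land in zone(1) in each case; it does not by itself separate a Xen problem from a hardware one.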