
RE: [Xen-devel] file corruption!!!

  • To: <xen-devel@xxxxxxxxxxxxxxxxxxxxx>
  • From: James Harper <JamesH@xxxxxxxxxxxxxxxx>
  • Date: Mon, 19 Jul 2004 11:02:09 +1000
  • Delivery-date: Mon, 19 Jul 2004 02:04:53 +0100
  • List-id: List for Xen developers <xen-devel.lists.sourceforge.net>
  • Thread-index: AcRtIBETaVI4xD4AR5SRcCKCcCxpigAC/MGm
  • Thread-topic: [Xen-devel] file corruption!!!

I just tried another bk pull + make world, and it failed because it couldn't gunzip linux-2.4.26.tar.gz. I tried the gunzip manually and, sure enough, it failed. 'xm list' etc. just segfaulted too.
After a reboot, though, the file was fine again, so the corruption in this case was a read error, not a write error. I'm assuming that if I had done enough I/O to flush any buffers and then tried to gunzip the file again, it probably would have worked.
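One way to separate a read-side error from on-disk damage is to record a digest of the archive and run the gzip integrity check, then repeat both after a reboot. The following is only a sketch in Python; the helper names are made up for illustration and nothing like it was posted in the thread. Note that repeated reads within one boot may be served from the page cache, so identical digests alone do not prove the on-disk data is good.

```python
import gzip
import hashlib
import zlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of the file at `path`, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def gzip_ok(path, chunk_size=1 << 20):
    """Decompress the whole archive; True if it passes gzip's own checks."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


def repeated_read_check(path, rounds=3):
    """Hash the file `rounds` times and return the set of distinct digests."""
    return {file_digest(path) for _ in range(rounds)}
```

A digest that differs between two boots while the file was never rewritten would point at the read path, which is what the behaviour above suggests.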
Just prior to this I had run a little C program which would just try to allocate memory in 1 MB chunks until it was killed. After the reboot I tried the same thing again and it appears to be staying up okay now, unfortunately. It almost seems like I only start to get errors after a day or so of uptime and a fair bit of I/O.
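The C program itself was not posted. For anyone wanting to try something similar, here is a rough Python stand-in for the same idea: keep allocating 1 MB chunks and hold on to them. The function name and the `max_chunks` safety cap are assumptions added for this sketch; the original presumably had no cap and simply ran until the kernel killed it.

```python
CHUNK = 1024 * 1024  # 1 MB per allocation, as described in the message


def eat_memory(max_chunks=None):
    """Allocate 1 MB chunks until allocation fails or `max_chunks` is reached.

    Returns the number of chunks held. `max_chunks` is a safety cap added
    for this sketch; pass None to keep going until the kernel intervenes.
    """
    chunks = []
    try:
        while max_chunks is None or len(chunks) < max_chunks:
            # Filling with a non-zero byte writes every page, so the memory
            # is really used rather than just reserved.
            chunks.append(b"\xa5" * CHUNK)
    except MemoryError:
        pass
    return len(chunks)


if __name__ == "__main__":
    print(eat_memory(max_chunks=16), "MB allocated")
```

Run it with a small cap first; with `max_chunks=None` the process may be killed by the kernel rather than returning.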
Curiously though, the first time I ran my memory-exhausting program, all my xenU domains restarted...
Since starting this email I have managed to induce corruption again. I'll reboot and try it again without starting any other domains.
The server is a Compaq ProLiant 1600 with 2x 550 MHz P3 and 768 MB of memory. All the memory is ECC and, up until I acquired it for Linux purposes, it was running as another company's main Windows server, so I wouldn't have suspected a hardware issue.
I'll follow up shortly, hopefully with some instructions on inducing the corruption on this server, so that anyone else can try them and we can see whether we have a general problem.
There haven't been any fixes in the last 2 days that would correct this problem, have there? I'm a few days out of date, I think.

From: James Harper
Sent: Mon 19/07/2004 9:36 AM
To: xen-devel@xxxxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] file corruption!!!

I'm not in a position to test this, but is it possible that the corruption problem could manifest itself after an out-of-memory condition? When I first noticed the corruption I rebooted as quickly as possible so that it didn't continue, and so I didn't check, but it's possible that the machine ran out of memory first. I guess I could test this, but I don't really want to do anything that risks further corruption :)
Speaking of memory, I have 3 domains running currently (dom0 plus 2 xenU domains), all declared with 128 MB of memory, but xm list shows this:
Dom  Name             Mem(MB)  CPU  State  Time(s)
0    Domain-0             119    0  r----   1293.0
6    gaia                 127    1  -b---     81.9
7    mail2                126    0  -b---   1597.9
'free' under mail2 and gaia shows 128124 as the total amount of memory.
I appreciate that maybe something about dom0 means that it shows something different, but why would the other two show different amounts in xm list when both were declared with the same amount? Both are running identical kernels.
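To put numbers on the discrepancy, the table above can be parsed and compared against the declared 128 MB. This is only a sketch: the function names are made up for illustration, and it assumes the 'free' total is in kB. It quantifies the gap but does not explain why gaia and mail2 differ.

```python
XM_LIST = """\
Dom  Name             Mem(MB)  CPU  State  Time(s)
0    Domain-0             119    0  r----   1293.0
6    gaia                 127    1  -b---     81.9
7    mail2                126    0  -b---   1597.9
"""

DECLARED_MB = 128        # configured size of every domain, per the message
FREE_TOTAL_KB = 128124   # 'free' total inside gaia and mail2 (assumed kB)


def parse_xm_list(text):
    """Return {domain name: Mem(MB)} from the table printed by 'xm list'."""
    rows = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 6:
            rows[fields[1]] = int(fields[2])
    return rows


def shortfall_mb(text, declared_mb=DECLARED_MB):
    """Return {domain name: declared_mb minus the reported Mem(MB)}."""
    return {name: declared_mb - mem
            for name, mem in parse_xm_list(text).items()}


if __name__ == "__main__":
    print(shortfall_mb(XM_LIST))
    print(DECLARED_MB * 1024 - FREE_TOTAL_KB, "kB below 128 MB in the guests")
```

For the figures quoted, the shortfalls are 9, 1 and 2 MB for Domain-0, gaia and mail2, and the total reported by 'free' is 2948 kB below 128 MB.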

From: Chris Andrews
Sent: Mon 19/07/2004 8:43 AM
To: xen-devel@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] file corruption!!!

On 18 Jul 2004, at 18:48, Ian Pratt wrote:

>> On 17 Jul 2004, at 21:21, Ian Pratt wrote:
>>> It would be very interesting to hear whether you get the problem
>>> with the 2.6.7 xen linux. It might give us a clue as to whether
>>> the problem is with the backend blk driver or within the domain
>>> itself (the 2.6.7 implementation is completely different).
>> I can certainly give the 2.6.7 guest another try. I did have it
>> booting, but I didn't persist with it long enough to tell if there was
>> fs corruption -- there seemed to be issues loading modules, and when I
>> compiled everything in, I got a gpf when racoon tried to use a PF_KEY
>> socket. I'll try and get some useful dumps for both these problems.
> I haven't tried loading modules, but I can't think why it
> wouldn't work (assuming the mechanism is basically the same as
> 2.4).

It's different enough to need new userspace tools. The symptoms of
failure are a GPF, with the userspace process (be it insmod or lsmod)
left stuck in D state. The results of feeding the GPF to ksymoops are
below (I hesitate to say it's actually decoded).

> BTW:  what's racoon, and what's a PF_KEY socket?

racoon is the ISAKMP daemon used with the 2.6 kernel's KAME IPSec code. 
It uses a PF_KEY socket to communicate with the kernel. I've 
successfully used it in a 2.4 guest.


No modules in ksyms, skipping objects
No ksyms, skipping lsmod
CPU:    0
EIP:    0061:[<c01471a7>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246   (2.6.7-xenU)
eax: 00000600   ebx: c5400000   ecx: 00000001   edx: 00000600
esi: c0102c54   edi: c5089000   ebp: c5087000   esp: c04b1ec4
ds: 0069   es: 0069   ss: 0069
Stack: c0102c50 c5087000 00002000 c122c6a8 c122c6e0 00000001 c01473f8 
        c5087000 fffffffe c0147491 c5087000 00000000 c5055c19 c5084380 
        fffffffe c5084380 c014753e c5087000 00000001 c012d9c3 c5087000 
Call Trace:
  c04b1ed0: [<c01473f8>]  c04b1ee0: [<c0147491>]  c04b1f00: [<c014753e>] 
  c04b1f0c: [<c012d9c3>]  c04b1f38: [<c02da440>]  c04b1f94: [<c012dc5d>] 
  c04b1fb4: [<c010a663>]
Code: 0f 22 e2 0f 20 d9 0f 22 d9 0f 22 e0 83 c4 0c 5b 5e 5f c3 e8

 >>EIP; c01471a7 <unmap_vm_area+5d/80>   <=====

 >>ebx; c5400000 <pg0+50c8000/3bcc5000>
 >>esi; c0102c54 <swapper_pg_dir+c54/1000>
 >>edi; c5089000 <pg0+4d51000/3bcc5000>
 >>ebp; c5087000 <pg0+4d4f000/3bcc5000>
 >>esp; c04b1ec4 <pg0+179ec4/3bcc5000>

Code;  c01471a7 <unmap_vm_area+5d/80>
00000000 <_EIP>:
Code;  c01471a7 <unmap_vm_area+5d/80>   <=====
    0:   0f 22 e2                  mov    %edx,%cr4   <=====
Code;  c01471aa <unmap_vm_area+60/80>
    3:   0f 20 d9                  mov    %cr3,%ecx
Code;  c01471ad <unmap_vm_area+63/80>
    6:   0f 22 d9                  mov    %ecx,%cr3
Code;  c01471b0 <unmap_vm_area+66/80>
    9:   0f 22 e0                  mov    %eax,%cr4
Code;  c01471b3 <unmap_vm_area+69/80>
    c:   83 c4 0c                  add    $0xc,%esp
Code;  c01471b6 <unmap_vm_area+6c/80>
    f:   5b                        pop    %ebx
Code;  c01471b7 <unmap_vm_area+6d/80>
   10:   5e                        pop    %esi
Code;  c01471b8 <unmap_vm_area+6e/80>
   11:   5f                        pop    %edi
Code;  c01471b9 <unmap_vm_area+6f/80>
   12:   c3                        ret
Code;  c01471ba <unmap_vm_area+70/80>
   13:   e8 00 00 00 00            call   18 <_EIP+0x18>
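As a cross-check on the dump, the selector on the EIP line and the first few code bytes can be decoded by hand. The sketch below (helper names invented for illustration) handles only the two control-register opcodes that appear above. The code selector 0061 decodes to privilege level 1, and the faulting bytes are a write to %cr4. Writes to control registers are privileged, so this would be consistent with the GPF, though the thread does not confirm the root cause.

```python
def decode_selector(sel):
    """Split an x86 segment selector into (descriptor index, table, RPL)."""
    return sel >> 3, "LDT" if sel & 0x4 else "GDT", sel & 0x3


REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_cr(code):
    """Decode a 3-byte mov between a general register and a control register.

    Handles only 0F 20 /r (read CRn) and 0F 22 /r (write CRn) and returns
    AT&T-syntax text in the style ksymoops prints.
    """
    if len(code) != 3 or code[0] != 0x0F or code[1] not in (0x20, 0x22):
        raise ValueError("not a mov to/from a control register")
    modrm = code[2]
    cr = "%cr{}".format((modrm >> 3) & 0x7)  # reg field selects CRn
    gpr = "%" + REG32[modrm & 0x7]           # r/m field selects the register
    if code[1] == 0x22:
        return "mov {},{}".format(gpr, cr)   # write to the control register
    return "mov {},{}".format(cr, gpr)       # read from the control register


if __name__ == "__main__":
    print(decode_selector(0x0061))
    print(decode_mov_cr(bytes.fromhex("0f22e2")))
```

The four decoded instructions match the ksymoops listing, so the disassembly itself looks trustworthy even if the symbol lookup is in doubt.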
