
Re: [Xen-devel] Xen 4.0.0x allows for data corruption in Dom0



On Mon, Mar 08, 2010 at 03:22:32PM -0800, Daniel Stodden wrote:
> On Sun, 2010-03-07 at 11:12 -0500, Pasi Kärkkäinen wrote:
> > On Sun, Mar 07, 2010 at 02:39:09PM +0000, Keir Fraser wrote:
> > > On 07/03/2010 14:36, "Pasi Kärkkäinen" <pasik@xxxxxx> wrote:
> > > 
> > > >> Tried a few times and no luck reproducing so far. I hope some other
> > > >> people on the list also will give it a go, since it's so easy to try it out.
> > > >> 
> > > > 
> > > > I'm able to reproduce this with the xen/master 2.6.31.6 dom0 kernel (from 2010-02-20),
> > > > but I'm not able to reproduce it with the current xen/stable 2.6.32.9.
> > > > 
> > > > I'll try with the most recent 2.6.31.6 dom0 kernel as well.
> > > 
> > > Thanks Pasi!
> > > 
> > 
> > It seems to happen with the latest xen/master 2.6.31.6 as well!
> 
> Does this look to you like we're corrupting memory or on-disk storage?
> 
> E.g. does a
> $ dd if=/dev/zero bs=1M | hexdump -C 
> have the same issue?
> 
> I have some initial trouble with the idea that zero.read() in a PV domU
> somehow unlearned to scrub a 1M user buffer.
> 

My setup:

Dom0 distro: Fedora 12
Xen hypervisor: 4.0.0-rc5 x86_64
Dom0 kernel: latest xen/master 2.6.31.6 x86_64

Xen hypervisor boot options in grub.conf: dom0_mem=1G loglvl=all guest_loglvl=all
Dom0 kernel boot options in grub.conf: ro root=/dev/mapper/vg_f12test-lv01 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=fi nomodeset

Steps to reproduce the bug:

1. Reboot the system
2. Start a dummy guest using the domU kernel (rpm) provided in the original bug report:

# xm create -c /dev/null memory=400 kernel="vmlinuz-2.6.31.9-1.2.82.xendom0.fc12.x86_64" extra="rootdelay=1000"

3. Run in dom0:

# dd if=/dev/zero of=test bs=1M count=10000 && sync && sync && xxd test | grep -v "0000 0000 0000 0000 0000 0000 0000 0000"
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 233.621 s, 44.9 MB/s

6039000: 1000 0000 0000 0000 c0b0 ffff 0300 0000  ................
6039010: 1d5e 06ab b502 0000 1eb2 27b5 ff00 0000  .^........'.....
2dfe9000:3000 0000 0000 0000 a43c 7687 0e00 0000  0........<v.....
2dfe9010:cfc1 ba64 b902 0000 1eb2 27b5 ff00 0000  ...d......'.....
50685000:4800 0000 0000 0000 f954 0f6d 1600 0000  H........T.m....
50685010:5b1d 0230 bc02 0000 1eb2 27b5 ff00 0000  [..0......'.....
743f9000:6200 0000 0000 0000 e0e2 1ffb 1e00 0000  b...............
743f9010:acc3 e436 bf02 0000 1eb2 27b5 ff00 0000  ...6......'.....

As you can see, it's very easy to reproduce.
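The xxd | grep -v check in step 3 prints every 16-byte row of the file that is not all zeros. A minimal sketch of an equivalent check that also sets an exit status (handy for scripting repeated runs), assuming GNU cmp for the -l and -n options and an awk whose printf %x handles offsets above 4 GiB (e.g. GNU awk); the function name check_zero_file is hypothetical and not part of the original report:

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): check_zero_file FILE
# Prints the 16-byte-aligned offset (hex, like xxd's row offsets) of every
# row in FILE that contains a non-zero byte.
# Returns 0 if FILE is all zeros, 1 if a non-zero byte was found,
# and 2 if FILE is not readable.
# Assumes GNU cmp: -l prints "<1-based offset> <octal> <octal>" for each
# differing byte, and -n limits how many bytes are compared.
check_zero_file() {
    file=$1
    [ -r "$file" ] || return 2
    # Compare only as many bytes as FILE holds, so the endless /dev/zero
    # stream never makes cmp report an EOF. $(( )) strips any padding.
    size=$(($(wc -c < "$file")))
    bad=$(cmp -l -n "$size" "$file" /dev/zero |
          awk '{ row = int(($1 - 1) / 16) * 16
                 if (!seen[row]++) printf "%08x\n", row }')
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}
```

For example, running check_zero_file test after step 3 should list the same rows xxd flagged, zero-padded to eight hex digits.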

Now I "xm destroy" the domU, run "sync" and "echo 3 > /proc/sys/vm/drop_caches" in dom0,
and then restart the dummy domU and try the other method Daniel requested:

# dd if=/dev/zero bs=1M | hexdump -C
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
^C20984+0 records in
20983+0 records out
22002270208 bytes (22 GB) copied, 206.353 s, 107 MB/s

So that method didn't show the corruption.
Now, immediately afterwards (without restarting the domU), let's try to reproduce it again with the
dd + xxd method:

# dd if=/dev/zero of=test bs=1M count=10000 && sync && sync && xxd test | grep -v "0000 0000 0000 0000 0000 0000 0000 0000"
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 258.85 s, 40.5 MB/s
7dc2000: 5a02 0000 0000 0000 760d d90c c500 0000  Z.......v.......
7dc2010: 3785 8def 8003 0000 1eb2 27b5 ff00 0000  7.........'.....
2dc0d000:7802 0000 0000 0000 ec70 d8eb ce00 0000  x........p......
2dc0d010:6fb9 a66d 8403 0000 1eb2 27b5 ff00 0000  o..m......'.....

So it seems to be related to disk IO in dom0? 
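Daniel's memory-only pipe test (dd | hexdump -C) had to be stopped with ^C. A minimal sketch of a bounded, non-interactive variant, assuming a cmp that accepts -n (GNU diffutils does); the function name check_zero_stream is hypothetical and not part of the original report:

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): check_zero_stream BYTES
# Compares the first BYTES bytes of stdin with /dev/zero.
# Returns 0 if they are all zero. Otherwise cmp reports the first
# differing byte (or an EOF if stdin was shorter than BYTES) and the
# function returns non-zero.
check_zero_stream() {
    cmp -n "$1" - /dev/zero
}

# Example: the memory-only test, limited to 10000 MiB instead of ^C.
# dd if=/dev/zero bs=1M count=10000 2>/dev/null | check_zero_stream $((10000 * 1048576))
```

Because the data never touches a file, a clean result here together with a corrupted result from the dd + xxd run would point at the disk I/O path rather than at /dev/zero itself.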

-- Pasi


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

