Re: [Xen-users] PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode



Hi Shaohua,

Thanks for your reply.

Let me explain what I mean by "huge". With a low-rate random I/O stream (<1 MB/s written), I don't get a crash, but with random I/O at about 20 MB/s the kernel crashes within a few minutes (for example while running rsync, or even while my DRBD stack is synchronizing). I don't know if this helps, but in most cases, when the kernel crashes, my RAID5 array is resynchronizing after the reboot.

I'm not able to reproduce the crash with a raw RAID5 stack (with dd/fio ...).

It seems that stacked filesystems are needed to reproduce it.

Here is a test configuration, with the command lines I use to reproduce the crash. Everything is done in dom0:
- mdadm --create /dev/md10 --raid-devices=3 --level=5 /dev/sdc1 /dev/sdd1 /dev/sde1
- mkfs.btrfs /dev/md10
- mkdir /tmp/btrfs /mnt/XenVM /tmp/ext4
- mount /dev/md10 /tmp/btrfs
- btrfs subvolume create /tmp/btrfs/XenVM
- umount /tmp/btrfs
- mount /dev/md10 /mnt/XenVM -osubvol=XenVM
- truncate /mnt/XenVM/VMTestFile.dat -s 800G
- mkfs.ext4 /mnt/XenVM/VMTestFile.dat
- mount /mnt/XenVM/VMTestFile.dat /tmp/ext4

-> This does not seem to crash the kernel:
fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --rwmixwrite=95 --bs=1M --direct=1 --size=80G --numjobs=8 --runtime=600 --group_reporting --filename=/mnt/XenVM/Fio.dat

-> This crashes the kernel within a few minutes:
fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --rwmixwrite=95 --bs=1M --direct=1 --size=80G --numjobs=8 --runtime=600 --group_reporting --filename=/tmp/ext4/ext4.dat
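Since a fio job file was asked for, the crashing command line can also be written as one. The following is a sketch translated option by option from that command; the file name randwrite-ext4.fio is an arbitrary choice, not part of the original report. Note that rwmixwrite only applies to mixed read/write workloads (rw=randrw or rw=readwrite), so with rw=randwrite it has no effect; it is kept only to mirror the command line.

```ini
; Sketch: same options as the crashing fio command line above.
; Hypothetical file name: randwrite-ext4.fio
[randwrite]
ioengine=libaio
iodepth=1
rw=randwrite
; Ignored for a pure randwrite workload; kept to match the command line.
rwmixwrite=95
bs=1M
direct=1
size=80G
numjobs=8
runtime=600
group_reporting
; File on the ext4 filesystem that lives inside the Btrfs-hosted image file.
filename=/tmp/ext4/ext4.dat
```

Run it with "fio randwrite-ext4.fio". Changing filename to /mnt/XenVM/Fio.dat gives the non-crashing variant.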

Note: --direct=1 or --direct=0 doesn't seem to change the behaviour. Whether the RAID5 array is resynchronizing or already synchronized doesn't change it either.

Here is another crash: http://pastebin.com/uqLzL4fn

Regarding your patch, I can't find it. Is it the one sent by Konstantin Khlebnikov?

Do you want the "ext4.dat" fio file? It would be really difficult for me to provide it, as I only have a slow ADSL connection.

Thanks for your help,

MasterPrenium

On 04/01/2017 at 23:30, Shaohua Li wrote:
On Fri, Dec 23, 2016 at 07:25:56PM +0100, MasterPrenium wrote:
Hello guys,

I'm having some trouble with a new system I'm setting up. I'm getting a kernel
BUG message that seems to be related to the use of Xen (when I boot the system
_without_ Xen, I don't get any crash).
Here is the configuration:
- 3 hard drives in a software RAID5 array created by mdadm
- On top of it, DRBD for replication to another node (active/passive cluster)
- On top of it, a Btrfs filesystem with a few subvolumes
- On top of it, Xen VMs running.

The BUG happens when I'm doing "huge" I/O (20 MB/s with rsync, for
example) on the RAID5 stack.
I have to reset the system to make it work again.

What did you mean by 'huge' I/O (20M/s)? Is it possible to reproduce the
issue with a raw RAID5 array? It would be even better if you could give me a fio
job file that triggers the issue, so I can easily debug it.

Also, please check whether the upstream patch (e8d7c33 md/raid5: limit request size
according to implementation limits) helps.

Thanks,
Shaohua


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users
