Re: [Xen-users] PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode
Hi Shaohua,

Thanks for your reply.

Let me explain what I mean by "huge". With a low-rate random I/O stream (under 1 MB written per second) I don't get a crash. With random I/O of about 20 MB/s, however, the kernel crashes within a few minutes (for example, running an rsync, or even just synchronising my DRBD stack, triggers the crash).

I don't know if this helps, but in most cases, when the kernel crashes, my RAID5 array is re-synchronizing after the reboot.

I'm not able to reproduce the crash on a raw RAID5 array (with dd, fio, ...). It seems I need to stack filesystems on top of it to reproduce it. Here is a test configuration, with the command lines I use to reproduce the crash. Everything is done in dom0:

- mdadm --create /dev/md10 --raid-devices=3 --level=5 /dev/sdc1 /dev/sdd1 /dev/sde1
- mkfs.btrfs /dev/md10
- mkdir /tmp/btrfs /mnt/XenVM /tmp/ext4
- mount /dev/md10 /tmp/btrfs
- btrfs subvolume create /tmp/btrfs/XenVM
- umount /tmp/btrfs
- mount /dev/md10 /mnt/XenVM -osubvol=XenVM
- truncate /mnt/XenVM/VMTestFile.dat -s 800G
- mkfs.ext4 /mnt/XenVM/VMTestFile.dat
- mount /mnt/XenVM/VMTestFile.dat /tmp/ext4

This doesn't seem to crash the kernel (writing directly to the btrfs subvolume):

fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --rwmixwrite=95 --bs=1M --direct=1 --size=80G --numjobs=8 --runtime=600 --group_reporting --filename=/mnt/XenVM/Fio.dat

This crashes the kernel within a few minutes (writing to the ext4 filesystem stacked on the btrfs subvolume):

fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --rwmixwrite=95 --bs=1M --direct=1 --size=80G --numjobs=8 --runtime=600 --group_reporting --filename=/tmp/ext4/ext4.dat

Note: --direct=1 or --direct=0 doesn't seem to change the behaviour. Whether the RAID5 array is re-synchronizing or already synchronized doesn't change it either.

Here is another crash: http://pastebin.com/uqLzL4fn

Regarding your patch, I can't find it. Is it the one sent by Konstantin Khlebnikov?

Do you want the "ext4.dat" fio file?
It would be really difficult for me to send it to you, as I only have a poor ADSL connection.

Thanks for your help,
MasterPrenium

On 04/01/2017 at 23:30, Shaohua Li wrote:
> On Fri, Dec 23, 2016 at 07:25:56PM +0100, MasterPrenium wrote:
>> Hello guys,
>>
>> I'm having some trouble with a new system I'm setting up. I'm getting a kernel BUG message, which seems to be related to the use of Xen (when I boot the system _without_ Xen, I don't get any crash).
>>
>> Here is the configuration:
>> - 3x hard drives in a RAID5 software array created by mdadm
>> - On top of it, DRBD for replication to another node (active/passive cluster)
>> - On top of it, a BTRFS filesystem with a few subvolumes
>> - On top of it, Xen VMs running
>>
>> The BUG happens when I'm doing "huge" I/O (20MB/s with rsync, for example) on the RAID5 stack. I have to reset the system to make it work again.
>
> What do you mean by 'huge' I/O (20MB/s)? Is it possible for you to reproduce the issue with a raw RAID5 array? It would be even better if you could give me a fio job file that triggers the issue, so I can easily debug it.
>
> Also, please check whether the upstream patch (e8d7c33 md/raid5: limit request size according to implementation limits) helps.
>
> Thanks,
> Shaohua

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users
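Shaohua asked for a fio job file rather than a command line. As a sketch only (this file was not posted in the thread), the crashing fio command line can be written as an equivalent job file: each `--option=value` becomes an `option=value` line, and the `--name` value becomes the job section name. The file name `randwrite-ext4.fio` is a hypothetical choice; it would be run with `fio randwrite-ext4.fio`.

```ini
; Hypothetical job file (randwrite-ext4.fio) mirroring the crashing fio command line.
[global]
ioengine=libaio
iodepth=1
rw=randwrite
; rwmixwrite only applies to mixed workloads such as rw=randrw;
; it is kept here only to mirror the original command line.
rwmixwrite=95
bs=1M
direct=1
size=80G
numjobs=8
runtime=600
group_reporting

; Section name = job name (--name=randwrite in the command line).
[randwrite]
; ext4 filesystem image mounted at /tmp/ext4, stacked on the btrfs subvolume on /dev/md10
filename=/tmp/ext4/ext4.dat
```

Swapping `filename=` for `/mnt/XenVM/Fio.dat` gives the non-crashing variant, which makes it easy to compare the two stacks with otherwise identical I/O parameters.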