
Re: [Xen-users] PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

To: Shaohua Li <shli@xxxxxxxxxx>
From: MasterPrenium <masterprenium.lkml@xxxxxxxxx>
Date: Thu, 5 Jan 2017 15:16:53 +0100
Cc: linux-raid@xxxxxxxxxxxxxxx, xen-users@xxxxxxxxxxxxx, "MasterPrenium@xxxxxxxxx" <MasterPrenium@xxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 05 Jan 2017 14:18:09 +0000
List-id: Xen user discussion <xen-users.lists.xen.org>

Hi Shaohua,

Thanks for your reply.

Let me explain what I mean by "huge". For example, with a low-rate random I/O stream (<1 MB written/sec) I don't get a crash. With random I/O of about 20 MB/sec, the kernel crashes within a few minutes. Running an rsync, or even just synchronising my DRBD stack, is enough to cause the crash. I don't know if this helps, but in most cases, when the system comes back up after a crash, my RAID 5 array is re-synchronizing.

I'm not able to reproduce the crash with a raw RAID5 stack (with dd/fio...).


It seems I need to stack filesystems to reproduce it.

Here is a test configuration, with the command lines I use to reproduce the crash. Everything is done in dom0.

- mdadm --create /dev/md10 --raid-devices=3 --level=5 /dev/sdc1 /dev/sdd1 /dev/sde1

- mkfs.btrfs /dev/md10
- mkdir /tmp/btrfs /mnt/XenVM /tmp/ext4
- mount /dev/md10 /tmp/btrfs
- btrfs subvolume create /tmp/btrfs/XenVM
- umount /tmp/btrfs
- mount /dev/md10 /mnt/XenVM -o subvol=XenVM
- truncate /mnt/XenVM/VMTestFile.dat -s 800G
- mkfs.ext4 /mnt/XenVM/VMTestFile.dat
- mount /mnt/XenVM/VMTestFile.dat /tmp/ext4
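The setup steps above can be collected into one script, which may make the stack easier to rebuild on a test box. This is a minimal sketch, not the exact procedure I ran. It assumes the same device names (/dev/sdc1, /dev/sdd1, /dev/sde1). The `run` wrapper and the `DRY_RUN` variable are hypothetical additions: by default the script only prints the commands, and setting DRY_RUN=0 actually executes them. Executing them destroys all data on those partitions.

```shell
#!/bin/sh
# Sketch of the reproduction setup (run in dom0).
# DESTRUCTIVE when DRY_RUN=0: mdadm --create wipes /dev/sdc1, /dev/sdd1, /dev/sde1.
# Hypothetical helper: with DRY_RUN unset or 1, commands are only printed.
set -eu

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Three-disk software RAID 5 array
run mdadm --create /dev/md10 --raid-devices=3 --level=5 /dev/sdc1 /dev/sdd1 /dev/sde1

# 2. Btrfs on the array, with a XenVM subvolume
run mkfs.btrfs /dev/md10
# mkdir -p (instead of plain mkdir) so a re-run does not fail on existing dirs
run mkdir -p /tmp/btrfs /mnt/XenVM /tmp/ext4
run mount /dev/md10 /tmp/btrfs
run btrfs subvolume create /tmp/btrfs/XenVM
run umount /tmp/btrfs
run mount -o subvol=XenVM /dev/md10 /mnt/XenVM

# 3. Sparse 800G image file on Btrfs, formatted as ext4 and loop-mounted
run truncate -s 800G /mnt/XenVM/VMTestFile.dat
run mkfs.ext4 /mnt/XenVM/VMTestFile.dat
# Explicit loop option; recent util-linux mount sets up the loop device by itself
run mount -o loop /mnt/XenVM/VMTestFile.dat /tmp/ext4
```

Running it once without DRY_RUN lets you review the exact command list before anything touches the disks.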

-> Doing this does not seem to crash the kernel:

fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --rwmixwrite=95 --bs=1M --direct=1 --size=80G --numjobs=8 --runtime=600 --group_reporting --filename=/mnt/XenVM/Fio.dat


-> Doing this crashes the kernel within a few minutes:

fio --name=randwrite --ioengine=libaio --iodepth=1 --rw=randwrite --rwmixwrite=95 --bs=1M --direct=1 --size=80G --numjobs=8 --runtime=600 --group_reporting --filename=/tmp/ext4/ext4.dat

Note: using --direct=1 or --direct=0 doesn't seem to change the behaviour. Whether the RAID 5 array is re-synchronizing or already synchronized doesn't change it either.
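Since a fio job file was requested, here is a sketch that writes the crashing command line above out as one. The file name randwrite-ext4.fio is hypothetical. The options are copied one-to-one from the command line and are not tuned in any way.

```shell
# Write a fio job file equivalent to the crashing command line above.
# The file name is hypothetical; the options mirror the command line one-to-one.
cat > randwrite-ext4.fio <<'EOF'
; Random 1M writes to a file on the loop-mounted ext4 image (this variant crashes)
[randwrite]
ioengine=libaio
iodepth=1
rw=randwrite
; kept from the command line; fio documents rwmixwrite as applying to mixed workloads
rwmixwrite=95
bs=1M
direct=1
size=80G
numjobs=8
runtime=600
group_reporting
filename=/tmp/ext4/ext4.dat
EOF
```

Run it with `fio randwrite-ext4.fio`. To get the variant that does not crash, change `filename=` to /mnt/XenVM/Fio.dat, so the writes go straight to the Btrfs subvolume instead of the ext4 image.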


Here is another crash: http://pastebin.com/uqLzL4fn

Regarding your patch, I can't find it. Is it the one sent by Konstantin Khlebnikov?

Do you want the "ext4.dat" fio file? It would be really difficult for me to send it to you, as I only have a slow ADSL connection.


Thanks for your help,

MasterPrenium

On 04/01/2017 at 23:30, Shaohua Li wrote:

On Fri, Dec 23, 2016 at 07:25:56PM +0100, MasterPrenium wrote:

Hello guys,

I'm having some trouble with a new system I'm setting up. I'm getting a kernel
BUG message that seems to be related to the use of Xen (when I boot the system
_without_ Xen, I don't get any crash).
Here is the configuration:
- 3x hard drives in a software RAID 5 array created by mdadm
- On top of it, DRBD for replication to another node (active/passive cluster)
- On top of it, a BTRFS FileSystem with a few subvolumes
- On top of it, XEN VMs running.

The BUG happens when I'm doing "huge" I/O (20MB/s with an rsync, for
example) on the RAID5 stack.
I have to reset the system to make it work again.

What do you mean by 'huge' I/O (20M/s)? Can you reproduce the
issue with a raw RAID5 array? It would be even better if you could give me a fio
job file that triggers the issue, so I can easily debug it.

Also, please check whether the upstream patch (e8d7c33 md/raid5: limit request size
according to implementation limits) helps.

Thanks,
Shaohua



_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users
