Re: [Xen-devel] PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode
Hi Shaohua,

I've run a few more small tests; maybe they can help.

- I tried creating the RAID 5 array with only 2 drives (mdadm --create /dev/md10 --raid-devices=3 --level=5 /dev/sdc1 /dev/sdd1 missing). The same issue happens.

- But one time (still with 2/3 drives), I was not able to crash the kernel with exactly the same procedure as before, even after re-creating the filesystems etc. In order to reproduce the BUG I had to re-create the array.

  Can this be linked to this message?

  [ 155.667456] md10: Warning: Device sdc1 is misaligned

  I don't know how to "align" a drive in a RAID array... The partition is correctly aligned (according to "parted").

- In another test (still 2/3 drives in the array), I didn't get the kernel crash, but I had 100% I/O wait on the CPU. Trying to reboot finally gave me these printk messages: http://pastebin.com/uzVHUUrC

If you have any patch to give me (maybe something to be more verbose about the issue), please tell me and I'll test it, as this is a really blocking issue...

Best regards,
MasterPrenium

On 09/01/2017 at 23:44, Shaohua Li wrote:
> On Sun, Jan 08, 2017 at 02:31:15PM +0100, MasterPrenium wrote:
>> Hello,
>>
>> Replies below + :
>> - I don't know if this can help, but after the crash, when the system
>>   reboots, the RAID 5 array is re-synchronizing:
>>   [ 37.028239] md10: Warning: Device sdc1 is misaligned
>>   [ 37.028541] created bitmap (15 pages) for device md10
>>   [ 37.030433] md10: bitmap initialized from disk: read 1 pages, set 59 of 29807 bits
>> - Sometimes the kernel crashes completely (serial + network connection
>>   lost); sometimes I only get the "BUG" dump and still have network
>>   access (but a reboot is impossible, I need to reset the system).
>> - You can find the blktrace here (taken while running fio). I hope it's
>>   complete, since the end of the file is when the kernel crashed:
>>   https://goo.gl/X9jZ50
>
> Looks most are normal full stripe writes.
>
>>> I'm trying to reproduce, but no success. So
>>> ext4->btrfs->raid5, crash
>>> btrfs->raid5, no crash
>>> right? does subvolume matter?
>>> When you create the raid5 array, does adding '--assume-clean' option
>>> change the behavior? I'd like to narrow down the issue. If you can
>>> capture the blktrace to the raid5 array, it would be great to hint us
>>> what kind of IO it is.
>>
>> Yes, correct. The subvolume doesn't matter. --assume-clean doesn't
>> change the behaviour.
>
> so it's not a resync issue.
>
>> Don't forget that the system needs to be running on Xen to crash;
>> without it (on a native kernel) it doesn't crash (or at least, I was not
>> able to make it crash).
>>
>> Regarding your patch, I can't find it. Is it the one sent by Konstantin
>> Khlebnikov?
>
> Right.
>
>> It doesn't help :(. Maybe the crash is happening a little bit later.
>
> ok, the patch is unlikely helpful, since the IO size isn't very big.
>
> Don't have good idea yet. My best guess so far is virtual machine
> introduces extra delay, which might trigger some race conditions which
> aren't seen in native. I'll check if I could find something locally.
>
> Thanks,
> Shaohua

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
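Regarding the "Device sdc1 is misaligned" warning quoted in the thread: nothing in the thread establishes whether it is related to the BUG. As a way to gather more data, here is a minimal sketch that compares a partition's start offset with the I/O granularities the kernel reports in sysfs. It assumes the usual Linux sysfs layout (/sys/block/<disk>/<partition>/start and alignment_offset, /sys/block/<disk>/queue/physical_block_size and minimum_io_size). The helper names (is_aligned, read_int, report) are made up for illustration, and sdc/sdc1 are simply the device names mentioned in the thread.

```python
from pathlib import Path

# sysfs reports a partition's 'start' in 512-byte units,
# regardless of the device's logical block size.
SECTOR_BYTES = 512


def is_aligned(start_sector: int, granularity_bytes: int) -> bool:
    """True if the partition's byte offset is a multiple of granularity_bytes."""
    if granularity_bytes <= 0:
        raise ValueError("granularity_bytes must be positive")
    return (start_sector * SECTOR_BYTES) % granularity_bytes == 0


def read_int(path: Path) -> int:
    """Read a single integer from a sysfs attribute file."""
    return int(path.read_text().strip())


def report(disk: str, part: str) -> dict:
    """Collect alignment-related values for one partition (hypothetical helper)."""
    base = Path("/sys/block") / disk
    start = read_int(base / part / "start")
    phys = read_int(base / "queue" / "physical_block_size")
    min_io = read_int(base / "queue" / "minimum_io_size")
    return {
        "start_sector": start,
        "alignment_offset": read_int(base / part / "alignment_offset"),
        "physical_block_size": phys,
        "minimum_io_size": min_io,
        "aligned_to_physical_block": is_aligned(start, phys),
        "aligned_to_minimum_io": is_aligned(start, min_io),
    }


if __name__ == "__main__":
    # Device names taken from the thread; adjust for the system under test.
    for key, value in report("sdc", "sdc1").items():
        print(f"{key}: {value}")
```

Running this for each array member (sdc1 and sdd1 in the thread) and posting the output alongside what "parted" reports might show whether the kernel and parted disagree about alignment.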