Re: [Xen-users] io hang with lvm on md raid1





On Tue, Oct 18, 2016 at 8:23 AM, Sarah Newman <srn@xxxxxxxxx> wrote:
On 10/17/2016 11:10 PM, Tomas Mozes wrote:
> On Mon, Oct 17, 2016 at 11:11 PM, Glenn Enright <glenn@xxxxxxxxxxxxxxx> wrote:
>
>     On 10/10/16 16:06, Sarah Newman wrote:
>
>         On 10/09/2016 02:23 PM, Glenn Enright wrote:
>
>             Bump? I've now replicated this on raid10 and raid6 as well, so it's not caused by the raid level. An example of a blkback process that is
>             stuck is below, if that offers any additional insight. In all cases I'm seeing dmeventd stuck first, though.
>
>
>         Maybe related? https://bugzilla.kernel.org/show_bug.cgi?id=119841
>
>         Xen4CentOS uses 3.18, not 4.4. You could try the Xen4CentOS kernel and see if you get the same errors. Unfortunately EOL for 3.18 is supposed
>         to be January 2017.
>
>         --Sarah
>
>
>     Thanks for your followup Sarah, I have to admit I was not able to pin down the exact cause. We have since implemented a workaround for the issue.
>
>     As close as I can determine... for historical reasons related to sparse file support we were using cp to copy off an lvm snapshot, which clearly
>     was not tolerant of io problems. We are now using dd with conv=sparse and since then have not seen any further recurrences of the lockup.
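
For reference, a minimal sketch of the kind of dd-based sparse copy described above. It is not Glenn's actual command, which is not shown in the thread. The function name, block size, and paths are assumptions for illustration, and conv=sparse and status=none assume GNU coreutils dd.

```shell
#!/bin/sh
# Sketch only: copy an image with dd while keeping the destination sparse.
# sparse_copy and both paths are hypothetical; in the thread the source
# would be an LVM snapshot device.
sparse_copy() {
    src=$1
    dst=$2
    # conv=sparse (GNU dd) seeks over output blocks that are entirely NUL
    # instead of writing them; bs sets the granularity of that check.
    dd if="$src" of="$dst" bs=1M conv=sparse status=none
}

# Example call (placeholder names):
#   sparse_copy /dev/vg0/snap /backup/snap.img
```

A larger bs means fewer system calls but only skips zero runs that fill a whole block, so the value is a trade-off rather than a recommendation.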

> We have a similar problem, but it's not related to LVM snapshots. Our domU running MariaDB hangs on a highly loaded server after some time (for
> example after a mysql restore / percona xtrabackup base backup). Sometimes we cannot even ssh to the server and it needs to be destroyed via xl.
>
> The domU runs in PV mode, all mount points are logical volumes taken from the dom0, kernel 4.4 and xen 4.6.3. It's happening randomly (on two servers).

What observations from the dom0 make you think this is a related problem? My understanding is that Glenn's problems started from cp running in the
context of the dom0, and that the blkback processes and dmeventd hung.

You should probably set up a login on hvc0 for your domUs if you haven't already.

--Sarah

I suppose it can be xen, xfs, lvm, kernel or hardware related. Since it happened on different hardware, we are looking elsewhere. This thread is the closest match: lvm, kernel 4.4, xen 4.6, a hang, and nothing in dmesg. Maybe I'm wrong; I'm trying to reproduce the problem and then issue echo w > /proc/sysrq-trigger. Any other advice is appreciated.

We do have hvc0 for domUs, but it was impossible to log in through it when the hang happened.
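
A rough sketch of how the SysRq step could be wrapped so the blocked-task dump is saved as soon as it is triggered. The function name, the default output path, and the SYSRQ_TRIGGER override are assumptions for illustration, not something from this thread; it needs root in the domain being inspected.

```shell
#!/bin/sh
# Sketch only: trigger the SysRq "w" dump (tasks in uninterruptible sleep)
# and save the kernel log. dump_blocked_tasks and the default output path
# are hypothetical. SYSRQ_TRIGGER exists only so the function can be dry-run
# against an ordinary file.
dump_blocked_tasks() {
    out=${1:-/tmp/blocked-tasks.log}
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    echo w > "$trigger" || return 1
    # The dump lands in the kernel ring buffer, so capture it right away.
    dmesg > "$out"
}

# Example call (as root, once the hang is reproduced):
#   dump_blocked_tasks /root/hang-$(date +%s).log
```

If the domU is too wedged to run this, the same dump may still show up on the hvc0 console, since SysRq output is written to the kernel log.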
