
Re: [Xen-users] xen dom0 nfs hangs, oom killer and more

Managed to dig up some background thread for my case:

On Tue, Feb 21, 2017 at 11:40 AM, G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Sorry that I didn't read your mail carefully, so forgive me if what I
> mentioned does not make sense to you.
> I may have hit similar issue before.
> While I could not remember all the details, the impression is that in
> my case this has something to do with kernel compilation config.
> You should avoid aggressive kernel preemption configs (e.g. realtime
> desktop / desktop). Using the 'server' config works fine for me.
> With preemption enabled, the deadlock situation happens when system
> memory is low and NFS or netback asks for more memory while VMM tries
> to relinquish some pages from NFS or netback. This is by no means an
> exact / precise description, since it's all from my unreliable memory.
> But anyway, it's something you could check in this direction.
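
For anyone who wants to check which preemption model their dom0 kernel was built with, here is a minimal sketch. It assumes the kernel config is available at /boot/config-<release> or /proc/config.gz, which is common but distro-dependent. The function names are made up for illustration, and the mapping of 'server' / 'desktop' to Kconfig symbols follows the labels in the kernel's "Preemption Model" menu. On newer kernels with dynamic preemption the build-time value may not reflect the runtime behaviour, so treat the result as a hint only.

```python
import gzip
import os
import platform
import re

# Kconfig symbol -> label from the kernel's "Preemption Model" menu.
PREEMPT_MODELS = {
    "CONFIG_PREEMPT_NONE": "no forced preemption (server)",
    "CONFIG_PREEMPT_VOLUNTARY": "voluntary preemption (desktop)",
    "CONFIG_PREEMPT": "preemptible kernel (low-latency desktop)",
    "CONFIG_PREEMPT_RT": "fully preemptible (real-time)",
}

# Match only the model symbols themselves, not e.g. CONFIG_PREEMPT_COUNT.
_MODEL_RE = re.compile(r"^(CONFIG_PREEMPT(?:_NONE|_VOLUNTARY|_RT)?)=y$")


def preemption_model(config_text):
    """Return the label of the first preemption model set to 'y', or None."""
    for line in config_text.splitlines():
        match = _MODEL_RE.match(line.strip())
        if match:
            return PREEMPT_MODELS[match.group(1)]
    return None


def read_kernel_config():
    """Read the running kernel's config from the usual places, or return None."""
    candidates = [f"/boot/config-{platform.release()}", "/proc/config.gz"]
    for path in candidates:
        if not os.path.exists(path):
            continue
        try:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as handle:
                return handle.read()
        except OSError:
            continue  # unreadable here, try the next location
    return None


if __name__ == "__main__":
    text = read_kernel_config()
    if text is None:
        print("kernel config not found in the assumed locations")
    else:
        print(preemption_model(text) or "no preemption model line found")
```

If this reports anything other than the server model, rebuilding or installing a kernel with the server setting would be one way to test the theory above.
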
> On Mon, Feb 20, 2017 at 8:27 PM, Dario Faggioli
> <dario.faggioli@xxxxxxxxxx> wrote:
>> On Sun, 2017-02-12 at 23:39 -0800, Mike wrote:
>>> Hello,
>> Hi,
>>>      I have had long term issues with xen dom0 and NFS / CIFS which
>>> essentially make it impossible to do any reasonable amount of network
>>> based filesystem operations from dom0 without the risk of excessive
>>> blocking or hanging of dom0 processes with messages such as "blocked
>>> for more than 120 seconds". I last posted about this problem in 2015
>>> and documented some very extensive troubleshooting on this problem
>>> (https://lists.gt.net/xen/users/381469). I have never found a
>>> resolution. Across generations of hardware, OS installs, networks and
>>> more, the penalty simply seems to be that if you attempt to use NFS /
>>> CIFS from xen dom0, you can expect io performance issues akin to
>>> dialup speeds, and hanging / blocked processes starved for I/O that
>>> can only be resolved with a reboot of the box.
>> Ok. I do have a testbox that mounts stuff from NFS in dom0. It works
>> for me, but I use it (I mean, the NFS part) in a very limited way, I
>> have to admit.
>>>      So here it is in 2017 and I have taken yet another crack at all
>>> this. I now have yet again, new network, new servers, new os installs,
>>> new switches, the works. Not only can you still not use NFS/CIFS in
>>> dom0 without the hangs, new symptoms arise now with OOM killer
>>> starting up and killing seemingly random processes in dom0 all the
>>> while with many many gigabytes of free ram, no obvious explanation.
>> Right. I understand this can be very frustrating. :-(
>>> I am not even sure
>>> really how to best document this problem but it's like one of those
>>> emperor has no clothes things, nfs/cifs is unsafe and brings down xen
>>> dom0 no special effort required other than simply trying to use these,
>>> end of story.
>> Well, sure. But at the same time, without information it's hard
>> (impossible) to figure out what's happening and get to the bottom of
>> the issue.
>> So, as a starting point, the full output of `xl dmesg' and `dmesg' is
>> the first thing I'd ask to see. Even more useful would be actual info
>> from a crash: for instance, what the OOM killer says, whether Xen
>> prints anything on the serial console before locking/dying, etc.
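
Once the `dmesg' output has been saved to a file, a quick summary of the hung-task and OOM killer messages can make a long log easier to post and discuss. The following is only a sketch: the regular expressions assume the usual Linux kernel wording ("blocked for more than N seconds", "invoked oom-killer", "Out of memory: Kill process"), which can differ between kernel versions, and the function name summarize is hypothetical.

```python
import re
from collections import Counter

# Assumed message shapes; kernel versions may word these slightly differently.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
OOM_INVOKED = re.compile(r"(?P<comm>\S+) invoked oom-killer")
OOM_KILL = re.compile(
    r"Out of memory: Kill(?:ed)? process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
)


def summarize(log_text):
    """Count hung-task reports and OOM killer events in saved dmesg text."""
    hung = Counter()      # process name -> number of "blocked for more than" reports
    invokers = Counter()  # process name -> number of times it triggered the OOM killer
    killed = []           # (pid, process name) for each process the OOM killer chose

    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            hung[match.group("comm")] += 1
            continue
        match = OOM_INVOKED.search(line)
        if match:
            invokers[match.group("comm")] += 1
            continue
        match = OOM_KILL.search(line)
        if match:
            killed.append((int(match.group("pid")), match.group("comm")))

    return {
        "hung_tasks": dict(hung),
        "oom_invokers": dict(invokers),
        "oom_killed": killed,
    }
```

A possible way to use it: run `dmesg > dmesg.txt` in dom0 after a hang, then call summarize(open("dmesg.txt").read()) and include the result alongside the full logs. It is not a replacement for the complete output requested above.
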
>>>      I plan on setting up a test soon to smoke this out further. Going
>>> to have a host set up where I can boot xen or non-xen and run the same
>>> operations and see if I can show definitively that this shows up only
>>> under xen dom0, and then maybe get a clearer picture of why.
>> Actually, yes. If you can trigger and reproduce the bug, and provide as
>> many logs as possible from the exploded system, that would hopefully be
>> a useful starting point for a diagnosis. :-)
>> Regards,
>> Dario
>> --
>> <<This happens because I choose it to happen!>> (Raistlin Majere)
>> -----------------------------------------------------------------
>> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
>> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>> _______________________________________________
>> Xen-users mailing list
>> Xen-users@xxxxxxxxxxxxx
>> https://lists.xen.org/xen-users
