
[Xen-users] PDFLUSH deadlock



Hi,

I did post this before, but as time passes I'm becoming more confident of my observations.

When using XEN in the way I'm using it, there is a deadlock condition somewhere in PDFLUSH that can lock your machine up.
The lockup may be very short and unnoticeable, or it may extend to a number of minutes.

Conditions
========

Lots of DomU's (say 10) all running off file-backed storage.
Dom0 with ~1/8th of the system memory allocated to it (say 800M on a 6G machine).
Big machine, 4 CPU's, 6G RAM.

Example cause of lockup
==================

In a DomU, run:
dd if=/dev/zero of=/tmp/big bs=1M count=1000

If you watch /proc/meminfo, "Dirty" will grow VERY rapidly and hit 500,000+ within seconds.
Wait for 10 seconds, then try to use "vi" in the Dom0; it will freeze.

"ps ax" in the Dom0 will reveal that a number of processes have gone into "D" state, and at this point things like "df" will lock your session.
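
For anyone trying to reproduce this, a minimal sketch of how the two observations above could be captured from a shell. The function names (dirty_kb, watch_dirty, d_state_only) are made up for illustration, and it assumes /proc/meminfo reports "Dirty" in kB, as it does on the kernels I have seen.

```shell
#!/bin/bash
# Print the "Dirty" figure (in kB) from a meminfo-style file.
# The file argument defaults to /proc/meminfo; it is a parameter only so the
# function can also be pointed at a saved copy.
dirty_kb() {
    awk '/^Dirty:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Print a timestamped Dirty sample once a second; stop with Ctrl-C.
watch_dirty() {
    while true; do
        printf '%s Dirty=%s kB\n' "$(date +%T)" "$(dirty_kb)"
        sleep 1
    done
}

# Filter lines whose first field (the process state) starts with "D".
# Intended use: ps -eo stat,pid,comm | d_state_only
d_state_only() {
    awk '$1 ~ /^D/'
}
```

Run watch_dirty in one session while the dd is going, and pipe "ps -eo stat,pid,comm" through d_state_only in another to list the stuck processes.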

Cure
====

Type "sync" in the Dom0 - instant release.

Things I've tried
===========

Tweaking /proc/sys/vm/dirty_*

While these can make a difference, there is still a fundamental problem. PDFLUSH reaches a point where it "should" sync and does not, which causes it to "pause", which in turn leads to a deadlock as the system runs out of free pages.
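
To make the "dirty_*" reference concrete, here is a small sketch that lists the tunables and their current values. The function name show_dirty_tunables is made up, and the values in the commented-out lines are purely illustrative, not settings I am recommending.

```shell
#!/bin/bash
# Print every dirty_* tunable with its current value.
# The directory argument defaults to /proc/sys/vm.
show_dirty_tunables() {
    local dir="${1:-/proc/sys/vm}" f
    for f in "$dir"/dirty_*; do
        [ -r "$f" ] && printf '%s = %s\n' "$(basename "$f")" "$(cat "$f")"
    done
    return 0
}

# Changing a value is a plain write as root, for example:
#   echo 5  > /proc/sys/vm/dirty_background_ratio   # illustrative value
#   echo 10 > /proc/sys/vm/dirty_ratio              # illustrative value
```

Calling show_dirty_tunables before and after a change gives a quick record of what was actually in effect during a test.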

Short-term fix
==============

I now have a "live" server running 10 DomU's in a hostile / live Internet environment.
Uptime - 3 days.

It's running very well and very smoothly, but only because the Dom0 is running the following script:

root@nodea:~# cat syncme.sh
#!/bin/bash
while true; do sync ; sleep 5 ; done

If I kill this script, the server is guaranteed to lock up; how soon depends on load, but typically I would expect 10-15 mins.
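
A possible variation on the script, sketched here but NOT tested on the live server: only call sync when "Dirty" has climbed past a limit, rather than unconditionally every 5 seconds. The names LIMIT_KB, INTERVAL, should_sync and sync_loop are made up, and the 100000 kB limit is an arbitrary starting point, not a tuned value.

```shell
#!/bin/bash
# Hypothetical variant of syncme.sh: sync only when Dirty is above a limit.
# LIMIT_KB and INTERVAL are illustrative values, not tuned recommendations.
LIMIT_KB=${LIMIT_KB:-100000}
INTERVAL=${INTERVAL:-5}

# Current "Dirty" figure in kB from /proc/meminfo (or a file given as $1).
dirty_kb() {
    awk '/^Dirty:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeeds when the first argument (dirty kB) exceeds the second (limit kB).
should_sync() {
    [ "$1" -gt "$2" ]
}

# Main loop; call sync_loop to start it, stop with Ctrl-C.
sync_loop() {
    while true; do
        if should_sync "$(dirty_kb)" "$LIMIT_KB"; then
            sync
        fi
        sleep "$INTERVAL"
    done
}
```

Whether this is any better than the plain loop is an open question; the unconditional version is the one that has actually kept the server up.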

QUESTION::
=========

I don't know why this only affects XEN Dom0's; however, I notice that PDFLUSH's algorithms do seem to depend on the amount of memory available in the system. Does anyone know if the balloon memory driver makes adjustments here, or, after starting a DomU, does the Dom0's PDFLUSH still think it has access to 100% of the system RAM?
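
One rough way to probe this from the Dom0 is to compare what the kernel reports as MemTotal with a back-of-envelope dirty limit. This is only a sketch: it assumes the limit is approximately MemTotal * dirty_ratio / 100, whereas the kernel bases its real calculation on its own view of available memory, so treat the number as an estimate. The function names are made up.

```shell
#!/bin/bash
# MemTotal in kB from /proc/meminfo (or a file given as $1).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Rough dirty limit in kB: total_kb * ratio_percent / 100 (integer maths).
# This is an approximation, not the kernel's exact formula.
dirty_limit_kb() {
    echo $(( $1 * $2 / 100 ))
}

# Print the estimate for the running system.
estimate_dirty_limit() {
    local total ratio
    total=$(mem_total_kb)
    ratio=$(cat /proc/sys/vm/dirty_ratio)
    echo "MemTotal=${total} kB dirty_ratio=${ratio}% estimate=$(dirty_limit_kb "$total" "$ratio") kB"
}
```

Running estimate_dirty_limit in the Dom0 before and after starting a DomU would at least show whether the MemTotal the Dom0 sees changes when memory is ballooned out.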

tia
Gareth.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users

 

