
Re: [Xen-devel] xenolinux /dev/random



>My goodness!  See the message I just now posted to xen-devel about NFS
>root hangs; could this be what we're hitting?  The most recent hang we
>saw happened while an rsync was running over ssh *and* someone restarted
>apache...
>
>This wouldn't cause the "NFS server not responding/NFS server OK"
>messages on the domain's console, though (or does that show up as a
>symptom of this too?)

I don't think this is the cause of the NFS hangs you've been seeing; that
appears to be a generic linux thing (at least, we see it on our regular
linux boxes as well as on xen boxes). However, if you want to test the
theory, the easiest thing to do is to change the /dev/random device node
to be an alias for /dev/urandom (a non-blocking but potentially weaker
source of randomness).
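A minimal sketch of that change, assuming a standard Linux device numbering where the "mem" character driver (major 1) exposes /dev/random as minor 8 and /dev/urandom as minor 9. The function names are hypothetical, the replacement must run as root, and on systems where /dev is rebuilt at boot (e.g. by udev) it will not survive a reboot. The shell equivalent is simply to recreate the node with `mknod /dev/random c 1 9`.

```python
import os
import stat

# Linux "mem" char driver: minor 8 is /dev/random, minor 9 is /dev/urandom.
MEM_MAJOR = 1
URANDOM_MINOR = 9

def urandom_alias_spec():
    """Return (mode, device) for a char node that reads from the urandom driver."""
    mode = stat.S_IFCHR | 0o666              # character device, world read/write
    device = os.makedev(MEM_MAJOR, URANDOM_MINOR)
    return mode, device

def alias_random_to_urandom(path="/dev/random"):
    """Hypothetical helper: replace `path` with a node aliasing /dev/urandom.

    Requires root. Raises FileExistsError if a stale `path + ".new"` exists.
    """
    mode, device = urandom_alias_spec()
    tmp = path + ".new"
    os.mknod(tmp, mode, device)              # create the new node alongside
    os.replace(tmp, path)                    # atomically rename over the old one
```

Creating the node under a temporary name and renaming it means there is never a moment where /dev/random is missing, which matters if something is already blocked reading it. Undo it by recreating the node with minor 8.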

The /dev/random bug only really manifested for us during boot, only on
Xen, and resulted in a permanent hang.

The "NFS server foo not responding" followed by later "NFS server foo OK"
messages from linux appear to be due to a combination of stupid timeouts
in the linux sunrpc code and another bug which can cause automounters
to fall into an uninterruptible sleep. If you check "ps auwwx" on a
machine which is having problems and notice processes in state 'D',
then this is biting you. Even if this doesn't occur, the crappy timeouts
in the regular linux code mean that linux performs very badly if it gets
any errors/loss/congestion during nfs operations.

cheers, 

S.


 

