WARNING - OLD ARCHIVES

This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/
   
 
 

xen-devel

Re: [Xen-devel] xenolinux /dev/random

To: Steve Traugott <stevegt@xxxxxxxxxxxxx>
Subject: Re: [Xen-devel] xenolinux /dev/random
From: Steven Hand <Steven.Hand@xxxxxxxxxxxx>
Date: Thu, 13 May 2004 08:13:37 +0100
Cc: Ian Pratt <Ian.Pratt@xxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxxx, joyce@xxxxxxxxxxxxx, awclarke@xxxxxxxxxxx, Steven.Hand@xxxxxxxxxxxx
Delivery-date: Thu, 13 May 2004 08:13:38 +0100
Envelope-to: Steven.Hand@xxxxxxxxxxxx
In-reply-to: Message from Steve Traugott <stevegt@xxxxxxxxxxxxx> of "Wed, 12 May 2004 20:09:55 PDT." <20040513030955.GD3152@pathfinder>
>My goodness!  See the message I just now posted to xen-devel about NFS
>root hangs; could this be what we're hitting?  The most recent hang we
>saw happened while an rsync was running over ssh *and* someone restarted
>apache...
>
>This wouldn't cause the "NFS server not responding/NFS server OK"
>messages on the domain's console, though (or does that show up as a
>symptom of this too?)

I don't think this is the cause of the NFS hangs you've been seeing; that
appears to be a generic Linux problem (at least, we see it on our regular
Linux boxes as well as on Xen boxes). However, if you want to test the
theory, the easiest thing to do is to change the /dev/random device node
to be an alias for /dev/urandom (a non-blocking but potentially weaker
source of randomness).
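To try the alias theory, here is a minimal Python sketch. It assumes the conventional Linux character-device numbers (major 1, minor 8 for /dev/random; major 1, minor 9 for /dev/urandom). The helper names `describe_node` and `alias_random_to_urandom` are hypothetical. The second helper must run as root and replaces the node in place, so use it only on a test domain.

```python
import os
import stat

# Conventional Linux char-device numbers (major 1 = memory devices).
RANDOM_DEV = os.makedev(1, 8)   # /dev/random (blocking)
URANDOM_DEV = os.makedev(1, 9)  # /dev/urandom (non-blocking)


def describe_node(path):
    """Return 'random', 'urandom', or 'other' for the node at `path`."""
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        return "other"  # not a character device at all
    if st.st_rdev == URANDOM_DEV:
        return "urandom"
    if st.st_rdev == RANDOM_DEV:
        return "random"
    return "other"


def alias_random_to_urandom(path="/dev/random"):
    """Hypothetical test helper: recreate `path` as a urandom node.

    Needs root. Anything that reads `path` afterwards gets the
    non-blocking (potentially weaker) urandom output instead.
    """
    perms = os.stat(path).st_mode & 0o777  # remember existing permissions
    os.unlink(path)
    os.mknod(path, stat.S_IFCHR | perms, URANDOM_DEV)
    os.chmod(path, perms)  # mknod's mode is masked by umask; restore it
```

After the swap, `describe_node("/dev/random")` should report `"urandom"`. Recreating the node with minor 8 restores the original blocking device.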

The /dev/random bug only really manifested for us during boot, only on
Xen, and resulted in a permanent hang.

The "NFS server foo not responding" messages, followed later by "NFS server
foo OK", appear to be due to a combination of stupid timeouts in the Linux
sunrpc code and another bug which can cause automounters to fall into an
uninterruptible sleep. If you run "ps auwwx" on a machine which is having
problems and notice processes in state 'D', then this is biting you. Even
if this doesn't occur, the crappy timeouts in the regular Linux code mean
that Linux performs very badly if it gets any errors, loss, or congestion
during NFS operations.
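A rough Python equivalent of that `ps` check, assuming a Linux /proc filesystem, follows. It reads the state field from each /proc/<pid>/stat and reports processes in state 'D' (uninterruptible sleep). The function names are hypothetical, and the sketch only spots the symptom; it does not tell you which bug put the process there.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def uninterruptible_processes(proc="/proc"):
    """List (pid, comm) for processes in state 'D', as `ps` shows in STAT."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

On a machine hit by the automounter bug, one would expect the stuck automounter (and anything waiting on it) to show up in `uninterruptible_processes()`.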

cheers, 

S.