
Re: [Xen-devel] memory fault



" >      DOM2: __alloc_pages: 0-order allocation failed (gfp=0x20/0)

" It's pretty unlikely this is anything to do with Xen -- I bet you
" could reproduce this on a stock Linux compiled without CONFIG_HIGHMEM

You are correct.  This message pops up on stock Linux as well if
memory is constrained as tightly as in our Xen config.


" >  DOM3: Unable to handle kernel paging request at virtual address c3f77820

The EIP is in arch/xeno/drivers/network/network.c:_network_interrupt().
Unfortunately, I no longer have the oops messages.  We had to get
the hosts going again for that project, and the oops output was lost.


" >      DOM1: Weird failure in hard_start_xmit

Xen prints this message here:
xeno-1.2.bk/xen/net/dev.c:816: printk("Weird failure in hard_start_xmit!\n");

Last night a user sent me a detailed report on NIC trouble:

"  When the machines freeze up running bbsend, bbrecv, or netgen, they _also_
"  freeze up on incoming SSH connections.
"  If I'm already logged into rack217 via SSH when I start a netgen, then my
"  interactive session gets laggy or freezes completely.
"  
"  At any time, killing the netgen process makes whatever was frozen resume
"  almost immediately.
"  
"  We're not talking about large amounts of traffic here: 12KB/s causes all
"  of the above symptoms.  netgen and bbsend both do some busy-waiting, but
"  not that much of it.
"  
"  For some reason, the system load goes sky-high, even with just one netgen
"  process.  netgen is single-threaded and spends less than half of its time
"  busy-waiting, yet system load often ends up above 3.
"  
"  End of symptoms, beginning of theory: all the bad systems are P4s running
"  Xeno and using Broadcom ethernet cards.  (At least, they used to be
"  Broadcoms.  With Xeno running, I can no longer check.)  The working
"  systems are a mix of P4 and P3, Xeno is running on two of them (but only
"  on P3s), and they're all eepro100 cards.
"  
"  My guess is that Xeno is interacting badly with either the bcm5700 or the
"  P4.  I'm leaning toward the former.  Is there any way to boot the machines


That "hard_start_xmit" message showed up on the hosts with
Broadcom BCM5703 NICs.

We'll set up a test cluster to isolate what is going on with these
network apps.




