Re: [Xen-devel] kernel oops/IRQ exception when networking between many domUs
On Monday, 06.06.2005, at 14:30 +0200, Birger Tödtmann wrote:
> On Monday, 06.06.2005, at 10:26 +0100, Keir Fraser wrote:
> [...]
> > > somewhere around the magic 128 (NR_IRQS problem in 2.0.x!) when the
> > > crash happens - could this hint to something?
> >
> > The crashes you see with free_mfn removed will be impossible to debug
> > -- things are very screwed by that point. Even the crash within
> > free_mfn might be far removed from the cause of the crash, if it's due
> > to memory corruption.
> >
> > It's perhaps worth investigating what critical limit you might be
> > hitting, and what resource it is that's limited. e.g., can you
> > create a few vifs, but connected together by some very large number of
> > bridges (daisy chained together)? Or can you create a large number of
> > vifs if they are connected together by just one bridge?
>
> This is getting really weird - as I found out, I'll encounter problems
> with far fewer vifs/bridges than suspected. I just fired up a network
> with 7 nodes, all with four interfaces each connected to the same four
> bridge interfaces. The nodes can ping through the network, however
> after a short time, the system (dom0) crashes as well. This time, it
> dies in net_rx_action() at a slightly different place:
>
> [...]
> [<c02b6e15>] kfree_skbmem+0x12/0x29
> [<c02b6ed1>] __kfree_skb+0xa5/0x13f
> [<c028c9b3>] net_rx_action+0x23d/0x4df
> [...]
>
> Funnily, I cannot reproduce this with 5 nodes (domUs) running. I'm a
> bit unsure where to go from here... Maybe I should try a different
> machine for further testing.

I can confirm this bug on an AMD Athlon using xen-unstable from June 5th
(latest ChangeSet 1.1677). All testing domains run OSPF daemons, which
start talking to each other via multicast as soon as the network
connections are established.

* 'xm create' 20 domains with 122 vifs (+ vif0.0); that Xen version
  does not bring the vifs UP. Everything is fine.
* Create 51 transfer bridges, connect some of the vifs to them (not
  more than two vifs to each), and UP all vifs. Now I have
  lo + eth0 + veth0 + 123 vif* + 51 br* = 177 devices, all UP.
  All transfer networks work, OSPF tables grow, everything is fine.
* Create a 52nd bridge. Connect 20 vifs to it, but DOWN them first.
  Everything is fine.
* Now UP all the vifs connected to the 52nd bridge, one after the
  other. More and more multicast traffic shows up. After UPing the
  9th vif, dom0 BOOOOOMs (in net_rx_action, too).

Further experiments show that it seems to be the amount of traffic (and
the number of connected vifs?) that triggers the oops: with all OSPF
daemons stopped, I could UP all bridges and vifs. But when I did a flood
broadcast ping (ping -f -b $broadcastadr) on the 52nd bridge (the one
with more than two active ports), dom0 oopsed again.

I could only reproduce that "too-much-traffic oops" on bridges
connecting more than 10 vifs. It would be interesting to know whether
that happens with unicast traffic, too. I have no time left today; I
will test more tomorrow.

/nils.

PS: Shall we continue cross-posting to devel+users?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
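
For anyone trying to reproduce the last two steps of the message above, here is a minimal dry-run sketch in Python. It only prints the commands and does not execute anything. The bridge name `br52`, the vif names `vif1.0` to `vif20.0`, and the use of `brctl` and `ip link` are assumptions made for illustration; the original message does not say which device names or tools were used. The helper `device_count` simply rechecks the device arithmetic quoted in the message.

```python
# Dry-run sketch: builds the command sequence for the shared-bridge test.
# All device names here are hypothetical; adapt them to the real setup.

def device_count(domu_vifs=122, dom0_vifs=1, bridges=51,
                 fixed=("lo", "eth0", "veth0")):
    """Recompute the device total quoted in the message."""
    return len(fixed) + domu_vifs + dom0_vifs + bridges


def shared_bridge_commands(bridge="br52", vifs=None):
    """Return the commands for the shared-bridge step, in order.

    Each vif is taken DOWN before it is attached, and the vifs are
    brought UP one after the other only at the very end.
    """
    if vifs is None:
        # Assumed naming: one vif per domain, domain IDs 1..20.
        vifs = [f"vif{dom}.0" for dom in range(1, 21)]
    cmds = [f"brctl addbr {bridge}"]
    for vif in vifs:
        cmds.append(f"ip link set {vif} down")
        cmds.append(f"brctl addif {bridge} {vif}")
    cmds.append(f"ip link set {bridge} up")
    for vif in vifs:
        cmds.append(f"ip link set {vif} up")
    return cmds


if __name__ == "__main__":
    print("devices:", device_count())
    for cmd in shared_bridge_commands():
        print(cmd)
```

Printing the commands rather than running them keeps the sketch safe to try on any machine; on a real dom0 the output could be reviewed and then run by hand, one `up` at a time, while watching the console for the oops.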