
Re: [Xen-users] domU network has sleeping sickness



Steven Timm wrote:
> I've seen the same problem with my Xen 3.1.0 setup.  What
> the Xen gurus are telling us is that this is a symptom of Xen dom0
> being busy and not servicing the network interrupts of the domU's
> promptly.  Their advice to us was to shift an application that
> had been running on dom0 to another Xen instance to see if that
> would help.  We are in the process of implementing that solution now.
>

There is nothing running on my dom0's. Their only purpose is managing
the domU's.
One of the problematic Xen hosts does have load on its three domU's,
which serve continuous build systems. But another sleepy Xen host with
five domU's is more or less in a pre-production state and idling.

> By the way, my system (Dell PowerEdge 2950) has Broadcom
> onboard network cards, not Intel E1000, so it is unlikely that
> it is a network-driver-specific issue.
>
> During these episodes of non-network connectivity, by the way,
> it was not unusual to see the following kernel dump in dom0
>

I don't find anything helpful or suspicious in any log, but maybe I'm
missing it.
In dom0 I'm looking at dmesg, messages, warn, xend-debug.log, xend.log
and xen-hotplug.log, and in the domU at dmesg, messages and warn.
But after the boot process, more or less nothing important is logged.
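
To make that log check repeatable, here is a minimal Python sketch. It
assumes a lockup shows up as "soft lockup" or "softlockup_tick", as in
the quoted dom0 traces; other kernels may word it differently. The
function names are only illustrative, and any log path passed in (for
example /var/log/messages) is an assumption about the host.

```python
import re
from typing import Iterable, List

# Patterns taken from the quoted dom0 traces (assumption: other kernel
# versions may phrase the message differently).
LOCKUP_PATTERN = re.compile(r"soft lockup|softlockup_tick", re.IGNORECASE)


def find_lockup_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a soft lockup, without newlines."""
    return [line.rstrip("\n") for line in lines if LOCKUP_PATTERN.search(line)]


def scan_log(path: str) -> List[str]:
    """Scan one log file; undecodable bytes are replaced instead of raising."""
    with open(path, "r", errors="replace") as handle:
        return find_lockup_lines(handle)
```

Calling scan_log("/var/log/messages") in dom0 and in each domU would
list any matching lines; an empty result only means these two patterns
were not found, not that the log is clean.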

> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: Call Trace:
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8020cc97>] do_IRQ+0x63/0x71
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
> 2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: <EOI>
>
> or
>
> Feb 25 10:32:39 fermigrid6 kernel: BUG: soft lockup detected on CPU#0!
> Feb 25 10:32:39 fermigrid6 kernel:
> Feb 25 10:32:39 fermigrid6 kernel: Call Trace:
> Feb 25 10:32:39 fermigrid6 kernel:  <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020cc97>] do_IRQ+0x63/0x71
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
> Feb 25 10:32:39 fermigrid6 kernel:  <EOI> [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8034b258>] force_evtchn_callback+0xa/0xb
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff803f2272>] thread_return+0xdf/0x119
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80228a25>] __cond_resched+0x1c/0x44
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff803f25df>] cond_resched+0x37/0x42
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff802343c4>] ksoftirqd+0x0/0xbf
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80234432>] ksoftirqd+0x6e/0xbf
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff802422d7>] kthread+0xc8/0xf1
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020ae1c>] child_rip+0xa/0x12
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8024220f>] kthread+0x0/0xf1
> Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020ae12>] child_rip+0x0/0x12
>
> ----------------
>
> One of our dom0's was running an LVS server; the other one, on
> identical hardware, was not.  We moved the LVS server from one to the
> other, and the network problems and kernel panics followed it.
>
> Steve Timm
>
> On Mon, 3 Mar 2008, Marc Teichgraeber wrote:
>
>> Hi all,
>>
>> I have a strange network problem with some domU's on three Xen hosts.
>> They are losing their network connectivity. I do bridged networking.
>>   * It happens randomly and could happen right after bootup of the domU
>> or anytime later.
>>   * The domU is not reachable from another host on the LAN.
>>   * The domU is always reachable from the dom0 (ssh, ping).
>>   * I can 'repair' the connection by attaching to the console and
>> pinging out from the domU. At first nothing happens, then the machine
>> gets its network back. (That's also my current workaround: pinging all
>> the time from the console.)
>>   * Pinging from another host at the same time helps too.
>>   * It can happen that I ping continuously from one host while another
>> host gets only every 10th packet or so back.
>>   * The interfaces can come back from their sleep by themselves.
>>   * When the network has fallen asleep, ssh to the domU from another
>> host hangs; it does not come back with "no route to host" or similar.
>>
>> I'm suspicious about the network controllers; they are the same on all
>> hosts: "Intel Corporation 80003ES2LAN Gigabit Ethernet Controller
>> (Copper)"(lspci), some kind of "Intel® PRO/1000 EB Network Connection
>> with I/O Acceleration"(Intel website). I've tried the latest e1000
>> driver from Intel but it didn't help.
>> I've checked all MAC addresses, they are unique, as are the IP addresses.
>>
>> Any ideas are welcome :)
>>
>> -------------------------------------------------------------------------
>>
>> "xm info" from host1,  openSUSE 10.2 (X86-64):
>>
>> release                : 2.6.18.8-0.9-xen
>> version                : #1 SMP Sun Feb 10 22:48:05 UTC 2008
>> machine                : x86_64
>> nr_cpus                : 4
>> nr_nodes               : 1
>> sockets_per_node       : 2
>> cores_per_socket       : 2
>> threads_per_core       : 1
>> cpu_mhz                : 2327
>> hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
>> total_memory           : 32766
>> free_memory            : 21607
>> max_free_memory        : 21607
>> max_para_memory        : 21603
>> max_hvm_memory         : 21544
>> xen_major              : 3
>> xen_minor              : 0
>> xen_extra              : .3_11774-23
>> xen_caps               : xen-3.0-x86_64
>> xen_pagesize           : 4096
>> platform_params        : virt_start=0xffff800000000000
>> xen_changeset          : 11774
>> cc_compiler            : gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)
>> cc_compile_by          : abuild
>> cc_compile_domain      : suse.de
>> cc_compile_date        : Thu Jan 10 21:22:54 UTC 2008
>> xend_config_format     : 2
>> -------------------------------------------------------------------------
>>
>> "xm info" output on host2, openSUSE 10.3 (X86-64)
>>
>> release                : 2.6.22.13-0.3-xen
>> version                : #1 SMP 2007/11/19 15:02:58 UTC
>> machine                : x86_64
>> nr_cpus                : 8
>> nr_nodes               : 1
>> sockets_per_node       : 2
>> cores_per_socket       : 4
>> threads_per_core       : 1
>> cpu_mhz                : 3000
>> hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
>> total_memory           : 16382
>> free_memory            : 591
>> max_free_memory        : 591
>> max_para_memory        : 587
>> max_hvm_memory         : 577
>> xen_major              : 3
>> xen_minor              : 1
>> xen_extra              : .0_15042-51
>> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
>> xen_scheduler          : credit
>> xen_pagesize           : 4096
>> platform_params        : virt_start=0xffff800000000000
>> xen_changeset          : 15042
>> cc_compiler            : gcc version 4.2.1 (SUSE Linux)
>> cc_compile_by          : abuild
>> cc_compile_domain      : suse.de
>> cc_compile_date        : Tue Sep 25 21:16:06 UTC 2007
>> xend_config_format     : 4
>>
>>
>
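
The workaround from my original mail (pinging out from the domU) could
be scripted roughly like this. It is only a sketch: the function names,
the interval and the target address are assumptions, and it presumes the
Linux iputils ping with its -c (count) and -W (reply timeout in seconds)
options.

```python
import subprocess
import time
from typing import List


def build_ping_command(target: str, count: int = 1, timeout_s: int = 2) -> List[str]:
    """Build an iputils-style ping command line for one probe."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), target]


def keep_alive(target: str, interval_s: float = 5.0, rounds: int = 3) -> int:
    """Ping `target` `rounds` times, sleeping between attempts.

    Returns how many attempts got a reply (ping exit status 0).
    """
    successes = 0
    for attempt in range(rounds):
        result = subprocess.run(
            build_ping_command(target),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            successes += 1
        if attempt < rounds - 1:
            time.sleep(interval_s)  # no sleep needed after the last attempt
    return successes
```

Run inside the domU, keep_alive("192.0.2.1", interval_s=5, rounds=720)
would ping for about an hour; 192.0.2.1 is a placeholder for a host on
the LAN, such as the gateway. This only hides the symptom and does not
explain why the interface falls asleep.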


-- 
--------------------------------
Marc Teichgraeber
Systemadministrator
Systemadministration

neofonie GmbH
Robert-Koch-Platz 4
10115 Berlin
fon: +49.30 24627 185
fax: +49.30 24627 120
marc.teichgraeber@xxxxxxxxxxx
http://www.neofonie.de 

Handelsregister
Berlin-Charlottenburg: HRB 67460

Geschaeftsfuehrung
Helmut Hoffer von Ankershoffen
Nurhan Yildirim
--------------------------------


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


 

