
Re: [Xen-API] [XCP-1.1] High OVS cpu load and unresponsive host network while VMPR archive phase is running


  • To: xen-api@xxxxxxxxxxxxxxxxxxx
  • From: Ben Pfaff <blp@xxxxxxxxxxxxxxx>
  • Date: Wed, 01 Aug 2012 10:27:11 -0700
  • Cancel-lock: sha1:R8u5NZayPzLCDPXkE8gQrHX3oy4=
  • Delivery-date: Wed, 01 Aug 2012 17:27:59 +0000
  • List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>

Christian Fischer <christian.fischer@xxxxxxxxxxxxxxxxxxx> writes:

> On Wednesday 01 August 2012 02:08:00 Ben Pfaff wrote:
>> Christian Fischer <christian.fischer@xxxxxxxxxxxxxxxxxxx> writes:
>> > On Tuesday 31 July 2012 18:08:18 Ben Pfaff wrote:
>> >> Christian Fischer writes:
>> >> > We have no tagged VLANs here; all physical switch ports run in access
>> >> > mode. I wouldn't say that network load is increased when this happens
>> >> > (15 kpps). Network performance could be poor due to either a vswitch
>> >> > issue (it runs at 180% CPU load, if the vswitch log doesn't lie) or high
>> >> > load on, or cheap hardware in, the customer's shared backup storage.
>> >> > I've never seen this stuff.
>> >> 
>> >> 180% CPU load is impossible for OVS 1.0.1, which has only a
>> >> single process with a single thread.
>> > 
>> > Yes, that's right, but we run OVS 1.4.2
>> > 
>> > XCP build: 1.1.0-50674c
>> > OVS build: 1.4.2
>> > NICs: BCM5709 Gigabit TOE iSCSI Offload
>> > OVS NIC bonding: active/active
>> 
>> Only the as-yet-unreleased post-1.8.0 Open vSwitch has more than
>> one process, and it still doesn't have multiple threads.
>> 
>> I suppose ovsdb-server and ovs-vswitchd could both go crazy at
>> the same time, but I haven't had any reports of that.
>> 
>> What process(es) add up to 180%?
>
>
> Both the ovsdb-server and ovs-vswitchd logs show a lot of poll_loop entries
> with high CPU usage. You can find some snippets on pastebin. Send me a mail
> if you need the whole logs.
>
> ovsdb-server.log Jul 26 08:00:
> http://pastebin.com/RaCRyZiz
> ovs-vswitchd.log Jul 26 08:00:
> http://pastebin.com/bmXJUWaT

The ovsdb-server high CPU usage appears to be due to tons of
activity talking to ovs-vswitchd.  That is very strange; it
doesn't really make sense.  Is there anything particularly
unusual going on, such as something modifying the database
quickly, VMs going up and down at a high rate, etc.?

The ovs-vswitchd high CPU usage appears to be due to a lot of
activity from the OpenFlow controller (I guess that's the VSwitch
Controller you mention).

The bonding code is unnecessarily shifting around load, but I
don't think that would cause a lot of CPU usage.

> ovs-vswitchd.log Jul 30 22:30 (180 - 230 % CPU load):
> http://pastebin.com/xZykK2Ad

That one doesn't make any sense to me.

What do you see for these processes' CPU usage using some other
tool, such as "top"?
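
As a cross-check on what "top" reports, per-process CPU usage can
also be sampled directly from /proc. The following is only a minimal
Python 3 sketch, not something shipped with Open vSwitch. It assumes
a Linux /proc filesystem, and the helper names (parse_comm,
parse_cpu_ticks, cpu_percent, find_pids) are invented for
illustration. Here 100% means one fully busy core, so a
single-threaded process should not show much more than 100% on its
own.

```python
import os
import time


def parse_comm(stat_line):
    """Return the command name from a /proc/<pid>/stat line."""
    # The name sits between the first '(' and the last ')'.
    return stat_line[stat_line.index("(") + 1:stat_line.rindex(")")]


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # Split after the last ')' because the command name may contain spaces.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so utime (14) and stime (15)
    # are at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage over the interval; 100.0 means one fully busy core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def read_stat(pid):
    """Read the raw stat line for one process."""
    with open("/proc/%d/stat" % pid) as f:
        return f.read()


def find_pids(name):
    """Return PIDs of all processes whose command name equals `name`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if parse_comm(read_stat(int(entry))) == name:
                pids.append(int(entry))
        except (OSError, ValueError):
            # The process exited while we were scanning, or the line was odd.
            continue
    return pids


if __name__ == "__main__":
    interval_s = 5.0
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    before = {}
    for name in ("ovs-vswitchd", "ovsdb-server"):
        for pid in find_pids(name):
            before[pid] = (name, parse_cpu_ticks(read_stat(pid)))
    time.sleep(interval_s)
    for pid, (name, ticks) in sorted(before.items()):
        try:
            after = parse_cpu_ticks(read_stat(pid))
        except OSError:
            continue  # process went away during the sample
        pct = cpu_percent(ticks, after, interval_s, ticks_per_s)
        print("%-14s pid %-6d %6.1f%%" % (name, pid, pct))
```

Running it while the archive phase is active prints one line per
matching process, which shows whether a single process or the sum of
several accounts for the reported load.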

> Sometimes there was a VSwitch Controller (Citrix) connected,
> but it's removed,

In the first ovs-vswitchd.log paste, the controller certainly
looks like a culprit.




 

