
[Xen-community] A weird bug in Xen networking?



 I think I've hit a weird and mostly hidden bug in Xen, but I'm not 100%
sure...

Here's the setup: I have an OpenSuSE 11.2 based Dom0 (Xen 3.4.1). Dom0
is also acting as a router / firewall, and it provides WAN connectivity
for the DomUs by means of IPSEC (OpenSwan). I use 'bridged' networking for
the DomUs; there are several NICs, as each DomU belongs to a separate
subnet. Each of Dom0's bridge interfaces has an IP address in the
respective subnet, and this IP is used as the gateway for that subnet.

The DomUs also run OpenSuSE 11.2. I use 'cfengine' to centrally manage
most of the configuration and (custom) software distribution.

That's where things go south: when I run cfengine's 'cfagent', it runs
and works up to a point where it just hangs. I can interrupt it with
'CTRL-C' or wait until it times out (socket timeout). Initially I
thought it was cfengine's problem, but then I noticed that a similar
thing happens when I connect to a DomU with SSH and run 'ls -lR /': it
goes through some directories but eventually just stalls (and I have to
disconnect the SSH session to 'get out').

Every time such a 'hang' happens I see some OpenSwan / ipsec errors on Dom0:

   klips_error:ipsec_xmit_encap_once: tried to skb_put 20, 16 available.  This should never happen, please report.

The numbers vary somewhat (sometimes it's 21, 17 instead of 20, 16).

I posted all my 'findings' on the OpenSwan mailing list thinking it might
be an OpenSwan issue, but one of the developers said it doesn't look like
'their' issue and that I should talk to the 'Xen guys'. Here is the
relevant part of his reply:

>
> Yeah, this does not seem to be an openswan bug. The code in question is:
> (one instance of it):
>
>         /* Set the data pointer */
>         skb_reserve(n,skb->data-skb->head+headroom);
>         /* Set the tail pointer and length */
>         if(skb_tailroom(n) < skb->len) {
>                 printk(KERN_WARNING "klips_error:skb_copy_expand: "
>                        "tried to skb_put %ld, %d available.  "
>                        "This should never happen, please report.\n",
>                        (unsigned long int)skb->len,
>                        skb_tailroom(n));
>                 ipsec_kfree_skb(n);
>                 return NULL;
>         }
>
> I would check with the xen people to see what might be going on. 
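
To make the quoted check concrete, here is a small user-space sketch of the
pointer arithmetic it relies on. It is not kernel code: the toy_* names are
hypothetical, and the buffer sizes and the headroom of 16 are made-up values,
chosen only so that the result reproduces the "20, 16" and "21, 17" pairs
from the error messages above. It shows that the warning fires whenever the
new buffer, after its headroom has been reserved, has less tailroom than the
payload that is about to be copied into it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an sk_buff: offsets into one buffer, not real pointers.
 * Hypothetical model for illustration only. */
struct toy_skb {
    size_t head; /* start of the allocated buffer */
    size_t data; /* start of the payload */
    size_t tail; /* end of the payload */
    size_t end;  /* end of the allocated buffer */
};

static struct toy_skb toy_alloc(size_t size)
{
    return (struct toy_skb){ 0, 0, 0, size };
}

/* Free bytes between the end of the payload and the end of the buffer. */
static size_t toy_tailroom(const struct toy_skb *s)
{
    return s->end - s->tail;
}

/* Move data and tail forward together to leave n bytes of headroom. */
static void toy_reserve(struct toy_skb *s, size_t n)
{
    assert(n <= toy_tailroom(s)); /* precondition of this toy model */
    s->data += n;
    s->tail += n;
}

/* Grow the payload by n bytes; returns -1 instead of growing past end. */
static int toy_put(struct toy_skb *s, size_t n)
{
    if (n > toy_tailroom(s))
        return -1;
    s->tail += n;
    return 0;
}

/* Same shape as the quoted check: reserve the headroom in a new buffer,
 * then see whether old_len bytes still fit. Returns the number of missing
 * bytes, or 0 if the payload fits. */
static size_t toy_copy_shortfall(size_t old_headroom, size_t old_len,
                                 size_t extra_headroom, size_t new_size)
{
    struct toy_skb n = toy_alloc(new_size);

    toy_reserve(&n, old_headroom + extra_headroom);
    if (toy_put(&n, old_len) == 0)
        return 0;
    return old_len - toy_tailroom(&n);
}
```

With these made-up sizes, toy_copy_shortfall(16, 20, 0, 32) leaves 16 bytes
of tailroom for a 20-byte payload, and toy_copy_shortfall(16, 21, 0, 33)
leaves 17 bytes for 21. In both reported messages the buffer is 4 bytes short.
The sketch only restates what the message means; it does not say which layer
produced a buffer that was too small.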

So here I am, asking the 'Xen guys'.

Does anyone have any idea what might be going on?

 
 Regards, Danilo



_______________________________________________
Xen-community mailing list
Xen-community@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-community


 

