
Re: [Xen-devel] netback BUG_ON when using copy_skb=1



>>> On 26.10.13 at 10:32, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
> The reason the vif net-device is not released after the VM is shut down
> was found with copy_skb disabled.
> Suppose VM1 (vif1.0) sends packets to VM2 (vif2.0) through a virtual
> switch.
> 1) VM2 runs Windows 2003 and was previously shut down for some
> unexpected reason.
>    After being created, VM2 stops during startup at the dialog named
> "Shutdown Event Tracker".
>    It waits for the user to explain why the computer shut down
> unexpectedly.
> 
> 2) VM2 already has vif2.0. I then added a new vif net-device using virsh
> commands.
>    The new vif2.1 was not completely created (it has no interrupts), but
> its state is running and its TX queue is started by default.
>    The function connect() in xenbus.c has not been called for vif2.1. The
> related information in xenstore is as follows:
> linux-szRoyS:/ # xenstore-ls -f | grep 2 | grep state
> /local/domain/0/device-model/2/state = "running"
> /local/domain/0/backend/vbd/2/51712/state = "4"
> /local/domain/0/backend/vbd/2/51760/state = "4"
> /local/domain/0/backend/vif/2/0/state = "4"
> /local/domain/0/backend/vif/2/1/state = "2"
> /local/domain/0/backend/console/2/0/state = "1"
> /local/domain/2/control/uvp/vm_state = "running"
> /local/domain/2/device/vbd/51712/state = "4"
> /local/domain/2/device/vbd/51760/state = "4"
> /local/domain/2/device/vif/0/state = "4"
> /local/domain/2/device/vif/1/state = "1"
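
For reference, the numeric states in the listing above can be decoded with a small script. This is only a sketch: the state names follow enum xenbus_state in Xen's public io/xenbus.h header, while the helper decode_states() and the SAMPLE text are hypothetical names introduced here for illustration.

```python
import re

# Values of enum xenbus_state from Xen's public io/xenbus.h header.
XENBUS_STATES = {
    0: "Unknown",
    1: "Initialising",
    2: "InitWait",
    3: "Initialised",
    4: "Connected",
    5: "Closing",
    6: "Closed",
    7: "Reconfiguring",
    8: "Reconfigured",
}

# Matches lines such as: /local/domain/0/backend/vif/2/1/state = "2"
# Non-numeric values (for example "running") are deliberately skipped.
STATE_LINE = re.compile(r'^(?P<path>\S+/state)\s*=\s*"(?P<value>\d+)"$')


def decode_states(listing):
    """Return {xenstore path: state name} for numeric state entries."""
    decoded = {}
    for line in listing.splitlines():
        match = STATE_LINE.match(line.strip())
        if match:
            number = int(match.group("value"))
            decoded[match.group("path")] = XENBUS_STATES.get(number, "invalid")
    return decoded


# Hypothetical sample: a subset of the xenstore-ls output quoted above.
SAMPLE = """\
/local/domain/0/device-model/2/state = "running"
/local/domain/0/backend/vif/2/0/state = "4"
/local/domain/0/backend/vif/2/1/state = "2"
/local/domain/2/device/vif/0/state = "4"
/local/domain/2/device/vif/1/state = "1"
"""

if __name__ == "__main__":
    for path, name in decode_states(SAMPLE).items():
        print(f"{path}: {name}")
```

Read this way, the backend for vif2.1 is in InitWait and its frontend is still in Initialising, which fits the statement that connect() has not been called for vif2.1.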
> 
> 3) The KOBJ_ONLINE event is generated in backend_create_netif(), which
> is called from netback_probe().
>    This event runs the network script named "vif-bridge", which adds
> vif2.1 to the virtual switch.
>    Packets from vif1.0 (VM1) are then forwarded or flooded to vif2.1 by
> the virtual switch.
>    vif2.1 drops these packets in netif_be_start_xmit() because it is not
> netif_schedulable().
> 
> 4) After vif2.1 is set down and then up again, the TX queue can't be
> started in net_open() because the carrier is off.
>    So its qdisc becomes fifo_qdisc and the TX queue state is stopped.
>    In this case, packets are held in the qdisc queue and can't be
> dequeued in dequeue_skb(), because vif2.1's TX queue is stopped.
> 
> 5) If VM1 is destroyed, the packets from vif1.0 can't be released and
> vif1.0 can't be disconnected.
>    vif1.0 remains unreleased until vif2.1 is set down.
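
The behaviour described in steps 4 and 5 can be pictured with a toy model. This is not kernel code and does not reproduce the netback data path; ToyTxQueue and its methods are hypothetical names, and the model only assumes what the report states: a stopped TX queue never dequeues, and setting the interface down releases whatever the qdisc still holds.

```python
from collections import deque


class ToyTxQueue:
    """Toy stand-in for a net-device TX queue plus its FIFO qdisc."""

    def __init__(self):
        # Mirrors the reported state of vif2.1: up, but TX queue stopped.
        self.stopped = True
        self.qdisc = deque()

    def enqueue(self, packet):
        """The virtual switch forwards or floods a packet to this device."""
        self.qdisc.append(packet)

    def dequeue(self):
        """Hand the next packet to the driver, unless the queue is stopped."""
        if self.stopped or not self.qdisc:
            return None
        return self.qdisc.popleft()

    def purge(self):
        """Model setting the interface down: every held packet is released."""
        released = list(self.qdisc)
        self.qdisc.clear()
        return released


if __name__ == "__main__":
    vif2_1 = ToyTxQueue()
    for number in range(3):
        vif2_1.enqueue(f"pkt{number} from vif1.0")

    # Nothing drains while the queue is stopped, so the packets stay held.
    print("dequeued:", vif2_1.dequeue())
    print("held:", len(vif2_1.qdisc))

    # Setting the interface down releases the held packets.
    print("released:", len(vif2_1.purge()))
```

As long as the packets sit in the stopped queue, whatever they reference on the sending side stays pinned, which matches the observation that vif1.0 is only released once vif2.1 is set down.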
> 
>    The root cause is that vif2.1 was not created successfully and ended
> up in a strange state: running, but with its TX queue stopped.
>    The function backend_create_netif() is called in two places,
> netback_probe() and frontend_changed(). I think we can remove the
> backend_create_netif() call in netback_probe().
>    That way the vif net-device is only created once the front end has
> changed to XenbusStateConnected.
> 
>    The patch is as follows:
> --- drivers/xen/netback/xenbus.c.old    2013-10-26 16:23:07.000000000 +0800
> +++ drivers/xen/netback/xenbus.c        2013-10-26 16:23:31.000000000 +0800
> @@ -156,9 +156,6 @@
>         if (err)
>                 goto fail;
> 
> -       /* This kicks hotplug scripts, so do it immediately. */
> -       backend_create_netif(be);
> -
>         return 0;
> 
>  abort_transaction:
> 
>    Do you have any ideas?

No, not really. It would be helpful if this could be matched up against
the behavior (and any subsequent changes to it) of the upstream driver.

Jan

