[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] netback Oops then xenwatch stuck in D state
On Thu, 2013-02-14 at 15:29 +0000, Jan Beulich wrote: > >>> On 14.02.13 at 13:45, "Jan Beulich" <JBeulich@xxxxxxxx> wrote: > >>>> On 14.02.13 at 13:33, "Jan Beulich" <JBeulich@xxxxxxxx> wrote: > >>>>> On 14.02.13 at 13:13, Wei Liu <wei.liu2@xxxxxxxxxx> wrote: > >>> On Thu, 2013-02-14 at 11:48 +0000, Jan Beulich wrote: > >>>> I don't think this patch will fix his problems, which - as described > >>>> yesterday - I'm relatively certain result from the harsh action > >>>> netbk_fatal_tx_err() does. > >>> > >>> Yes, having this one only is not enough. It will need to either disable > >>> the vif timer or check vif->netbk != NULL inside timer callback. > >> > >> I continue to be unclear what timer you refer to. > >> > >> If we keep the carrier-off in fatal_tx_err, then _all_ > >> asynchronously callable interfaces need to check whether the > >> vif -> netbk association went away. But, especially in the > >> context of using tasklets instead of kthreads, I meanwhile > >> think that simply setting a flag (along with turning off the IRQ) > >> would be better (and keep the turning off of the carrier the way > >> it had been done before. The flag would basically need checking > >> anywhere we look at the shared ring (which ought to be very > >> few places), and perhaps should also cause xenvif_schedulable() > >> to return false. > > > > Or a "light" version of xenvif_down(), i.e. simply > > > > disable_irq(vif->irq); > > xen_netbk_deschedule_xenvif(vif); > > > > If I'm not mistaken, doing this instead of xenvif_carrier_off() > > could be all that's needed. > Just a side note, I probably would go along with this busted flag in the future. If we are to move to 1:1 model, there will be no vif <-> netbk connection anymore, so the testing against vif->netbk won't exist anymore. If this approach serves both tasklet and thread model well, this is the way to go. Wei. > Like the below/attached (compile tested only so far). > > Jan > > netback: fix shutting down the ring if it contains garbage > > Using rtnl_lock() in tasklet context is not permitted. > > This undoes the part of 1219:5108c6901b30 that split off > xenvif_carrier_off() from netif_disconnect(). > > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> > > --- a/drivers/xen/netback/common.h > +++ b/drivers/xen/netback/common.h > @@ -78,6 +78,8 @@ typedef struct netif_st { > u8 can_queue:1; /* can queue packets for receiver? */ > u8 copying_receiver:1; /* copy packets to receiver? */ > > + u8 busted:1; > + > /* Allow netif_be_start_xmit() to peek ahead in the rx request ring. */ > RING_IDX rx_req_cons_peek; > > @@ -195,7 +197,8 @@ int netif_map(netif_t *netif, unsigned l > void netif_xenbus_init(void); > > #define netif_schedulable(netif) \ > - (netif_running((netif)->dev) && netback_carrier_ok(netif)) > + (likely(!(netif)->busted) && \ > + netif_running((netif)->dev) && netback_carrier_ok(netif)) > > void netif_schedule_work(netif_t *netif); > void netif_deschedule_work(netif_t *netif); > @@ -204,9 +207,6 @@ int netif_be_start_xmit(struct sk_buff * > struct net_device_stats *netif_be_get_stats(struct net_device *dev); > irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs); > > -/* Prevent the device from generating any further traffic. */ > -void xenvif_carrier_off(netif_t *netif); > - > static inline int netbk_can_queue(struct net_device *dev) > { > netif_t *netif = netdev_priv(dev); > --- a/drivers/xen/netback/interface.c > +++ b/drivers/xen/netback/interface.c > @@ -56,6 +56,7 @@ module_param_named(queue_length, netbk_q > > static void __netif_up(netif_t *netif) > { > + netif->busted = 0; > enable_irq(netif->irq); > netif_schedule_work(netif); > } > @@ -347,23 +348,19 @@ err_rx: > return err; > } > > -void xenvif_carrier_off(netif_t *netif) > -{ > - rtnl_lock(); > - netback_carrier_off(netif); > - netif_carrier_off(netif->dev); /* discard queued packets */ > - if (netif_running(netif->dev)) > - __netif_down(netif); > - rtnl_unlock(); > - netif_put(netif); > -} > - > void netif_disconnect(struct backend_info *be) > { > netif_t *netif = be->netif; > > - if (netback_carrier_ok(netif)) > - xenvif_carrier_off(netif); > + if (netback_carrier_ok(netif)) { > + rtnl_lock(); > + netback_carrier_off(netif); > + netif_carrier_off(netif->dev); /* discard queued packets */ > + if (netif_running(netif->dev)) > + __netif_down(netif); > + rtnl_unlock(); > + netif_put(netif); > + } > > atomic_dec(&netif->refcnt); > wait_event(netif->waiting_to_free, atomic_read(&netif->refcnt) == 0); > --- a/drivers/xen/netback/netback.c > +++ b/drivers/xen/netback/netback.c > @@ -845,7 +845,7 @@ void netif_schedule_work(netif_t *netif) > RING_FINAL_CHECK_FOR_REQUESTS(&netif->tx, more_to_do); > #endif > > - if (more_to_do) { > + if (more_to_do && likely(!netif->busted)) { > add_to_net_schedule_list_tail(netif); > maybe_schedule_tx_action(); > } > @@ -1024,7 +1024,9 @@ static void netbk_fatal_tx_err(netif_t * > { > printk(KERN_ERR "%s: fatal error; disabling device\n", > netif->dev->name); > - xenvif_carrier_off(netif); > + netif->busted = 1; > + disable_irq(netif->irq); > + netif_deschedule_work(netif); > netif_put(netif); > } > > @@ -1292,6 +1294,11 @@ static void net_tx_action(unsigned long > if (!netif) > continue; > > + if (unlikely(netif->busted)) { > + netif_put(netif); > + continue; > + } > + > if (netif->tx.sring->req_prod - netif->tx.req_cons > > NET_TX_RING_SIZE) { > printk(KERN_ERR "%s: Impossible number of requests. " > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |