RE: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support
Hi Steven,

Thanks for your careful review. Some explanation inline. For your consideration of group and interrupt affinity, I will reply in another thread.

Thanks,
Dongxiao

Steven Smith wrote:
>> I'd like to make an update on these patches. The main logic is not
>> changed, and I only did a rebase towards the upstream pv-ops kernel.
>> See attached patch. The original patches are checked in Jeremy's
>> netback-tasklet branch.
> I have a couple of (quite minor) comments on the patches:
>
> 0001-Netback-Generilize-static-global-variables-into-stru.txt:
>> diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c
>> index c24debf..a484b0a 100644
>> --- a/drivers/xen/netback/netback.c
>> +++ b/drivers/xen/netback/netback.c
>> @@ -49,18 +49,13 @@
>>
>>  /*define NETBE_DEBUG_INTERRUPT*/
>>
>> +struct xen_netbk *xen_netbk;
>> +int group_nr = 1;
>> +struct page_foreign_tracker *foreign_page_tracker;
> I think these two would benefit from more descriptive names, given
> that they're not static.

Oops... I thought I had modified this when Jeremy commented on it last time; maybe there was some mistake and I left it until today... I swear I will change this in the next version of the patchset. ;-)

> If I was feeling pedantic, I'd complain that this includes some bits
> of support for multiple struct xen_netbks, rather than just moving all
> of the fields around, which reduces its obviously-correct-ness quite a
> bit.

Actually I was struggling with how to split the first patch into smaller ones, but I don't have a good way to do it: the patch changes the function interfaces and data structures, so the corresponding callers need to change too, which generates a 1k-line patch...

> Even more pedantically, it might be better to pass around a struct
> xen_netbk in a few places, rather than an int group, so that you get
> better compiler type checking.

I will change this in the next version of the patch.

> 0002-Netback-Multiple-tasklets-support.txt:
> Design question: you do your balancing by making each tasklet serve
> roughly equal numbers of remote domains. Would it not have been
> easier to make them serve equal numbers of netfronts? You could then
> get rid of the domain_entry business completely and just have a count
> of serviced netfronts in each xen_netbk, which might be a bit easier
> to deal with.

According to my understanding, one guest with the VNIF driver is represented by one netfront. Is this true? If so, I don't understand the difference between "number of domains" and "number of netfronts"; I used to think they were the same. Please correct me if my understanding is wrong.

Actually, in the very early stage of this patch, I used a simple method to decide which group a netfront belongs to, by calculating (domid % num_online_cpus()). The advantage is simplicity, but it may leave the netfront count unbalanced between groups. I will try to remove the "domain_entry"-related code in the next version of the patch; a rough sketch of the counter-based approach follows below.
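For illustration only, the counter-based balancing Steven describes might look roughly like the sketch below. The group_netfront_nr field and the helper names are hypothetical (they are not in the posted patches), and locking is left to the callers, so treat this as a minimal sketch of the idea rather than actual patch code:

	/*
	 * Hypothetical per-group netfront counting: pick the group
	 * currently serving the fewest netfronts and charge the new
	 * interface to it.  This would replace the domain_entry list
	 * with a plain counter in struct xen_netbk.
	 */
	static void add_netif_to_group(struct xen_netbk *netbk, int ngroups,
				       struct xen_netif *netif)
	{
		int i, min_group = 0;

		for (i = 1; i < ngroups; i++)
			if (netbk[i].group_netfront_nr <
			    netbk[min_group].group_netfront_nr)
				min_group = i;

		netif->group = min_group;
		netbk[min_group].group_netfront_nr++;
	}

	static void remove_netif_from_group(struct xen_netbk *netbk,
					    struct xen_netif *netif)
	{
		netbk[netif->group].group_netfront_nr--;
	}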
>> diff --git a/drivers/xen/netback/interface.c b/drivers/xen/netback/interface.c
>> index 21c1f95..bdf3c1d 100644
>> --- a/drivers/xen/netback/interface.c
>> +++ b/drivers/xen/netback/interface.c
>> @@ -54,6 +54,59 @@
>>  static unsigned long netbk_queue_length = 32;
>>  module_param_named(queue_length, netbk_queue_length, ulong, 0644);
>>
>> +static void remove_domain_from_list(struct xen_netbk *netbk,
>> +				    struct xen_netif *netif)
>> +{
>> +	struct domain_entry *dom_entry = NULL;
>> +	int group = netif->group;
>> +
>> +	list_for_each_entry(dom_entry,
>> +			    &netbk[group].group_domain_list, dom) {
>> +		if (dom_entry->domid == netif->domid)
>> +			break;
>> +	}
>> +	if (!dom_entry)
>> +		return;
> Can this ever happen? If so, you might have issues when several
> netfronts all end in the same frontend domain e.g.:
>
> -- netfront A arrives and is added to list
> -- netfront B arrives and is added to list
> -- netfront B is removed
> -- netfront B is removed again. It's no longer in the list,
>    but A is, so A gets removed instead.
>
> The end result being that netback thinks the group is idle, but it
> actually has netfront A in it. I guess the worst that can happen is
> that you don't balance across tasklets properly, but it'd still be
> better avoided.
>
> If it *can't* happen, there should probably be some kind of warning
> when it does.
>
>> +
>> +	spin_lock(&netbk[netif->group].group_domain_list_lock);
>> +	netbk[netif->group].group_domain_nr--;
>> +	list_del(&dom_entry->dom);
>> +	spin_unlock(&netbk[netif->group].group_domain_list_lock);
>> +	kfree(dom_entry);
>> +}
>> +
>>  static void __netif_up(struct xen_netif *netif)
>>  {
>>  	enable_irq(netif->irq);
>>
>> @@ -70,6 +123,7 @@ static int net_open(struct net_device *dev)
>>  {
>>  	struct xen_netif *netif = netdev_priv(dev);
>>  	if (netback_carrier_ok(netif)) {
>> +		add_domain_to_list(xen_netbk, group_nr, netif);
>>  		__netif_up(netif);
>>  		netif_start_queue(dev);
>>  	}
>> @@ -79,8 +133,10 @@ static int net_close(struct net_device *dev)
>>  {
>>  	struct xen_netif *netif = netdev_priv(dev);
>> -	if (netback_carrier_ok(netif))
>> +	if (netback_carrier_ok(netif)) {
>>  		__netif_down(netif);
>> +		remove_domain_from_list(xen_netbk, netif);
>> +	}
>>  	netif_stop_queue(dev);
>>  	return 0;
>>  }
> Okay, so if the interface gets down'd and then up'd it'll potentially
> move to a different group. How much testing has that situation had?
>
> I'd be tempted to add the interface to the list as soon as it's
> created and then leave it there until it's removed, and not rebalance
> during its lifetime at all, and hence avoid the issue.
>
>> @@ -1570,6 +1570,7 @@ static int __init netback_init(void)
>>  	/* We can increase reservation by this much in net_rx_action(). */
>> //	balloon_update_driver_allowance(NET_RX_RING_SIZE);
>>
>> +	group_nr = num_online_cpus();
>>  	xen_netbk = kzalloc(group_nr * sizeof(struct xen_netbk), GFP_KERNEL);
>>  	if (!xen_netbk) {
>>  		printk(KERN_ALERT "%s: out of memory\n", __func__);
> What happens if the number of online CPUs changes while netback is
> running? In particular, do we stop trying to send a tasklet/thread to
> a CPU which has been offlined?

group_nr just defines the maximum number of tasklets; it doesn't decide which tasklet is handled by which CPU. That is decided by where the interrupt is delivered, roughly as in the sketch below.
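For illustration only, the dispatch path being described looks roughly like this: the group index comes from the interface, and the scheduled tasklet runs on whichever CPU the event-channel interrupt happened to be delivered to. The handler body is heavily simplified and the net_tx_tasklet field name is an assumption based on the patch, so treat it as a sketch rather than the exact code:

	static irqreturn_t netif_be_int(int irq, void *dev_id)
	{
		struct xen_netif *netif = dev_id;
		/* Per-group state; the group was assigned when the netfront connected. */
		struct xen_netbk *netbk = &xen_netbk[netif->group];

		/*
		 * The real handler also queues the interface on the group's
		 * schedule list; the point here is only that the tasklet is
		 * per group, not per CPU, and runs where the IRQ landed.
		 */
		tasklet_schedule(&netbk->net_tx_tasklet);

		return IRQ_HANDLED;
	}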
> 0003-Use-Kernel-thread-to-replace-the-tasklet.txt:
>> diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c
>> index 773cd4f..8b55efc 100644
>> --- a/drivers/xen/netback/netback.c
>> +++ b/drivers/xen/netback/netback.c
>> +static int netbk_action_thread(void *index)
>> +{
>> +	int group = (long)index;
>> +	struct xen_netbk *netbk = &xen_netbk[group];
>> +	while (1) {
>> +		wait_event_interruptible(netbk->netbk_action_wq,
>> +				rx_work_todo(group)
>> +				|| tx_work_todo(group));
>> +		cond_resched();
>> +
>> +		if (rx_work_todo(group))
>> +			net_rx_action(group);
>> +
>> +		if (tx_work_todo(group))
>> +			net_tx_action(group);
>> +	}
> Hmm... You use kthread_stop() on this thread, so you should probably
> test kthread_should_stop() every so often.

OK, I will modify it (a sketch of the adjusted loop follows at the end of this message).

>> +
>> +	return 0;
>> +}
>> +
>> +
>>  static int __init netback_init(void)
>>  {
>>  	int i;
>
> Apart from that, it all looks fine to me.
>
> Steven.
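As referenced above, one possible shape of netbk_action_thread() with a kthread_should_stop() check is sketched below. It reuses the rx_work_todo()/tx_work_todo() helpers and the netbk_action_wq wait queue from the posted patch; the exact placement of the checks is an assumption, not the final patch:

	static int netbk_action_thread(void *index)
	{
		int group = (long)index;
		struct xen_netbk *netbk = &xen_netbk[group];

		while (!kthread_should_stop()) {
			/*
			 * Include kthread_should_stop() in the wait condition so
			 * that the wakeup issued by kthread_stop() is not lost.
			 */
			wait_event_interruptible(netbk->netbk_action_wq,
					rx_work_todo(group) ||
					tx_work_todo(group) ||
					kthread_should_stop());
			cond_resched();

			if (kthread_should_stop())
				break;

			if (rx_work_todo(group))
				net_rx_action(group);

			if (tx_work_todo(group))
				net_tx_action(group);
		}

		return 0;
	}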