
RE: [Xen-devel][Pv-ops][PATCH 0/3] Resend: Netback multiple thread support



Steven Smith wrote:
>>>> I'd like to make an update on these patches. The main logic is not
>>>> changed, and I only did a rebase towards the upstream pv-ops
>>>> kernel. See attached patch. The original patches are checked in
>>>> Jeremy's netback-tasklet branch.
>>> I have a couple of (quite minor) comments on the patches:
>>> 
>>> 0001-Netback-Generilize-static-global-variables-into-stru.txt:
>>>> diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c
>>>> index c24debf..a484b0a 100644
>>>> --- a/drivers/xen/netback/netback.c
>>>> +++ b/drivers/xen/netback/netback.c
>>>> @@ -49,18 +49,13 @@
>>>> 
>>>>  /*define NETBE_DEBUG_INTERRUPT*/
>>>> 
>>>> +struct xen_netbk *xen_netbk;
>>> 
>>>> +int group_nr = 1;
>>>> +struct page_foreign_tracker *foreign_page_tracker;
>>> I think these two would benefit from more descriptive names, given
>>> that they're not static.
>> Oops... I thought I had modified this when Jeremy commented on it
>> last time; maybe something went wrong and I left it until today...
>> Easily done. 
> 
>>> If I was feeling pedantic, I'd complain that this includes some bits
>>> of support for multiple struct xen_netbks, rather than just moving
>>> all of the fields around, which reduces its obviously-correct-ness
>>> quite a bit.
>> Actually I was struggling with how to split the first patch into
>> smaller ones, but I didn't have many ideas: the patch changes the
>> function interfaces and data structures, so all the corresponding
>> callers need to change too, which produces a ~1k-line patch...
> The approach I would take would be something like this:
> 
> 1) Gather all the global data together into a struct xen_netbk, and
>    then have a single, global, instance of that structure.  Go through
>    and turn every reference to a bit of global data into a reference
>    to a field of that structure.  This will be a massive patch, but
>    it's purely mechanical and it's very easy to check that it's safe.
> 
> 2) Introduce struct ext_page and use it everywhere you use it in the
>    current patch.  This should be fairly small.
> 
> 3) Generalise to multiple struct xen_netbks by changing the single
>    global instance into a struct xen_netbk * and fixing the resulting
>    compile errors.  Another big patch, but provided you remember to
>    initialise everything properly the compiler will check almost all
>    of it for you.
> 
> This is to some extent a bikeshed argument, so if you prefer the
> current patch structure then that would work as well.

Thanks for your suggestion; I will give it a try.
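
To check that I understand the split (especially steps 1 and 3), here is a
rough sketch of what I have in mind; the field names are only placeholders,
not the final layout:

	/* Step 1: gather the old globals into one struct and keep a single,
	 * global instance; every former global reference becomes netbk.<field>.
	 * (The fields below are placeholders for the existing global state.) */
	struct xen_netbk {
		struct tasklet_struct net_tx_tasklet;
		struct tasklet_struct net_rx_tasklet;
		/* ... the rest of the former global state ... */
	};
	static struct xen_netbk netbk;

	/* Step 3: replace the single instance with an array allocated in
	 * netback_init(), and fix the resulting compile errors so every
	 * user indexes xen_netbk[group].<field> instead. */
	struct xen_netbk *xen_netbk;

Does that match what you meant?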

> 
>>> Even more pedantically, it might be better to pass around a struct
>>> xen_netbk in a few places, rather than an int group, so that you get
>>> better compiler type checking.
>> I will change this in next version of patch. Thanks.
> 
>>> 0002-Netback-Multiple-tasklets-support.txt:
>>> Design question: you do your balancing by making each tasklet serve
>>> roughly equal numbers of remote domains.  Would it not have been
>>> easier to make them serve equal numbers of netfronts?  You could
>>> then get rid of the domain_entry business completely and just have
>>> a count of serviced netfronts in each xen_netbk, which might be a
>>> bit easier to deal with.
>> According to my understanding, one guest with the VNIF driver is
>> represented by one netfront. Is this true? If so, I don't understand
>> the difference between "number of domains" and "number of netfronts";
>> I used to think they were the same. Please correct me if my
>> understanding is wrong.
> I think we might be using slightly different terminology here.  When I
> say ``netfront'', I mean the frontend half of a virtual network
> interface, rather than the netfront driver, so a single domain can be
> configured with multiple netfronts in the same way that a single
> physical host can have multiple ixgbes (say), despite only having one
> ixgbe driver loaded.
> 
> So, my original point was that it might be better to balance
> interfaces such that the number of interfaces in each group is the
> same, ignoring the frontend domain ID completely.  This would mean
> that if, for instance, a domain had two very busy NICs then they
> wouldn't be forced to share a tasklet, which might otherwise be a
> bottleneck.
> 
> The downside, of course, is that it would allow domains with multiple
> vNICs to use more dom0 CPU time, potentially aggravating starvation
> and unfairness problems.  On the other hand, a domain with N vNICs
> wouldn't be able to do any more damage than N domains with 1 vNIC
> each, so I don't think it's too bad.

It was my misunderstanding previously; thanks for the explanation on this point.
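
If we go that way, a simple scheme would be to keep a per-group interface
count and attach each new netfront to the least-loaded group. Roughly
(netfront_count is a placeholder field I would add, not something in the
current patch):

	/* Sketch: choose the group currently serving the fewest netfronts,
	 * ignoring the frontend domid entirely. */
	static int netbk_pick_group(void)
	{
		int i, min_group = 0;

		for (i = 1; i < group_nr; i++)
			if (xen_netbk[i].netfront_count <
			    xen_netbk[min_group].netfront_count)
				min_group = i;
		return min_group;
	}

I will try something along these lines in the next version.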

Regards,
Dongxiao

> 
>> Actually, in a very early stage of this patch, I used a simple
>> method to identify which group a netfront belongs to, by calculating
>> (domid % num_online_cpus()). The advantage is simplicity; however, it
>> may leave the netfront count unbalanced between the groups.
> Well, any static scheme will potentially come unbalanced some of the
> time, if different interfaces experience different levels of traffic.
> But see the other thread for another discussion of balancing issues.
> 
>> I will try to remove the "domain_entry"-related code in the next
>> version of the patch. Thanks. 
> 
>>>> @@ -1570,6 +1570,7 @@ static int __init netback_init(void)
>>>>  	/* We can increase reservation by this much in net_rx_action(). */
>>>>  //	balloon_update_driver_allowance(NET_RX_RING_SIZE);
>>>> 
>>>> +	group_nr = num_online_cpus();
>>>>  	xen_netbk = kzalloc(group_nr * sizeof(struct xen_netbk), GFP_KERNEL);
>>>>  	if (!xen_netbk) {
>>>>  		printk(KERN_ALERT "%s: out of memory\n", __func__);
>>> What happens if the number of online CPUs changes while netback is
>>> running?  In particular, do we stop trying to send a tasklet/thread
>>> to a CPU which has been offlined?
>> group_nr just defines the maximum number of tasklets; it doesn't
>> decide which tasklet is handled by which CPU. That is decided by
>> interrupt delivery. 
> Ah, yes, so it is.  Thanks for explaining it.
> 
>>> 0003-Use-Kernel-thread-to-replace-the-tasklet.txt:
>>>> diff --git a/drivers/xen/netback/netback.c b/drivers/xen/netback/netback.c
>>>> index 773cd4f..8b55efc 100644
>>>> --- a/drivers/xen/netback/netback.c
>>>> +++ b/drivers/xen/netback/netback.c
>>>> +static int netbk_action_thread(void *index)
>>>> +{
>>>> +	int group = (long)index;
>>>> +	struct xen_netbk *netbk = &xen_netbk[group];
>>>> +	while (1) {
>>>> +		wait_event_interruptible(netbk->netbk_action_wq,
>>>> +				rx_work_todo(group) || tx_work_todo(group));
>>>> +		cond_resched();
>>>> +
>>>> +		if (rx_work_todo(group))
>>>> +			net_rx_action(group);
>>>> +
>>>> +		if (tx_work_todo(group))
>>>> +			net_tx_action(group);
>>>> +	}
>>> Hmm... You use kthread_stop() on this thread, so you should probably
>>> test kthread_should_stop() every so often.
>> OK, I will modify it.
> Thanks.
> 
> Steven.
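
For the kthread loop, I am thinking of something along these lines for the
next version (an untested sketch; netbk_action_wq and the *_work_todo()
helpers are the ones already in the patch):

	static int netbk_action_thread(void *index)
	{
		int group = (long)index;
		struct xen_netbk *netbk = &xen_netbk[group];

		/* Exit cleanly when kthread_stop() is called on this thread. */
		while (!kthread_should_stop()) {
			wait_event_interruptible(netbk->netbk_action_wq,
					rx_work_todo(group) ||
					tx_work_todo(group) ||
					kthread_should_stop());
			cond_resched();

			if (rx_work_todo(group))
				net_rx_action(group);

			if (tx_work_todo(group))
				net_tx_action(group);
		}

		return 0;
	}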


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

