
RE: [Xen-devel][Pv-ops][PATCH 0/4 v4] Netback multiple threads support



Thank you for your acknowledgement.

Regards,
Dongxiao

Steven Smith wrote:
>> Hi Steven and Jan,
>> 
>> I modified the code according to your comments, and the latest
>> version is version 4.  Do you have any further comments or concerns
>> about this version?
> No, that all looks fine to me.
> 
> Sorry about the delay in replying; I thought I'd already responded,
> but I seem to have dropped it on the floor somewhere.
> 
> Steven.
> 
> 
>> Xu, Dongxiao wrote:
>>> Hi,
>>> 
>>> Do you have any comments on this version of the patch?
>>> 
>>> Thanks,
>>> Dongxiao
>>> 
>>> Xu, Dongxiao wrote:
>>>> This is netback multithread support patchset version 4.
>>>> 
>>>> Main Changes from v3:
>>>> 1. Patchset is against xen/next tree.
>>>> 2. Merge group and idx into netif->mapping.
>>>> 3. Use vmalloc to allocate netbk structures.
>>>> 
>>>> Main Changes from v2:
>>>> 1. Merge "group" and "idx" into "netif->mapping", therefore
>>>> page_ext is not used now.
>>>> 2. Put netbk_add_netif() and netbk_remove_netif() into
>>>> __netif_up() and __netif_down().
>>>> 3. Change the usage of kthread_should_stop().
>>>> 4. Use __get_free_pages() to replace kzalloc().
>>>> 5. Modify the changes to netif_be_dbg().
>>>> 6. Use MODPARM_netback_kthread to determine whether to use
>>>> tasklets or kernel threads.
>>>> 7. Put small fields at the front, and large arrays at the end,
>>>> of struct xen_netbk.
>>>> 8. Add more checks in netif_page_release().
>>>> 
>>>> Current netback uses one pair of tasklets for Tx/Rx data
>>>> transactions. A netback tasklet can run on only one CPU at a
>>>> time, and that single pair serves all the netfronts, so it has
>>>> become a performance bottleneck. This patchset replaces the
>>>> current single pair in dom0 with multiple tasklet pairs.
>>>> 
>>>> Assuming that Dom0 has CPUNR VCPUs, we define CPUNR pairs of
>>>> tasklets (CPUNR for Tx, and CPUNR for Rx). Each pair of
>>>> tasklets serves a specific group of netfronts. The global
>>>> and static variables are also duplicated for each group, in
>>>> order to avoid taking a spinlock.
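
The grouping described above can be sketched in plain userspace C. This is only an illustration, not code from the patchset: the names netbk_group, netbk_pool, pool_init(), pool_add_netfront() and pool_remove_netfront() are hypothetical, and the least-loaded assignment policy is an assumption, since the description does not say how netfronts are mapped to groups.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS 64  /* arbitrary upper bound for this sketch */

/* Hypothetical per-group state: one instance per Dom0 VCPU.
 * In the real driver the duplicated static/global variables
 * (queues, pending arrays, ...) would live here, so that a group
 * never touches another group's data. */
struct netbk_group {
    unsigned int netfront_count;  /* netfronts served by this group */
};

struct netbk_pool {
    size_t group_nr;                        /* number of groups (CPUNR) */
    struct netbk_group groups[MAX_GROUPS];
};

/* Set up group_nr empty groups. */
void pool_init(struct netbk_pool *pool, size_t group_nr)
{
    assert(group_nr > 0 && group_nr <= MAX_GROUPS);
    pool->group_nr = group_nr;
    for (size_t i = 0; i < group_nr; i++)
        pool->groups[i].netfront_count = 0;
}

/* Bind a new netfront to the group that currently serves the fewest
 * netfronts (assumed policy) and return that group's index.
 * Ties go to the lowest index. */
size_t pool_add_netfront(struct netbk_pool *pool)
{
    size_t best = 0;
    for (size_t i = 1; i < pool->group_nr; i++) {
        if (pool->groups[i].netfront_count <
            pool->groups[best].netfront_count)
            best = i;
    }
    pool->groups[best].netfront_count++;
    return best;
}

/* Unbind a netfront from the group it was assigned to. */
void pool_remove_netfront(struct netbk_pool *pool, size_t group)
{
    assert(group < pool->group_nr);
    assert(pool->groups[group].netfront_count > 0);
    pool->groups[group].netfront_count--;
}
```

With four groups, the first four netfronts land in groups 0 to 3 and the fifth goes back to group 0. The sketch leaves out any locking around the assignment itself; it only shows why per-group state lets each tasklet pair run without sharing data with the others.
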
>>>> 
>>>> PATCH 01: Generalize static/global variables into 'struct
>>>> xen_netbk'.
>>>> 
>>>> PATCH 02: Introduce a new struct type page_ext.
>>>> 
>>>> PATCH 03: Multiple tasklets support.
>>>> 
>>>> PATCH 04: Use Kernel thread to replace the tasklet.
>>>> 
>>>> Recently I re-tested the patchset with an Intel 10G multi-queue
>>>> NIC, using 10 external 1G NICs to run netperf tests against
>>>> that 10G NIC.
>>>> 
>>>> Case 1: Dom0 has more than 10 vcpus, each pinned to a physical
>>>> CPU. With the patchset, the throughput is 2x the original
>>>> throughput.
>>>> 
>>>> Case 2: Dom0 has 4 vcpus pinned to 4 physical CPUs.
>>>> With the patchset, the throughput is 3.7x the original
>>>> throughput.
>>>> 
>>>> When we tested this patch, we found that the domain_lock in the
>>>> grant table operation (gnttab_copy()) becomes a bottleneck. We
>>>> temporarily removed the global domain_lock to achieve good
>>>> performance.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel