
RE: [Xen-devel][Pv-ops][PATCH 0/4 v4] Netback multiple threads support

Hi Steven and Jan,

I have modified the code according to your comments; the latest version is
version 4.
Do you have any further comments or concerns about this version?


Xu, Dongxiao wrote:
> Hi,
> Do you have any comments on this version of the patch?
> Thanks,
> Dongxiao
> Xu, Dongxiao wrote:
>> This is netback multithread support patchset version 4.
>> Main Changes from v3:
>> 1. Patchset is against xen/next tree.
>> 2. Merge group and idx into netif->mapping.
>> 3. Use vmalloc to allocate netbk structures.
>> Main Changes from v2:
>> 1. Merge "group" and "idx" into "netif->mapping", therefore
>> page_ext is not used now.
>> 2. Put netbk_add_netif() and netbk_remove_netif() into
>> __netif_up() and __netif_down().
>> 3. Change the usage of kthread_should_stop().
>> 4. Use __get_free_pages() to replace kzalloc().
>> 5. Modify the changes to netif_be_dbg().
>> 6. Use MODPARM_netback_kthread to determine whether to use
>> tasklets or kernel threads.
>> 7. Put small fields at the front and large arrays at the end of
>> struct xen_netbk.
>> 8. Add more checks in netif_page_release().
>> Current netback uses one pair of tasklets for Tx/Rx data transactions.
>> A netback tasklet can only run on one CPU at a time, and it is used
>> to serve all the netfronts. Therefore it has become a performance
>> bottleneck. This patchset uses multiple tasklet pairs to replace
>> the current single pair in dom0.
>> Assuming that Dom0 has CPUNR VCPUs, we define CPUNR pairs of
>> tasklets (CPUNR for Tx, and CPUNR for Rx). Each pair of tasklets
>> serves a specific group of netfronts. Also, we duplicated the global
>> and static variables for each group in order to avoid the
>> spinlock.
>> PATCH 01: Generalize static/global variables into 'struct xen_netbk'.
>> PATCH 02: Introduce a new struct type page_ext.
>> PATCH 03: Multiple tasklets support.
>> PATCH 04: Use Kernel thread to replace the tasklet.
>> Recently I re-tested the patchset with an Intel 10G multi-queue NIC
>> device, using 10 external 1G NICs to run netperf tests against that
>> 10G NIC.
>> Case 1: Dom0 has more than 10 VCPUs, each pinned to a physical CPU.
>> With the patchset, throughput is 2x the original.
>> Case 2: Dom0 has 4 VCPUs pinned to 4 physical CPUs.
>> With the patchset, throughput is 3.7x the original.
>> When we tested this patch, we found that the domain_lock in the grant
>> table operation (gnttab_copy()) becomes a bottleneck. We temporarily
>> removed the global domain_lock to achieve good performance.
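
For anyone following the thread, the per-group idea in the quoted description (one set of state per Dom0 VCPU, with each netfront served by exactly one group) can be sketched roughly as below. This is a user-space illustration only, not code from the patchset: the names netbk_group, group_add_netif() and group_remove_netif(), the MAX_GROUPS bound, and the least-loaded assignment policy are all assumptions made for the example.

```c
#include <assert.h>

#define MAX_GROUPS 64  /* hypothetical upper bound for this sketch */

/* Hypothetical per-group state: one instance per Dom0 VCPU.
 * Because each group owns its own copy, no lock is shared between groups. */
struct netbk_group {
    unsigned int netfront_count;  /* netfronts currently served by this group */
    /* per-group Tx/Rx queues and pending arrays would live here */
};

static struct netbk_group groups[MAX_GROUPS];

/* Assign a new netfront to the least-loaded of the first ngroups groups
 * (assumed policy) and return the index of the chosen group. */
static int group_add_netif(unsigned int ngroups)
{
    unsigned int i, min = 0;

    assert(ngroups > 0 && ngroups <= MAX_GROUPS);
    for (i = 1; i < ngroups; i++)
        if (groups[i].netfront_count < groups[min].netfront_count)
            min = i;
    groups[min].netfront_count++;
    return (int)min;
}

/* Release a netfront from the group it was assigned to. */
static void group_remove_netif(unsigned int group)
{
    assert(group < MAX_GROUPS && groups[group].netfront_count > 0);
    groups[group].netfront_count--;
}
```

With four groups, successive calls to group_add_netif(4) spread the first four netfronts over groups 0 to 3 before any group receives a second one. In the kernel, the equivalent bookkeeping would still need protection against concurrent add/remove, which this sketch leaves out.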
