
Re: [Xen-devel] [PATCH 0/4] mitigate the per-pCPU blocking list may be too long

To: Chao Gao <chao.gao@xxxxxxxxx>, <xen-devel@xxxxxxxxxxxxx>
From: George Dunlap <george.dunlap@xxxxxxxxxx>
Date: Wed, 26 Apr 2017 17:39:57 +0100
Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>
Delivery-date: Wed, 26 Apr 2017 16:40:44 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 26/04/17 01:52, Chao Gao wrote:
> VT-d PI introduces a per-pCPU blocking list to track the blocked vCPUs
> running on the pCPU. Theoretically, there can be 32K domains on a single
> host, with 128 vCPUs per domain. If all vCPUs are blocked on the same
> pCPU, 4M vCPUs are in the same list, and traversing this list consumes
> too much time. We have discussed this issue in [1,2,3].
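
The 4M worst case is the product of the two limits quoted above. A minimal C sketch of that arithmetic follows; the macro and function names are illustrative only and do not come from Xen.

```c
#include <assert.h>
#include <stdint.h>

/* Limits quoted in the cover letter; names are illustrative only. */
#define MAX_DOMAINS        32768u /* "32K domains on a single host" */
#define MAX_VCPUS_PER_DOM  128u   /* "128 vCPUs per domain" */

/*
 * Worst case: every vCPU on the host is blocked on the same pCPU, so that
 * pCPU's blocking list holds domains * vcpus_per_dom entries, and a full
 * walk of the list touches each of them once.
 */
static uint64_t worst_case_list_len(uint64_t domains, uint64_t vcpus_per_dom)
{
    /* Guard against overflow for absurd inputs. */
    assert(vcpus_per_dom == 0 || domains <= UINT64_MAX / vcpus_per_dom);
    return domains * vcpus_per_dom;
}
```

worst_case_list_len(MAX_DOMAINS, MAX_VCPUS_PER_DOM) evaluates to 4194304, i.e. 4M entries on a single list.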
> 
> To mitigate this issue, we proposed the following two methods [3]:
> 1. Evenly distribute all the blocked vCPUs among all pCPUs.

So you're not actually distributing the *vcpus* among the pcpus (which
would imply some interaction with the scheduler); you're distributing
the vcpu PI wake-up interrupt between pcpus.  Is that right?

Doesn't having a PI on a different pcpu than where the vcpu is running
mean at least one IPI to wake up that vcpu?  If so, aren't we imposing a
constant overhead on basically every single interrupt, as well as
increasing the IPI traffic, in order to avoid a highly unlikely
theoretical corner case?

A general maxim in OS development is "Make the common case fast, and the
uncommon case correct."  It seems like it would be better in the common
case to have the PI vectors on the pcpu on which the vcpu is running,
and only implement the balancing when the list starts to get too long.

What do you think?
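
A minimal C sketch of the policy suggested above, assuming a hypothetical array of per-pCPU blocking-list lengths and a hypothetical threshold, PI_LIST_THRESHOLD; neither the names nor the value come from Xen or from this thread. The wake-up vector stays on the local pCPU in the common case, and the shortest list is chosen only once the local one has grown too long.

```c
#include <assert.h>

/* Hypothetical threshold; the thread does not specify one. */
#define PI_LIST_THRESHOLD 16u

/*
 * Return the pCPU whose blocking list should receive a blocking vCPU.
 * Common case: the pCPU the vCPU runs on, so no extra IPI on wake-up.
 * Uncommon case: that list is already too long, so fall back to the
 * shortest list among the nr_pcpus candidates.
 */
static unsigned int pick_pi_pcpu(const unsigned int *list_len,
                                 unsigned int nr_pcpus,
                                 unsigned int local)
{
    unsigned int best = local;

    assert(local < nr_pcpus);
    if ( list_len[local] < PI_LIST_THRESHOLD )
        return local;

    for ( unsigned int cpu = 0; cpu < nr_pcpus; cpu++ )
        if ( list_len[cpu] < list_len[best] )
            best = cpu;

    return best;
}
```

With list lengths {3, 0, 0, 0} and local pCPU 0 the sketch stays on pCPU 0; with {20, 5, 2, 9} it moves the entry to pCPU 2.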

> 2. Don't put the blocked vCPUs which won't be woken by the wakeup
> interrupt into the per-pCPU list.
> 
> Patch 1/4 adds a trace event for adding an entry to the PI blocking
> list. With this patch, data can be collected to help validate the
> following patches.
> 
> Patch 2/4 randomly distributes entries (vCPUs) among all online
> pCPUs, which can theoretically reduce the maximum number of entries
> in a list by a factor of N, where N is the number of pCPUs.
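
A sketch of random placement as described for Patch 2/4, assuming an array of online flags in place of a real cpumask and the C library's rand() in place of whatever random source the patch actually uses. With uniform placement the expected list length is total/N; the longest list is usually somewhat above that, and only the pigeonhole bound (some list holds at least total/N entries) is guaranteed.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Pick a uniformly random online pCPU for a blocked vCPU's list entry.
 * online[i] != 0 marks pCPU i as online (a stand-in for a cpumask).
 * rand() is used for illustration only; modulo bias is ignored.
 */
static unsigned int pick_random_pcpu(const unsigned char *online,
                                     unsigned int nr_pcpus)
{
    unsigned int nr_online = 0, k;

    for ( unsigned int i = 0; i < nr_pcpus; i++ )
        nr_online += (online[i] != 0);
    assert(nr_online > 0);

    /* Choose the k-th online pCPU, k uniform in [0, nr_online). */
    k = (unsigned int)rand() % nr_online;
    for ( unsigned int i = 0; i < nr_pcpus; i++ )
        if ( online[i] && k-- == 0 )
            return i;

    assert(0); /* unreachable: k < nr_online */
    return 0;
}

/*
 * Place nr_vcpus blocked vCPUs with pick_random_pcpu() and return the
 * length of the longest resulting list. list_len needs nr_pcpus slots.
 */
static unsigned int simulate_max_len(unsigned int nr_vcpus,
                                     const unsigned char *online,
                                     unsigned int nr_pcpus,
                                     unsigned int *list_len)
{
    unsigned int max = 0;

    for ( unsigned int i = 0; i < nr_pcpus; i++ )
        list_len[i] = 0;

    for ( unsigned int v = 0; v < nr_vcpus; v++ )
    {
        unsigned int cpu = pick_random_pcpu(online, nr_pcpus);

        if ( ++list_len[cpu] > max )
            max = list_len[cpu];
    }

    return max;
}
```

With a single online pCPU every entry lands on the same list, which matches the pinned-guest test case below.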
> 
> Patch 3/4 adds a refcount to the vCPU's pi_desc. When the pi_desc is
> recorded in an IRTE, the refcount is incremented by 1; when the pi_desc
> is cleared from an IRTE, the refcount is decremented by 1.
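
The refcount bookkeeping described for Patch 3/4 might look roughly like this. The struct and function names are hypothetical, and the sketch assumes callers already serialize IRTE updates (for example under a lock), so it uses a plain counter rather than atomics.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a vCPU's posted-interrupt descriptor, carrying
 * only the counter Patch 3/4 adds. Callers are assumed to serialize updates.
 */
struct pi_desc_sketch {
    unsigned int irte_refcnt; /* IRTEs currently pointing at this pi_desc */
};

/* An IRTE has been programmed to post interrupts to this pi_desc. */
static void pi_desc_get(struct pi_desc_sketch *d)
{
    d->irte_refcnt++;
}

/* An IRTE that pointed at this pi_desc has been cleared. */
static void pi_desc_put(struct pi_desc_sketch *d)
{
    assert(d->irte_refcnt > 0); /* an unbalanced put would be a bug */
    d->irte_refcnt--;
}

/* Nonzero if at least one IRTE still refers to this pi_desc. */
static int pi_desc_in_use(const struct pi_desc_sketch *d)
{
    return d->irte_refcnt != 0;
}
```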
> 
> In Patch 4/4, a vCPU is added to the PI blocking list only if its
> pi_desc is referenced by at least one IRTE.
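
Combined with such a counter, the blocking path for Patch 4/4 could skip vCPUs whose pi_desc no IRTE references, on the reasoning that no posted interrupt, and therefore no wake-up notification, can target them. All names below are hypothetical stand-ins for the real vCPU and per-pCPU list structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vCPU fields relevant to PI blocking. */
struct vcpu_sketch {
    unsigned int pi_irte_refcnt; /* IRTEs referring to this vCPU's pi_desc */
    bool on_blocking_list;
    struct vcpu_sketch *next;    /* link in a per-pCPU blocking list */
};

/* Hypothetical stand-in for one pCPU's PI blocking list. */
struct pcpu_blocking_list {
    struct vcpu_sketch *head;
    unsigned int len;
};

/*
 * Called when v blocks. Returns true if v was added to the list. A vCPU
 * with no IRTE pointing at its pi_desc is left off, since listing it would
 * only lengthen the traversal done by the wake-up interrupt handler.
 */
static bool pi_block_vcpu(struct pcpu_blocking_list *l, struct vcpu_sketch *v)
{
    assert(!v->on_blocking_list);

    if ( v->pi_irte_refcnt == 0 )
        return false;

    v->next = l->head;
    l->head = v;
    l->len++;
    v->on_blocking_list = true;
    return true;
}
```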
> 
> I tested this series in the following scenario:
> * One guest with 128 vCPUs and a NIC assigned to it
> * All 128 vCPUs pinned to one pCPU
> * xentrace used to collect events for 5 minutes
> 
> I compared the maximum number of entries in one list and the number of
> events (adding an entry to the PI blocking list) with and without the
> latter three patches. Here are the results:
> ------------------------------------------------------------------
> | Items               | Maximum #entries in one list |  #events  |
> ------------------------------------------------------------------
> | With the patches    |              6               |   22740   |
> | Without the patches |             128              |   46481   |
> ------------------------------------------------------------------

Any chance you could trace how long the list traversal took?  It would
be good for future reference to have an idea what kinds of timescales
we're talking about.
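
One way to obtain such a number is sketched below as a user-space C program. It assumes a singly linked list as a stand-in for the per-pCPU blocking list and CLOCK_MONOTONIC as a stand-in for the hypervisor's clock (inside Xen one would read NOW() around the traversal instead). Because the nodes here are contiguous in memory, it likely underestimates the cost of walking vCPU structures scattered across the heap.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Stand-in for a blocking-list entry. */
struct node {
    struct node *next;
};

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/*
 * Build an n-entry list, walk it once, and store the elapsed time in
 * *elapsed_ns. Returns the number of entries visited.
 */
static size_t time_traversal(size_t n, uint64_t *elapsed_ns)
{
    struct node *nodes = malloc((n ? n : 1) * sizeof(*nodes));
    size_t visited = 0;
    uint64_t start;

    assert(nodes != NULL);
    for ( size_t i = 0; i < n; i++ )
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;

    start = now_ns();
    for ( struct node *p = n ? &nodes[0] : NULL; p; p = p->next )
        visited++;
    *elapsed_ns = now_ns() - start;

    free(nodes);
    return visited;
}
```

Calling time_traversal() with the list lengths from the table above (6 and 128) would give a rough lower bound on the per-wake-up walk cost on a given machine.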

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

