
Re: [Xen-devel] Enabling VT-d PI by default



> From: Gao, Chao
> Sent: Monday, April 17, 2017 4:14 AM
> 
> On Tue, Apr 11, 2017 at 02:21:07AM -0600, Jan Beulich wrote:
> >>>> On 11.04.17 at 02:59, <chao.gao@xxxxxxxxx> wrote:
> >> As you know, with VT-d PI enabled, hardware can directly deliver
> >> external interrupts to a guest without any VMM intervention. This
> >> reduces overall interrupt latency to the guest and the overhead
> >> otherwise incurred by the VMM for virtualizing interrupts. In my
> >> mind, it's an important feature for interrupt virtualization.
> >>
> >> But the VT-d PI feature is disabled by default in Xen because of
> >> some corner cases and bugs. Based on Feng's work, we have fixed
> >> those corner cases related to VT-d PI. Do you think it is time to
> >> enable VT-d PI by default? If not, could you list your concerns so
> >> that we can resolve them?
> >
> >I don't recall you addressing the main issue (the blocked-vCPU list
> >length; see the comment next to the iommu_intpost definition).
> >
> 
> Indeed. I have gone through the discussion that happened in April 2016
> [1][2].
> [1] https://lists.gt.net/xen/devel/422661?search_string=VT-d%20posted-interrupt%20core%20logic%20handling;#422661
> [2] https://lists.gt.net/xen/devel/422567?search_string=%20The%20length%20of%20the%20list%20depends;#422567
> 
> First of all, I admit this is an issue in extreme cases and we should
> come up with a solution.
> 
> The problem we are facing is:
> There is a per-cpu list used to maintain all the blocked vCPUs on a
> pCPU.  When a wakeup interrupt comes, the interrupt handler traverses
> the list to wake the vCPUs whose pi_desc indicates an interrupt has
> been posted.  There is no policy to restrict the size of the list, so
> in some extreme cases the list can grow long enough to cause problems
> (the most obvious one being interrupt latency).
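> 
> To make the structure concrete, below is a simplified sketch of the
> per-cpu list and of what the wakeup handler has to do (the names are
> illustrative, not the exact ones used in the VMX code):
> 
>     /* One blocked vCPU, as seen by the wakeup path. */
>     struct pi_blocked_vcpu {
>         struct list_head list;      /* linked into the per-pCPU list */
>         struct pi_desc *pi_desc;    /* this vCPU's PI descriptor */
>         struct vcpu *vcpu;
>     };
> 
>     /* Per-pCPU list of blocked vCPUs sharing this pCPU's wakeup vector. */
>     struct pi_blocking_list {
>         spinlock_t lock;
>         struct list_head vcpus;
>     };
>     static DEFINE_PER_CPU(struct pi_blocking_list, pi_blocking);
> 
>     static void pi_wakeup_handler(void)
>     {
>         struct pi_blocking_list *pb = &this_cpu(pi_blocking);
>         struct pi_blocked_vcpu *v, *tmp;
> 
>         spin_lock(&pb->lock);
>         /*
>          * The notification carries no hint about which vCPU it was
>          * posted to, so every entry must be examined -- the cost is
>          * O(length of this pCPU's list).
>          */
>         list_for_each_entry_safe ( v, tmp, &pb->vcpus, list )
>         {
>             if ( pi_test_on(v->pi_desc) )  /* notification outstanding? */
>             {
>                 list_del(&v->list);
>                 vcpu_unblock(v->vcpu);
>             }
>         }
>         spin_unlock(&pb->lock);
>     }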
> 
> The theoretical maximum number of entries in one list is 4M, since a
> host can have 32k domains and every domain can have 128 vCPUs
> (32768 * 128 = 4194304). If all those vCPUs end up blocked on the same
> pCPU, the list reaches this maximum.
> 
> The root cause of this issue, I think, is that the wakeup interrupt
> vector is shared by all the vCPUs on one pCPU. Lacking enough
> information (such as which device sent the interrupt or which IRTE
> translated it), there is no effective way to identify the interrupt's
> destination vCPU other than traversing this list. Right?  So we can
> only mitigate this issue by decreasing or limiting the maximum number
> of entries in one list.
> 
> Several methods we can take to mitigate this issue:
> 1. According to your earlier discussion, evenly distributing all the
> blocked vCPUs among all pCPUs can mitigate this issue. With this
> approach, we avoid having all vCPUs blocked on a single list, and the
> maximum number of entries in one list drops by a factor of N (N being
> the number of pCPUs).
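> 
> Roughly what I have in mind (purely illustrative; it assumes a length
> counter is added to the per-pCPU blocking list):
> 
>     /*
>      * When a vCPU blocks, pick the pCPU with the shortest blocking
>      * list rather than always using the current pCPU.  The pi_desc's
>      * NDST/NV fields must then be set to target the chosen pCPU.
>      */
>     static unsigned int pi_pick_blocking_cpu(void)
>     {
>         unsigned int cpu, best = smp_processor_id();
>         unsigned int best_len = per_cpu(pi_blocking, best).length;
> 
>         /* Racy reads are fine here -- this is only a heuristic. */
>         for_each_online_cpu ( cpu )
>             if ( per_cpu(pi_blocking, cpu).length < best_len )
>             {
>                 best = cpu;
>                 best_len = per_cpu(pi_blocking, cpu).length;
>             }
> 
>         return best;
>     }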
> 
> 2. Don't put blocked vCPUs which won't be woken by the wakeup
> interrupt into the per-cpu list. Currently, we put the blocked vCPUs
> belonging to domains which have assigned devices into the list. But if
> a blocked vCPU of such a domain is not the destination of any
> posted-format IRTE, it needn't be added to the per-cpu list; it will
> be woken by IPIs or other virtual interrupts instead. In this way, we
> can decrease the number of entries in the per-cpu list.
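> 
> For instance (the nr_posted_irtes field is hypothetical), we could
> keep a per-vCPU count of posted-format IRTEs targeting it, updated
> where IRTEs are programmed, and consult it in the block hook:
> 
>     /* Called when a posted-format IRTE is retargeted between vCPUs. */
>     static void pi_irte_retarget(struct vcpu *prev, struct vcpu *next)
>     {
>         if ( prev )
>             atomic_dec(&prev->arch.nr_posted_irtes);
>         if ( next )
>             atomic_inc(&next->arch.nr_posted_irtes);
>     }
> 
>     /*
>      * Only vCPUs that at least one posted-format IRTE can actually
>      * wake need to be put on the per-pCPU wakeup list.
>      */
>     static bool pi_needs_wakeup_list(const struct vcpu *v)
>     {
>         return atomic_read(&v->arch.nr_posted_irtes) != 0;
>     }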
> 
> 3. Like what we do in struct irq_guest_action_t, we could limit the
> maximum number of entries we support in the list. With this approach,
> during domain creation, we calculate the available entries and compare
> them with the domain's vCPU count to decide whether the domain can use
> VT-d PI.

VT-d PI is global instead of per-domain. I guess you actually mean
failing the device assignment operation if counting in the new domain's
#vCPUs would exceed the limit.

> This method would impose a strict limit on the maximum number of
> entries in one list, but it may affect vCPU hotplug.
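> 
> Something along these lines (the limit and the helper are
> illustrative), checked at the point where we decide whether the domain
> can use VT-d PI:
> 
>     /* Hypothetical per-host cap on vCPUs eligible for the wakeup lists. */
>     #define PI_BLOCKED_VCPU_LIMIT 1024
> 
>     static unsigned int pi_vcpu_count;
>     static DEFINE_SPINLOCK(pi_vcpu_count_lock);
> 
>     static int pi_reserve_vcpus(const struct domain *d)
>     {
>         int rc = 0;
> 
>         spin_lock(&pi_vcpu_count_lock);
>         if ( pi_vcpu_count + d->max_vcpus > PI_BLOCKED_VCPU_LIMIT )
>             rc = -ENOSPC;  /* caller refuses to enable PI for this domain */
>         else
>             pi_vcpu_count += d->max_vcpus;
>         spin_unlock(&pi_vcpu_count_lock);
> 
>         return rc;
>     }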
> 
> Which of these methods do you think are feasible and acceptable? I
> will attempt to mitigate this issue per your advice.
> 

My understanding is that we need them all. #1 is the baseline,
with #2/#3 as further optimizations. :-)

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

