Re: [Xen-devel] VT-d Posted-interrupt (PI) design for XEN




> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Sent: Wednesday, March 04, 2015 11:19 PM
> To: Wu, Feng
> Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@xxxxxxxxxxxxx
> Subject: Re: VT-d Posted-interrupt (PI) design for XEN
> 
> >>> On 04.03.15 at 14:30, <feng.wu@xxxxxxxxx> wrote:
> > - Introduce a new global vector which is used to wake up the HLT'ed vCPU.
> > Currently, there is a global vector 'posted_intr_vector', which is used as
> > the global notification vector for all vCPUs in the system. This vector is
> > stored in the VMCS, and the CPU treats it as a special vector: it is used to
> > notify the related pCPU when an interrupt is recorded in the
> > posted-interrupt descriptor.
> >
> > With VT-d PI, the VT-d engine can issue a notification event when the
> > assigned devices issue interrupts. We need to add a new global vector to
> > wake up the HLT'ed vCPU; please refer to the following scenario for the
> > usage of this new global vector:
> >
> > 1. vCPU0 is running on pCPU0
> > 2. vCPU0 is HLT'ed and vCPU1 is currently running on pCPU0
> > 3. An external interrupt from an assigned device occurs for vCPU0. If we
> > still use 'posted_intr_vector' as the notification vector for vCPU0, the
> > notification event for vCPU0 (the event will go to pCPU0) will be consumed
> > by vCPU1 incorrectly. The worst case is that vCPU0 will never be woken up
> > again, since the wakeup event for it is always consumed by other vCPUs
> > incorrectly. So we need to introduce another global vector, named
> > 'pi_wakeup_vector', to wake up the HLT'ed vCPU.
> 
> I'm afraid you describe a particular scenario here, but I don't see
> how this is related to the introduction of another global vector:
> Either the current (global) vector is sufficient, or another global
> vector also can't solve your problem. I'm sure I'm missing something
> here, so please be explicit.
> 

The new global vector is needed precisely for the above scenario. Let me
explain this in more detail.

With VT-d PI, when an assigned device raises an external interrupt, the
hardware processes it as follows:

1. The interrupt occurs.
2. The hardware finds the associated IRTE.
3. It finds the destination vCPU from the IRTE (via the posted-interrupt
descriptor address).
4. It records the interrupt (stored in the IRTE as the 'virtual vector') in the
PIR field of the posted-interrupt descriptor.
5. If needed (see the VT-d spec for the conditions under which a notification
event is issued), it issues a notification event to the destination CPU, which
is stored in the posted-interrupt descriptor as 'NDST'.
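To make the flow concrete, here is a small C model of steps 2-5. It is a sketch only, not Xen or hardware code: struct pi_desc_model, struct irte_model and post_interrupt() are hypothetical names, the struct does not reproduce the hardware layout, and the step-5 condition (ON clear, and either URG set or SN clear) is our reading of the VT-d notification rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the posted-interrupt descriptor: field meanings
 * follow VT-d, but this struct does NOT reproduce the hardware layout. */
struct pi_desc_model {
    uint64_t pir[4];  /* PIR: one request bit per vector, 256 vectors */
    bool on;          /* ON: a notification is outstanding */
    bool sn;          /* SN: suppress notification for non-urgent requests */
    uint8_t nv;       /* NV: vector used for the notification event */
    uint32_t ndst;    /* NDST: APIC ID of the notification destination */
};

/* Hypothetical model of the IRTE fields used when posting. */
struct irte_model {
    struct pi_desc_model *pi_desc;  /* posted-interrupt descriptor address */
    uint8_t vvector;                /* virtual (guest) vector */
    bool urg;                       /* URG: urgent interrupt */
};

/* Result of step 5: whether, and where, a notification event is sent. */
struct notification {
    bool sent;
    uint8_t vector;  /* copied from NV */
    uint32_t dest;   /* copied from NDST */
};

/* Steps 3-5 for an interrupt whose IRTE (found in step 2) is 'irte'.
 * Real hardware updates PIR/ON atomically; this model does not. */
static struct notification post_interrupt(const struct irte_model *irte)
{
    struct pi_desc_model *pd = irte->pi_desc;  /* step 3 */
    struct notification n = { false, 0, 0 };

    assert(pd != NULL);
    /* Step 4: record the guest vector in PIR. */
    pd->pir[irte->vvector / 64] |= UINT64_C(1) << (irte->vvector % 64);

    /* Step 5: notify only if no notification is outstanding and it is not
     * suppressed (SN only suppresses non-urgent interrupts). */
    if (!pd->on && (irte->urg || !pd->sn)) {
        pd->on = true;
        n.sent = true;
        n.vector = pd->nv;
        n.dest = pd->ndst;
    }
    return n;
}
```

Note that while ON stays set, later interrupts are only recorded in PIR and generate no further notification, which is why it matters who clears ON.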

Back to the above scenario:
1. vCPU0 is running on pCPU0, and the 'NDST' field of vCPU0's posted-interrupt
descriptor is pCPU0.
2. vCPU0 is HLT'ed and vCPU1 is currently running on pCPU0.
3. An external interrupt from an assigned device happens. Its destination is
determined by the flow above (IRTE --> posted-interrupt descriptor address/vCPU
--> notification event to 'NDST'). If this interrupt is for vCPU0, the
notification event is delivered to pCPU0, since the 'NDST' field of vCPU0's
posted-interrupt descriptor is pCPU0. If we use the current global vector for
this notification event, then, because the current global vector (the
notification vector) is a special vector to the CPU, it is consumed by vCPU1,
which is currently running on pCPU0, and we fail to wake up the HLT'ed vCPU0.
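A toy model may help show why the wakeup is lost. Per SDM 29.6, when the notification vector arrives in non-root mode, the CPU processes the posted-interrupt descriptor referenced by the *current* VMCS (vCPU1's), clearing that descriptor's ON. vCPU0's ON therefore stays set, so further interrupts for vCPU0 generate no new notification. The names below (toy_vcpu, post(), cpu_receive()) and the vector value are hypothetical, and SN/URG/NDST are ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only; not necessarily Xen's posted_intr_vector. */
#define POSTED_INTR_VECTOR 0xf2

struct toy_pi_desc {
    uint64_t pir[4];  /* PIR: pending posted requests */
    bool on;          /* ON: notification outstanding */
};

struct toy_vcpu {
    struct toy_pi_desc pi;
    uint64_t virr[4];  /* virtual IRR the guest will actually see */
};

/* Device side: post guest vector 'gvec' to 'target'.
 * Returns true if a notification event is generated. */
static bool post(struct toy_vcpu *target, uint8_t gvec)
{
    target->pi.pir[gvec / 64] |= UINT64_C(1) << (gvec % 64);
    if (target->pi.on)
        return false;
    target->pi.on = true;
    return true;
}

/* CPU side: pCPU0 receives 'vector' while 'running' is in non-root mode.
 * For the notification vector, the CPU processes the descriptor of the
 * vCPU that is currently running, whoever the event was sent for. */
static void cpu_receive(struct toy_vcpu *running, uint8_t vector)
{
    if (vector != POSTED_INTR_VECTOR)
        return;  /* any other vector would cause a VM exit to Xen instead */
    running->pi.on = false;
    for (int i = 0; i < 4; i++) {
        running->virr[i] |= running->pi.pir[i];
        running->pi.pir[i] = 0;
    }
}
```

In this model, posting to vCPU0 and then calling cpu_receive() with vCPU1 as the running vCPU leaves vCPU0's interrupt undelivered and its ON set.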

Please refer to Section 29.6 of the Intel SDM for how the CPU handles this
special vector:
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

After introducing a new global vector named 'pi_wakeup_vector', we set the 'NV'
(Notification Vector) field in the vCPU's posted-interrupt descriptor to
'pi_wakeup_vector' before the vCPU is HLT'ed. This is a normal vector to the
CPU, and the CPU does nothing special for it (unlike the current global vector).
In the handler of this vector, we can wake up the HLT'ed vCPU.
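Putting the pieces together, a minimal sketch of the descriptor updates and the wakeup handler might look like the following. It combines the blocking step just described with the RUNSTATE_running rules from the design (set NV to posted_intr_vector, clear SN, set NDST). Everything here is hypothetical: the vector numbers, the per-pCPU blocked list, and all function names are assumptions for illustration, and real code would also need locking and handling of races between blocking and notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative vector numbers only; not Xen's actual assignments. */
#define POSTED_INTR_VECTOR 0xf2
#define PI_WAKEUP_VECTOR   0xf1

struct pi_desc_model {
    uint64_t pir[4];
    bool on, sn;
    uint8_t nv;
    uint32_t ndst;
};

struct vcpu_model {
    struct pi_desc_model pi;
    bool blocked;                     /* vCPU is HLT'ed */
    struct vcpu_model *next_blocked;  /* hypothetical per-pCPU list link */
};

/* Called when 'v' is about to run on the pCPU whose APIC ID is 'apic_id'. */
static void pi_on_run(struct vcpu_model *v, uint32_t apic_id)
{
    v->pi.nv = POSTED_INTR_VECTOR;  /* CPU handles it without a VM exit */
    v->pi.sn = false;               /* accept notifications again */
    v->pi.ndst = apic_id;           /* notify the pCPU it runs on */
}

/* Called when 'v' executes HLT. NDST is left pointing at this pCPU, so the
 * wakeup notification arrives where 'blocked_list' lives. */
static void pi_on_block(struct vcpu_model *v, struct vcpu_model **blocked_list)
{
    v->pi.nv = PI_WAKEUP_VECTOR;  /* an ordinary vector, handled by Xen */
    v->blocked = true;
    v->next_blocked = *blocked_list;
    *blocked_list = v;
}

/* Handler for PI_WAKEUP_VECTOR: wake every blocked vCPU on this pCPU whose
 * descriptor has ON set. ON is left set so the pending PIR bits can be
 * synced into the guest when the vCPU runs again. Returns vCPUs woken. */
static int pi_wakeup_handler(struct vcpu_model **blocked_list)
{
    int woken = 0;
    struct vcpu_model **pp = blocked_list;

    while (*pp != NULL) {
        struct vcpu_model *v = *pp;
        if (v->pi.on) {
            *pp = v->next_blocked;  /* unlink from the blocked list */
            v->next_blocked = NULL;
            v->blocked = false;     /* stands in for the scheduler's unblock */
            woken++;
        } else {
            pp = &v->next_blocked;
        }
    }
    return woken;
}
```

The key design point is that only blocked vCPUs carry the ordinary wakeup vector, so a notification for them can no longer be absorbed by whichever vCPU happens to be running on NDST.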

> > - Update posted-interrupt descriptor during vCPU scheduling
> > The basic idea here is:
> > 1. When vCPU's state is RUNSTATE_running,
> >         - Set 'NV' to 'posted_intr_vector'.
> >         - Clear 'SN' to accept posted-interrupts.
> >         - Set 'NDST' to the pCPU on which the vCPU will be running.
> >[...]
> 
> This is pretty hard to read without knowing what the abbreviations
> actually stand for, and suggesting to hunt for them in the spec isn't
> very reader friendly either. Please explain these fields, at the very
> least by way of comments on the structure fields presented earlier.
> 

VT-d PI introduces some changes to the IRTE and adds the posted-interrupt
descriptor:
IRTE:
Posted-interrupt Descriptor Address: the address of the posted-interrupt
descriptor
Virtual Vector: the guest vector of the interrupt
URG: indicates whether the interrupt is urgent

Posted-interrupt descriptor:
The posted-interrupt descriptor hosts the following fields:
Posted Interrupt Request (PIR): Provides storage for posting (recording)
interrupts (one bit per vector, for up to 256 vectors).
Outstanding Notification (ON): Indicates whether there is a notification event
outstanding (not yet processed by the processor or software) for this
posted-interrupt descriptor. When this field is 0, hardware changes it from 0
to 1 when generating a notification event, and the entity receiving the
notification event (processor or software) resets it as part of
posted-interrupt processing.
Suppress Notification (SN): Indicates whether a notification event is to be
suppressed (not generated) for non-urgent interrupt requests (interrupts
processed through an IRTE with URG=0).
Notification Vector (NV): Specifies the vector for the notification event
(interrupt).
Notification Destination (NDST): Specifies the physical APIC ID of the
destination logical processor for the notification event.
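Following Jan's suggestion to document these fields as comments on the structure, a sketch of the 64-byte descriptor could look like this. The bit positions reflect our reading of the VT-d descriptor format and assume little-endian x86; the macro names PI_CTRL_ON/PI_CTRL_SN and the member names are hypothetical, not taken from Xen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits of 'control' (hypothetical macro names). */
#define PI_CTRL_ON (1u << 0)  /* ON: descriptor bit 256 */
#define PI_CTRL_SN (1u << 1)  /* SN: descriptor bit 257 */

/* 64-byte posted-interrupt descriptor, assuming little-endian x86. */
struct pi_desc {
    /* PIR, bits 255:0: one posted-interrupt request bit per vector. */
    _Alignas(64) uint64_t pir[4];
    /* Bits 271:256: ON (bit 0), SN (bit 1); bits 15:2 are reserved. */
    uint16_t control;
    /* NV, bits 279:272: vector of the notification event. */
    uint8_t nv;
    uint8_t rsvd1;      /* bits 287:280, reserved */
    /* NDST, bits 319:288: physical APIC ID of the notification target. */
    uint32_t ndst;
    uint32_t rsvd2[6];  /* bits 511:320, reserved */
};

static_assert(sizeof(struct pi_desc) == 64, "descriptor is 64 bytes");
static_assert(offsetof(struct pi_desc, nv) == 34, "NV starts at bit 272");
static_assert(offsetof(struct pi_desc, ndst) == 36, "NDST starts at bit 288");
```

Since hardware updates PIR and ON concurrently, real code would modify these fields with atomic operations rather than plain assignments.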

> > On the Xen side, what is your opinion about supporting lowest-priority
> > interrupts for VT-d PI?
> 
> I certainly think (as with every other virtualized piece of hardware)
> that hardware behavior should be emulated as closely as possible.
> I.e. yes, we should have it eventually. As to the two stage approach
> mentioned for KVM - I've grown reservations against Intel people
> making promises towards future implementation of something, i.e.
> I'm kind of hesitant to agree to such an implementation model. Yet
> you're to contribute the patches, and I'm surely not planning to veto
> a stage-1-only implementation as it would be an improvement anyway.
> 

Well, I am okay with doing a full implementation for lowest-priority. KVM people
tend to do simple things in the first stage of hardware enabling; if you don't
like doing it this way, I will skip stage 1 above and implement the full
solution directly on the Xen side.

Thanks,
Feng

> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

