This is an archived copy of the Xen.org mailing list, which we have preserved to ensure that existing links to archives are not broken. The live archive, which contains the latest emails, can be found at http://lists.xen.org/


RE: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)

To: "Guy Zana" <guy@xxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
Subject: RE: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
Date: Fri, 10 Aug 2007 10:58:30 +0800
Cc: Alex Novik <alex@xxxxxxxxxxxx>
Delivery-date: Thu, 09 Aug 2007 19:58:59 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxx
In-reply-to: <9392A06CB0FDC847B3A530B3DC174E7B0327CC08@xxxxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <9392A06CB0FDC847B3A530B3DC174E7B0327CC08@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
Thread-index: AcfarSF45lhECFFTQSiE3oBAG7IKmQASGmqA
Thread-topic: [Xen-devel] [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
Hi, Guy,
        Thanks for the very good description.

        Basically I think this should work, but I have the following concerns:

- How to choose the timeout value?
        A small timeout may result in more spurious injections and a performance
penalty, while a large timeout may not satisfy driver expectations for high-speed

- How to cope with the existing irq sharing mechanism for a PV driver domain?
        The existing approach between a PV driver domain and dom0 is based
on a trigger point, i.e. guest EOI: it keeps an insertion count and tracks the
guest response. The timeout mechanism is different, and I guess it is
difficult for the two paths to share logic.

        How about a mixed sharing case, say among dom0/PV domain/
HVM domain?

- Interrupt delay within the HVM may be exaggerated under some special
conditions: if the HVM is not ready to handle the injection at D.3 (e.g. blocked
in I/O emulation), the later D.4 will cancel the previous injection at the next
timeout. The HVM then gets a re-injection only at the next D.3, and it may
or may not be delayed again depending on its status at that time.

        Did you run some heavy workloads and observe any complaints?

        But anyway, I think a timeout is the only way to support the shared irq
case (without MSI, and if we do want to allow it), though with a
performance issue. :-)


>From: Guy Zana
>Sent: August 10, 2007 1:46
>We propose the following method in order to support interdomain
>interrupt sharing, where one of the domains is an HVM assigned with a
>pass-through device. This method is limited in a way that we can support
>sharing between just two domains: dom0 and an HVM. This method is
>based on changing polarity.
>Change polarity algorithm (CPA) - An algorithm in which polarity inversion
>is used for EOI recognition. For details see
>PLINE - Physical Line. This is the reflection of the physical line. By
>changing polarity we know the physical line's status.
>VLINE - Virtual Line. This is the HVM virtual line.
>PT Device - A pass-through PCI device assigned to the HVM.
>Dom0 Device - A PCI device assigned to dom0 (by default).
>Interrupt Sharing - Two or more PCI devices whose INTx lines are
>connected to the same IOAPIC pin (OR wired) and which are
>assigned to different domains.
>Re-occurring interrupts - The PLINE is held asserted while the IOAPIC
>fires interrupts continuously.
>Spurious interrupts - Within a domain context, an interrupt that passed
>through the ISR chain without being handled.
>NOTE: A single PCI device can not be assigned to more than one
>When a single device is assigned to an HVM, using CPA, we update the
>HVM's VLINE according to the PLINE state (both hold the same value)
>providing complete reflection. It is trivial to see how more than one
>device that shares the same line could be assigned to the HVM (using
>same CPA).
>In general, we should consider the situation where N devices from Dom0
>share the same line with M devices from the HVM. There are 3 cases
>1. N=0, i.e. this line belongs to HVM devices. This case is already
>solved with CPA.
>2. M=0, i.e. this line belongs to Dom0 devices. This is basic dom0
>3. N != 0, M != 0. This is the situation that we want to handle now,
>from now on we'll refer to this situation as interdomain shared
>However, our method could be extended to handle all of
>the above cases.
>Problems related to Interdomain Interrupt Sharing
>* Spurious interrupts.
>* Interrupt starvation.
>* When we use CPA, we do not get re-occurring interrupts; this
>should be taken into account.
>* Even if a shared interrupt was handled by a domain specific ISR, it is
>not guaranteed that the pline will be deasserted.
>* Interrupt storming - _Physical_ storming is solved transparently by
>* Give both the HVM and Dom0 a chance to handle the interrupt
>* Update the HVM's VLINE correctly when sharing an interrupt
>* Avoid spurious interrupts or at least minimize the number of such
>interrupts injected into HVM.
>* Stay with a reasonable interrupt latency.
>Proffered Method
>1. We obtain the shared line's assertion state by using CPA; at each
>assert/deassert event we save the line's state.
>2. We perform most of the logic in a periodic timer module.
>1. Timer module. Periodic callback that does all the logic processing.
>2. XEN interrupt handler. Handler is replaced by CPA that updates
>3. Dom0 ISR chain. At the end of the chain, we know whether the
>interrupt was handled or not, and update the status in Xen using a
>1. Idle. The PLINE is deasserted. This is the "relaxed" state. We are
>waiting for an interrupt to arrive.
>2. In Dom0. The interrupt is currently handled by Dom0. The event was
>sent into Dom0 and Dom0 ISR is processing it.
>3. Process Interrupt. The interrupt was handled by Dom0. Dom0 got back
>to us with the results of the handling. Now we need to decide what to do
>next. This state can be reached only from state [2].
>State machine
>The timer callback implements the state machine; it freezes when we are
>in the idle state.
>The "events" described below are polled by the timer. We also perform
>changes in dom0's ISR chain in order to generate these "events".
>The following events are handled:
>A. PLINE is deasserted. This event will move state machine to _Idle_
>state from any state.
>This can happen in one of 2 cases:
>1. Initialization.
>2. As a result of PLINE deassertion. If PLINE went down, it means that
>we're done.
>B. Idle state and PLINE is asserted. In this case the interrupt is
>injected into Dom0. The state machine moves to "In Dom0". We always
>let Dom0 try to handle the interrupt first, thus logically creating
>an interdomain ISR chain beginning with Dom0.
>C. "In Dom0" and PLINE status is asserted (We read the status from a
>timer). Do nothing. We don't know what to do with this interrupt yet.
>D. "Process Interrupt" and PLINE is asserted.
>A few cases are possible:
>1. If Dom0 successfully handled the last interrupt and the interrupt
>wasn't injected into the HVM, inject the interrupt into Dom0 and move to
>state "In Dom0". This is the Dom0 interrupt, keep injecting into Dom0.
>2. If Dom0 successfully handled the last interrupt and the interrupt was
>injected into the HVM, deassert the HVM vline, and re-inject the
>interrupt into Dom0. Move to state "In Dom0".
>(This is done in order to solve a case where the HVM was handling the
>interrupt, but the line didn't get deasserted because a Dom0 device
>asserted it before a PT device deasserted it (as a result of the HVM
>handling). In this case we assume that the HVM is done with it and now
>it's Dom0's turn.)
>3. If Dom0 didn't successfully handle the last interrupt and the
>interrupt was not injected into the HVM, inject the interrupt into the
>HVM and stay in the same state. This is an HVM's interrupt. Dom0
>rejected it.
>4. If Dom0 didn't successfully handle the last interrupt and the
>interrupt was injected into the HVM, inject the interrupt into Dom0 and
>move to state "In Dom0". The HVM is not done yet with the current interrupt.
>E. "Process Interrupt" and the PLINE is deasserted. Deassert the HVM
>interrupt (if necessary) and move to idle. We handled the interrupt.
>Prepare ourselves for the new one.
>The main idea here is to inject the interrupt into Dom0 when we don't
>know what to do with it. If Dom0 takes ownership, then let it handle
>the interrupt. If not, we inject it into the HVM. We recognize that none
>of the PT devices is asserting the line either by PLINE deassertion or by
>Dom0 taking ownership back.
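
For readers following the thread, events A through E can be written down as a small simulation. This is only an illustrative sketch in Python, not the code from the RFC: the names SharedLine, tick and dom0_report are hypothetical, the timer is modelled as an explicit tick() call, and dropping the VLINE on every PLINE deassertion (event A as well as E) is an assumption the proposal does not spell out.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # state 1: PLINE deasserted, timer frozen
    IN_DOM0 = auto()   # state 2: Dom0's ISR chain is processing the event
    PROCESS = auto()   # state 3: Dom0 reported back, decide what to do next


@dataclass
class SharedLine:
    """Hypothetical model of one interdomain shared line (Dom0 + one HVM)."""
    state: State = State.IDLE
    pline: bool = False         # physical line state, as saved by CPA
    vline: bool = False         # True while the interrupt is injected into the HVM
    dom0_handled: bool = False  # result reported at the end of Dom0's ISR chain
    dom0_injections: int = 0    # bookkeeping for illustration only

    def _inject_dom0(self) -> None:
        self.dom0_injections += 1
        self.state = State.IN_DOM0

    def dom0_report(self, handled: bool) -> None:
        """End of Dom0's ISR chain; state 3 is reachable only from state 2."""
        if self.state is State.IN_DOM0:
            self.dom0_handled = handled
            self.state = State.PROCESS

    def tick(self) -> None:
        """One run of the periodic timer callback."""
        if not self.pline:
            # Events A and E. Assumption: the VLINE is dropped in both cases.
            self.vline = False
            self.state = State.IDLE
        elif self.state is State.IDLE:
            self._inject_dom0()   # event B: Dom0 always goes first
        elif self.state is State.IN_DOM0:
            pass                  # event C: still waiting for Dom0's result
        elif self.dom0_handled:
            self.vline = False    # D.2 deasserts the VLINE (no-op for D.1)
            self._inject_dom0()   # D.1 / D.2: re-inject into Dom0
        elif not self.vline:
            self.vline = True     # D.3: inject into the HVM, stay in PROCESS
        else:
            self._inject_dom0()   # D.4: VLINE stays asserted, ask Dom0 again
```

Walking an HVM-owned interrupt through this model: set pline to True and call tick() (event B, Dom0 is tried first), call dom0_report(False), then tick() again (D.3, the VLINE is asserted). Once the device drops the line, setting pline to False and calling tick() returns the machine to Idle. In this model the number of ticks spent between D.3 and the next D.3 is the delay the reply above is concerned about.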
>Any ideas and comments are welcome.
>Best regards,
>Alex Novik,
>Xen-devel mailing list