Xen/IA64 interrupt virtualization

* Introduction

This document targets Xen/IA64 developers, providing a design overview of
interrupt virtualization: what the guest IOSAPIC looks like and how the
machine IOSAPIC is used in the hypervisor.

* Terminology (not formal definitions, just for better understanding)

PIRQ:          Physical IRQ generated by a partitioned device; vectors 0-255
               on X86.
VIRQ:          Dynamic IRQ that is purely virtual; vectors 256-511 on X86.
IPI:           Inter-processor interrupt.
VIPI:          Virtual IPI, i.e. an IPI between guest virtual processors.
MMIO:          Memory-mapped IO.
Event channel: Xen's notification mechanism carrying PIRQs, VIRQs, IPIs and
               inter-domain events; pending events are recorded in a bitmap
               shared between Xen and the guest.

* Background

How Xen/X86 handles the callback and event channels:

In a Xen environment, a para-virtualized guest registers its callback and
failsafe callback entries with the hypervisor for batched delivery of events
to the guest. When a guest has pending events (in a shared bitmap), guest
execution transfers to the pre-registered callback function
(evtchn_do_upcall), much as an interrupt would on a native system. This
control transfer can be disabled through another shared variable,
evtchn_upcall_mask, so guest software can disable the upcall when it needs
to. Within evtchn_do_upcall the events are dispatched, e.g. to do_IRQ() or
evtchn_device_upcall().

Current IA64 approach for the callback:

Xen/IA64 currently uses a pseudo physical IRQ to indicate that events are
active, and dispatches them in that pseudo IRQ's handler. At the Xen summit
we all agreed to implement the callback/failsafe callback mechanism to avoid
potential bugs, and Intel is working on that now.

How X86 Xenlinux handles IRQs:

Guest IRQs, including PIRQ, VIRQ, IPI and inter-domain communication
channels, are all bound to event channels, i.e. they are all carried by
event channels. At initialization time the guest initializes the IO_APIC
hardware based on knowledge presented by the firmware, and eventually
registers a pure virtual "pirq_type" as the hw_interrupt_type instead of
ioapic_level_type and ioapic_edge_type. At run time "pirq_type" performs
purely event-channel-based operations. For example:

    irq_desc->handler->ack (becomes ack_pirq): masks the corresponding event
    channel (no hypercall).

    irq_desc->handler->end (becomes end_pirq): unmasks the corresponding
    event channel and may notify Xen through a hypercall
    (PHYSDEVOP_IRQ_UNMASK_NOTIFY) to invoke Xen's own
    irq_desc->handler->end. The latter may signal "EOI" in the hypervisor
    (for the IO_APIC it is unmask_IO_APIC_irq).

Difference between pirq_type and ioapic_level_type/ioapic_edge_type:

The initialization paths of these two types are similar, i.e.
startup/shutdown and enable/disable are the same, and both may need to
access machine resources. But the runtime services, i.e. ack/end, are quite
different: pirq_type mainly accesses event-channel-related shared memory for
mask/unmask, while ioapic_level_type/ioapic_edge_type needs to access
machine IOSAPIC resources; for example, ack_edge_ioapic_irq and
ack_edge_ioapic_vector need to mask the APIC resource and ack the APIC.
Another difference is that with the event channel approach a single
hw_interrupt_type, pirq_type, works for both level- and edge-triggered IRQs.

When Xen receives PHYSDEVOP_IRQ_UNMASK_NOTIFY (which comes from the guest's
pirq_type.end):

    pirq_guest_unmask()
    {
        if ( --irq_desc->action->in_flight == 0 )
            irq_desc->handler->end();    /* "EOI" */
    }
    Done;

Machine IRQ delivery in Xen/X86:

The code flow of Xen IRQ delivery (for IRQs that belong to guests):

    A machine IRQ happens -> xen -> do_IRQ() of xen:
        irq_desc->handler->ack();  /* same as Linux, operates on the real resource */
        __do_IRQ_guest():
            for each bound guest {
                send_guest_pirq();
                irq_desc->action->in_flight++;
            }
        Done;

send_guest_pirq(): sets the pending event channel bit (shared
evtchn_pending) for the target processor.
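As a rough illustration (a minimal sketch, not the exact Xen source; the
lookups evtchn_port_from_pirq() and target_vcpu() are hypothetical helpers
introduced only for this example), the pending/notify logic looks roughly
like this:

    /* Sketch: mark an event channel pending and, if the guest has not
     * masked it, flag an upcall for the target VCPU.  Field names follow
     * the shared bitmap layout described above. */
    void send_guest_pirq(struct domain *d, int pirq)
    {
        int port = evtchn_port_from_pirq(d, pirq);   /* hypothetical lookup */
        shared_info_t *s = d->shared_info;
        struct vcpu *v = target_vcpu(d, port);       /* hypothetical lookup */

        /* Record the event; if it was already pending, nothing more to do. */
        if ( test_and_set_bit(port, &s->evtchn_pending[0]) )
            return;

        /* Only raise an upcall if the guest has not masked this channel. */
        if ( !test_bit(port, &s->evtchn_mask[0]) &&
             !test_and_set_bit(port / BITS_PER_LONG,
                               &v->vcpu_info->evtchn_pending_sel) )
        {
            v->vcpu_info->evtchn_upcall_pending = 1;
            evtchn_notify(v);   /* may send a machine IPI on SMP, see below */
        }
    }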
On an SMP system, if the target processor is running, a machine IPI is also
sent to it (evtchn_notify).

When Xen returns to the guest: before restore_all_guest, if
VCPUINFO_upcall_mask = 0, i.e. evtchn_upcall_mask = 0, and there is a
pending event channel, Xen creates a bounce frame on the guest that is
similar to an exception frame; guest control then goes to the callback
entry.

* Xen/IA64 IRQ virtualization design

1: The hypervisor owns the machine IOSAPIC/LSAPIC exclusively. This makes
IRQ sharing between driver domains much easier, as there is no contention
between domains.

2: Machine IRQ delivery in Xen/IA64:

The basic logic is exactly the same as in Linux/IA64:

    An IRQ happens -> IVT+0x3000 -> ia64_handle_irq():
        while (IRQ exists) {
            vector = CR.IVR;
            mask the IRQ using TPR;
            __do_IRQ();
            unmask the IRQ using TPR;
            issue CR.EOI;
        }

A slight difference is __do_IRQ: Linux calls do_IRQ here, but Xen merges
do_IRQ and __do_IRQ together and uses the name __do_IRQ.   --- Reuse

do_IRQ does the following (API in Xen/arch/x86/irq.c); the code sequence is
the same and is explained in more detail here:

    desc = &irq_desc[vector];
    desc->handler->ack(vector);
    if ( desc->status & IRQ_GUEST )
    {
        __do_IRQ_guest(vector);
        return;     /* guest IRQs are delivered via event channels */
    }
    action = desc->action;
    action->handler(...);
    desc->handler->end(vector);

__do_IRQ_guest(..) is the same as on X86.   --- Reuse

3: Snapshot of the machine SAPIC operations:

    A machine IRQ happens ->
        while (IRQ exists) {
            read IVR;
            mask by TPR;
            set the event channel;
            unmask by TPR;
            issue CR.EOI;
        }

In the above sequence IOSAPIC.EOI is not issued, so when the driver domain
becomes active it will:

    Mask the event channel.
    Run the action to handle the device IRQ.
    Unmask the event channel.
    Notify Xen to issue IOSAPIC.EOI through the PHYSDEVOP_IRQ_UNMASK_NOTIFY
    hypercall.

4: IRQ prioritization:

Use event channel priority as guest IRQ priority...

* What the future patch looks like

Most of the work in this patch will be in the initialization APIs, like
io_apic-xen.c on X86. The runtime features are already there, with almost
no code change needed. The current event channel mechanism supports SMP
hosts/guests well.
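For reference, the initialization work that the IA64 patch would mirror
amounts to binding each guest PIRQ to an event channel port, as X86 Xenlinux
does. A minimal sketch, assuming the EVTCHNOP_bind_pirq interface of that
era and eliding error handling:

    /* Sketch: bind a physical IRQ to an event channel port at guest
     * initialization time.  Xen fills in the allocated port on success. */
    static int bind_pirq_to_evtchn(int pirq)
    {
        evtchn_op_t op;

        op.cmd = EVTCHNOP_bind_pirq;
        op.u.bind_pirq.pirq  = pirq;
        op.u.bind_pirq.flags = BIND_PIRQ__WILL_SHARE;   /* allow IRQ sharing */

        if ( HYPERVISOR_event_channel_op(&op) != 0 )
            return -1;

        return op.u.bind_pirq.port;
    }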
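The runtime side, which the document says is already in place, reduces to
bitmap operations on the shared page plus the unmask-notify hypercall. A
hedged sketch of the guest-side pirq_type handlers described in the
Background section (evtchn_from_irq() is a placeholder lookup; the synch_*
bit operations stand in for whatever atomic shared-memory primitives the
port uses):

    /* Sketch: ack masks the event channel with no hypercall; end unmasks
     * it and asks Xen to perform the real IOSAPIC.EOI. */
    static void ack_pirq(unsigned int irq)
    {
        int port = evtchn_from_irq(irq);                 /* placeholder */

        synch_set_bit(port, &shared_info->evtchn_mask[0]);     /* mask   */
        synch_clear_bit(port, &shared_info->evtchn_pending[0]);
    }

    static void end_pirq(unsigned int irq)
    {
        int port = evtchn_from_irq(irq);                 /* placeholder */
        physdev_op_t op;

        synch_clear_bit(port, &shared_info->evtchn_mask[0]);   /* unmask */

        /* Tell Xen to run its irq_desc->handler->end, i.e. issue
         * IOSAPIC.EOI once in_flight drops to zero. */
        op.cmd = PHYSDEVOP_IRQ_UNMASK_NOTIFY;
        (void)HYPERVISOR_physdev_op(&op);
    }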