Xen project Mailing List

[Xen-devel] [Xenhackthon] Virtualized APIC registers - virtual interrupt delivery.

During the hackathon we chatted about the Intel APIC virtualization and how it works with the current Linux PVHVM. Or rather, per my understanding, how it does not. I am trying to visualize how this would work with a 10Gb NIC that is passed through to a guest.

This slide deck (starting at page 6) gives an idea of what it is:
http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-virt-intel-vt-feat-nakajima.pdf

Page 9 goes into detail about what this does: APIC reads don't trap, but writes do cause a VMEXIT. OK, that is similar to how the PVHVM callback vector + events work. If the NIC raises its vector, it hits the hypervisor, which sets the right event channel. Then the guest is interrupted with vector 0xf3 (the callback vector), goes straight to __xen_evtchn_do_upcall, reads the event channel and calls the NIC driver IRQ handler. If it needs to write (say, to do an IPI or mask another CPU's IRQ) it will do a hypercall and exit (there are optimizations to avoid this if the masking, etc. is done on the local CPU). For that, the PVHVM event channel machinery gives the same benefit.

The next part is "Virtual-interrupt delivery". Here it says: "CPU delivers virtual interrupts to guest (including virtual IPIs)." Not much on details, but this slide deck:
http://www.linux-kvm.org/wiki/images/7/70/2012-forum-nakajima_apicv.pdf
gives a better idea (pages 7 and 8) and then goes into detail. Also, the Intel Software Developer's Manual, starting at section 29.1, covers it in detail.

Per my understanding, the CPU sets the SVI and RVI to tell the hypervisor which vector is currently being executed and which one is going next. Those vectors are chosen by the OS. It could use vector 0xfa for a NIC driver and a lower one for IPIs or such. The hypervisor sets a VISR (a bitmap) of all the vectors that a guest is allowed to execute without a VMEXIT. In all likelihood it will just mask out the vectors it is using and let the guest have free range.
Which means that if a guest vector ends up with a higher priority than the hypervisor's timer or IPI callback vector, the guest can run unbounded. Also, it would seem that this value has to be reset often when migrating a guest between pCPUs.

And it would appear that this value is static, meaning the guest only sets these vectors once and the hypervisor is responsible for managing the priority of that guest and other guests (say dom0) on the CPU.

For example, we have a guest with a 10Gb NIC and the guest has decided to use vector 0x80 for it (assume a UP guest). Dom0 has a SAS controller and is using event numbers 30, 31, 32, and 33 (there are only 4 pCPUs). The hypervisor maps them to vectors 0x58, 0x68, 0x78 and 0x88 and spreads those vectors across the pCPUs, one each. The guest is running on pCPU1 and there are two vectors on it: 0x80 and 0x58. The one assigned to the guest wins and the dom0 SAS controller is preempted.

The solution for that seems to require some interaction with the guest when it allocates the vectors, so that they are always below the dom0 priority vectors. Or the hypervisor has to dynamically shuffle its own vectors to be higher priority. Or is there a guest vector <-> hypervisor vector lookup table that the CPU can use? So the hypervisor can say: vector 0x80 in the guest actually maps to vector 0x48 in the hypervisor?

Now, the above example assumed a simple HVM Linux kernel that does not use PV extensions. Currently Linux on HVM will enable the event system and use one vector for a callback (0xf3). For this to work where we mix the event callback and a real physical device vector along with access to the virtual APIC, it would require some knowledge of which devices (or vectors) can use the event path and which cannot.

Am I on the right track?
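
A minimal sketch of the pCPU1 example above. It relies on one known rule of the local APIC: the priority class of a vector is its upper four bits, so a numerically higher vector wins. It also assumes, as the example does, that guest and hypervisor vectors compete in a single priority space, which is exactly the open question here and not confirmed hardware behaviour. The names priority_class, winner, winner_with_remap and REMAP are hypothetical and exist only for illustration.

```python
def priority_class(vector):
    """Return the APIC priority class: bits 7:4 of an 8-bit vector."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return vector >> 4


def winner(pending):
    """Pick the vector delivered first: the numerically highest one."""
    return max(pending)


GUEST_NIC = 0x80  # vector the guest chose for its NIC in the example
DOM0_SAS = 0x58   # vector the hypervisor assigned to dom0 on pCPU1

# Hypothetical guest -> hypervisor lookup table (assumption: the text only
# asks whether such a table exists; it is not known to exist).
REMAP = {GUEST_NIC: 0x48}


def winner_with_remap(pending_guest, pending_host, remap):
    """Translate guest vectors through the table, then pick the highest."""
    translated = [remap.get(v, v) for v in pending_guest]
    return max(translated + list(pending_host))
```

Without a table, winner([GUEST_NIC, DOM0_SAS]) picks 0x80 (class 8 beats class 5), which is the preemption described in the example. With the hypothetical table, the guest vector is treated as 0x48 (class 4), so winner_with_remap([GUEST_NIC], [DOM0_SAS], REMAP) picks the dom0 vector 0x58 instead.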
