Xen project Mailing List

Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling

To: Ian Campbell <ian.campbell@xxxxxxxxxx>

From: Vijay Kilari <vijay.kilari@xxxxxxxxx>

Date: Fri, 12 Jun 2015 14:07:32 +0530

Cc: manish.jaggi@xxxxxxxxxxxxxxxxxx, Julien Grall <julien.grall@xxxxxxxxxx>, Stefano Stabellini <stefano.stabellini@xxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Fri, 12 Jun 2015 08:37:40 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Thu, Jun 11, 2015 at 3:10 PM, Ian Campbell <ian.campbell@xxxxxxxxxx> wrote:
> Draft F follows. Also at:
> http://xenbits.xen.org/people/ianc/vits/draftF.{pdf,html}
>
> Here's a quick update based on feedback prior to meeting on #xenarm at
> 12:00PM BST / 7:00AM EDT / 4:30PM IST (which is ~1:20 from now)
>
> Ian.
>
> % Xen on ARM vITS Handling
> % Ian Campbell <ian.campbell@xxxxxxxxxx>
> % Draft F
>
> # Changelog
>
> ## Since Draft E
>
> * Discussion of `struct pending_irq`
> * Fix handling of enable/disable, requiring switching back to trapping
>   the virtual cfg table again. get_vlpi_cfg is no longer needed.
> * Fix p2m_lookup to also use get_page_from_gfn.
>
> ## Since Draft D
>
> * Fixed assumptions about vLPI->pLPI mapping, which is not
>   possible. This led to changes to the model for enabling and
>   disabling pLPI and vLPI and the handling of the virtual LPI
>   configuration table, resolving _Unresolved Issue 1_.
> * Made the pLPI and vLPI interrupt priorities explicit.
> * Attempted to clarify the trust issues regarding in-guest data
>   structures.
> * Mandate a particular cacheability for tables in guest memory.
>
> ## Since Draft C
>
> * _Major_ rework, in an attempt to simplify everything into something
>   more likely to be achievable for 4.6.
> * Made some simplifying assumptions.
> * Reduced the scope of some support.
> * Command emulation is now mostly trivial.
> * Expanded detail on host setup, allowing other assumptions to be
>   made during emulation.
> * Many other things lost in the noise of the above.
>
> ## Since Draft B
>
> * Details of command translation (thanks to Julien and Vijay)
> * Added background on LPI Translation and Pending tables
> * Added background on Collections
> * Settled on `N:N` scheme for vITS:pITS mapping.
> * Rejigged section nesting a bit.
> * Since we now think translation should be cheap, settle on
>   translation at scheduling time.
> * Lazy `INVALL` and `SYNC`
>
> ## Since Draft A
>
> * Added discussion of when/where command translation occurs.
> * Contention on scheduler lock, suggestion to use SOFTIRQ.
> * Handling of domain shutdown.
> * More detailed discussion of multiple vs single vits pros/cons.
>
> # Introduction
>
> ARM systems containing a GIC version 3 or later may contain one or
> more ITS logical blocks. An ITS is used to route Message Signalled
> interrupts from devices into an LPI injection on the processor.
>
> The following summarises the ITS hardware design and serves as a set
> of assumptions for the vITS software design. For full details of the
> ITS see the "GIC Architecture Specification".
>
> ## Locality-specific Peripheral Interrupts (`LPI`)
>
> This is a new class of message signalled interrupts introduced in
> GICv3. They occupy the interrupt ID space from `8192..(2^32)-1`.
>
> The number of LPIs supported by an ITS is exposed via
> `GITS_TYPER.IDbits` (as number of bits - 1); it may be up to
> 2^32. _Note_: This field also contains the number of Event IDs
> supported by the ITS.
>
> ### LPI Configuration Table
>
> Each LPI has an associated configuration byte in the LPI Configuration
> Table (managed via the GIC Redistributor and placed at
> `GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures:
>
> * The LPI's priority;
> * Whether the LPI is enabled or disabled.
>
> Software updates the Configuration Table directly but must then issue
> an invalidate command (per-device `INV` ITS command, global `INVALL`
> ITS command or write `GICR_INVLPIR`) for the effect to be guaranteed
> to become visible (possibly requiring an ITS `SYNC` command to ensure
> completion of the `INV` or `INVALL`). Note that it is valid for an
> implementation to reread the configuration table at any time (IOW it
> is _not_ guaranteed that a change to the LPI Configuration Table won't
> be visible until an invalidate is issued).
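The ID arithmetic and the configuration byte above can be illustrated with a minimal C sketch. It follows the draft in two respects: `GITS_TYPER.IDbits` holds "number of bits - 1", and the priority is kept in place as `byte & 0xfc` with bit 0 as the enable bit. Bit 1 is ignored here. The helper names are hypothetical and are not existing Xen functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LPI_BASE 8192u /* first LPI interrupt ID */

/* GITS_TYPER.IDbits holds (number of ID bits - 1). The ID space is
 * therefore 2^(IDbits + 1) IDs, of which those from 8192 up are LPIs.
 * Hypothetical helper. */
static uint64_t nr_lpis_from_idbits(unsigned int idbits)
{
    assert(idbits <= 31); /* the ID space is at most 32 bits */
    uint64_t nr_ids = 1ull << (idbits + 1);
    return nr_ids > LPI_BASE ? nr_ids - LPI_BASE : 0;
}

/* Decoded form of one LPI Configuration Table byte. */
struct lpi_cfg {
    uint8_t priority; /* bits [7:2], left in place as in `byte & 0xfc` */
    bool enabled;     /* bit [0] */
};

static struct lpi_cfg decode_lpi_cfg(uint8_t byte)
{
    struct lpi_cfg cfg = {
        .priority = (uint8_t)(byte & 0xfc),
        .enabled = (byte & 0x1) != 0,
    };
    return cfg;
}

/* The table holds one byte per LPI, indexed from the first LPI ID. */
static uint32_t lpi_cfg_offset(uint32_t lpi)
{
    assert(lpi >= LPI_BASE);
    return lpi - LPI_BASE;
}
```

For example, `IDbits == 15` gives 2^16 IDs, of which 57344 are LPIs. The byte `0xa1` decodes to priority `0xa0`, enabled.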
>
> ### LPI Pending Table
>
> Each LPI also has an associated bit in the LPI Pending Table (managed
> by the GIC redistributor). This bit signals whether the LPI is pending
> or not.
>
> This region may contain out of date information and the mechanism to
> synchronise is `IMPLEMENTATION DEFINED`.
>
> ## Interrupt Translation Service (`ITS`)
>
> ### Device Identifiers
>
> Each device using the ITS is associated with a unique "Device
> Identifier".
>
> The device IDs are properties of the implementation and are typically
> described via system firmware, e.g. the ACPI IORT table or via device
> tree.
>
> The number of device ids in a system depends on the implementation and
> can be discovered via `GITS_TYPER.Devbits`. This field allows an ITS
> to have up to 2^32 devices.
>
> ### Events
>
> Each device can generate "Events" (called `ID` in the spec); these
> correspond to possible interrupt sources in the device (e.g. MSI
> offset).
>
> The maximum number of interrupt sources is device specific. It is
> usually discovered either from firmware tables (e.g. DT or ACPI) or
> from bus specific mechanisms (e.g. PCI config space).
>
> The maximum number of event IDs supported by an ITS is exposed via
> `GITS_TYPER.IDbits` (as number of bits - 1); it may be up to
> 2^32. _Note_: This field also contains the number of `LPIs` supported
> by the ITS.
>
> ### Interrupt Collections
>
> Each interrupt is a member of an "Interrupt Collection". This allows
> software to manage large numbers of physical interrupts with a small
> number of commands rather than issuing one command per interrupt.
>
> On a system with N processors, the ITS must provide at least N+1
> collections.
>
> An ITS may support some number of internal collections (indicated by
> `GITS_TYPER.HCC`) and external ones which require memory provisioned
> by the Operating System via a `GITS_BASERn` register.
>
> ### Target Addresses
>
> The Target Address corresponds to a specific GIC re-distributor.
> The format of this field depends on the value of the `GITS_TYPER.PTA` bit:
>
> * 1: the base address of the re-distributor target is used
> * 0: a unique processor number is used. The mapping between the
>   processor affinity value (`MPIDR`) and the processor number is
>   discoverable via `GICR_TYPER.ProcessorNumber`.
>
> This value is up to the ITS implementer (`GITS_TYPER` is a read-only
> register).
>
> ### Device Table
>
> A Device Table is configured in each ITS which maps incoming device
> identifiers into an ITS Interrupt Translation Table.
>
> ### Interrupt Translation Table (`ITT`) and Collection Table
>
> An `Event` generated by a `Device` is translated into an `LPI` via a
> per-Device Interrupt Translation Table. The structure of this table is
> described in GIC Spec 4.9.12.
>
> The ITS translation table maps the device id of the originating device
> into a physical interrupt (`LPI`) and an Interrupt Collection.
>
> The Collection is in turn looked up in the Collection Table to produce
> a Target Address, indicating a redistributor (AKA CPU) to which the
> LPI is delivered.
>
> ### OS Provisioned Memory Regions
>
> The ITS hardware design provides mechanisms for an ITS to be provided
> with various blocks of memory by the OS for ITS internal use. These
> include the per-device ITT (established with `MAPD`) and memory
> regions for Device Tables, Virtual Processors and Interrupt
> Collections. Up to 8 such regions can be requested by the ITS and
> provisioned by the OS via the `GITS_BASERn` registers.
>
> ### ITS Configuration
>
> The ITS is configured and managed, including establishing and
> configuring the Translation Tables and Collection Table, via an
> in-memory ring shared between the CPU and the ITS controller. The ring
> is managed via the `GITS_CBASER` register and indexed by the
> `GITS_CWRITER` and `GITS_CREADR` registers.
>
> A processor adds commands to the shared ring and then updates
> `GITS_CWRITER` to make them visible to the ITS controller.
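The following minimal C sketch shows the producer side of the ring: copy a 32-byte command in, then advance the write offset. It rests on several assumptions. The queue is modelled as a plain byte buffer, and `GITS_CBASER.Size` is taken to hold the number of 4KB pages minus one. The sketch also uses the usual ring convention that one slot stays empty, so that a full ring can be told apart from an empty one (`CWRITER == CREADR`). `struct cmd_queue` and its helpers are hypothetical; no MMIO access is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ITS_CMD_SIZE 32u /* every ITS command is 32 bytes */

/* Hypothetical software model of the command queue: a byte buffer plus
 * the offsets that GITS_CWRITER and GITS_CREADR would hold. */
struct cmd_queue {
    uint8_t *base;
    uint32_t size;    /* bytes: (GITS_CBASER.Size + 1) * 4096 */
    uint32_t cwriter; /* next free slot, advanced by the processor */
    uint32_t creadr;  /* next command to process, advanced by the ITS */
};

/* Queue size in bytes from the 8-bit GITS_CBASER.Size field. */
static uint32_t cmd_queue_bytes(uint8_t cbaser_size)
{
    return ((uint32_t)cbaser_size + 1) * 4096u;
}

/* Full when advancing CWRITER by one command would make it equal CREADR. */
static bool cmd_queue_full(const struct cmd_queue *q)
{
    return (q->cwriter + ITS_CMD_SIZE) % q->size == q->creadr;
}

/* Copy one command into the ring and advance CWRITER. */
static bool cmd_queue_push(struct cmd_queue *q, const uint8_t cmd[ITS_CMD_SIZE])
{
    assert(q->size % ITS_CMD_SIZE == 0 && q->cwriter < q->size);
    if (cmd_queue_full(q))
        return false;
    memcpy(q->base + q->cwriter, cmd, ITS_CMD_SIZE);
    q->cwriter = (q->cwriter + ITS_CMD_SIZE) % q->size;
    /* A real driver would now write q->cwriter to GITS_CWRITER. */
    return true;
}
```

Because one slot is kept free, a single 4KB page (128 slots) accepts 127 commands before `cmd_queue_push` reports the queue as full.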
>
> The ITS controller processes commands from the ring and then updates
> `GITS_CREADR` to indicate to the processor that the command has been
> processed.
>
> Commands are processed sequentially.
>
> Commands sent on the ring include operational commands:
>
> * Routing interrupts to processors;
> * Generating interrupts;
> * Clearing the pending state of interrupts;
> * Synchronising the command queue
>
> and maintenance commands:
>
> * Map device/collection/processor;
> * Map virtual interrupt;
> * Clean interrupts;
> * Discard interrupts.
>
> The field `GITS_CBASER.Size` encodes the number of 4KB pages, minus 1,
> making up the command queue. This field is 8 bits, which means the
> maximum size is 2^8 * 4KB = 1MB. Given that each command is 32 bytes,
> there is a maximum of 32768 commands in the queue.
>
> The ITS provides no specific completion notification
> mechanism. Completion is monitored by a combination of a `SYNC`
> command and either polling `GITS_CREADR` or notification via an
> interrupt generated via the `INT` command.
>
> Note that the interrupt generation via `INT` requires an originating
> device ID to be supplied (which is then translated via the ITS into an
> LPI). No specific device ID is defined for this purpose and so the OS
> software is expected to fabricate one.
>
> Possible ways of inventing such a device ID are:
>
> * Enumerate all device ids in the system and pick another one;
> * Use a PCI BDF associated with a non-existent device function (such
>   as an unused one relating to the PCI root-bridge) and translate that
>   (via firmware tables) into a suitable device id;
> * ???
>
> # LPI Handling in Xen
>
> ## IRQ descriptors
>
> Currently all SGI/PPI/SPI interrupts are covered by a single static
> array of `struct irq_desc` with ~1024 entries (the maximum interrupt
> number in that set of interrupt types).
>
> The addition of LPIs in GICv3 means that the largest potential
> interrupt specifier is much larger.
>
> Therefore a second dynamically allocated array will be added to cover
> the range `8192..nr_lpis`. The `irq_to_desc` function will determine
> which array to use (static `0..1024` or dynamic `8192..end` lpi desc
> array) based on the input irq number. Two arrays are used to avoid a
> wasteful allocation covering the (unused/unusable) `1024..8191` range.
>
> ## Virtual LPI interrupt injection
>
> A physical interrupt which is routed to a guest vCPU has the
> `_IRQ_GUEST` flag set in the `irq_desc` status mask. Such interrupts
> have an associated instance of `struct irq_guest` which contains the
> target `struct domain` pointer and virtual interrupt number.
>
> In Xen a virtual interrupt (either arising from a physical interrupt
> or completely virtual) is ultimately injected to a VCPU using the
> `vgic_vcpu_inject_irq` function, or `vgic_vcpu_inject_lpi`.
>
> This mechanism will likely need updating to handle the injection of
> virtual LPIs. In particular, rather than via `GICD_ITARGETSRn` or
> `GICD_IROUTERn`, routing of LPIs is performed via the ITS collections
> mechanism. This is discussed below (in _vITS_: _Virtual LPI injection_).
>
> # Scope
>
> The ITS is rather complicated, especially when combined with
> virtualisation. To simplify things we initially omit the following
> functionality:
>
> - Interrupt -> vCPU -> pCPU affinity. The management of physical vs
>   virtual Collections is a feature of GICv4, thus is omitted in this
>   design for GICv3. Physical interrupts which occur on a pCPU where
>   the target vCPU is not already resident will be forwarded (via IPI)
>   to the correct pCPU for injection via the existing
>   `vgic_vcpu_inject_irq` mechanism (extended to handle LPI injection
>   correctly).
> - Clearing of the pending state of an LPI under various circumstances
>   (`MOVI`, `DISCARD`, `CLEAR` commands) is not done. This will result
>   in guests seeing some possibly spurious interrupts.
> - vITS functionality will only be available on 64-bit ARM hosts,
>   avoiding the need to worry about fast access to guest-owned data
>   structures (64-bit uses a direct map). (NB: 32-bit guests on 64-bit
>   hosts can be considered to have access.)
>
> # pITS
>
> ## Assumptions
>
> It is assumed that `GITS_TYPER.IDbits` is large enough that there are
> sufficient LPIs available to cover the sum of the number of possible
> events generated by each device in the system (that is the sum of the
> actual events for each bit of hardware, rather than the notional
> per-device maximum from `GITS_TYPER.IDbits`).
>
> This assumption avoids the need to do memory allocations and interrupt
> routing at run time (e.g. during command processing) by allowing us to
> set everything up front.
>
> ## Driver
>
> The physical driver will provide functions for enabling, disabling,
> routing etc. a specified interrupt, via the usual Xen APIs for doing
> such things.
>
> This will likely involve interacting with the physical ITS command
> queue etc. In this document such interactions are considered internal
> to the driver (i.e. we care that the API to enable an interrupt
> exists, not how it is implemented).
>
> The physical ITS will be provisioned with whatever tables it requests
> via its `GITS_BASERn` registers.
>
> ## Collections
>
> The `pITS` will be configured at start of day with 1 Collection mapped
> to each physical processor, using the `MAPC` command on the physical
> ITS.
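Under the pITS assumption above, all pLPIs can be handed out once at boot. The following minimal C sketch assumes a simple bump allocator. It first checks that the actual per-device event counts fit in the LPIs the ITS exposes, then gives each device a contiguous block. `pits_lpis_sufficient`, `struct lpi_allocator` and `lpi_alloc` are hypothetical names, not Xen APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LPI_BASE 8192u

/* The pITS assumption: the LPIs exposed by the ITS must cover the sum of
 * the events each device can actually generate. */
static bool pits_lpis_sufficient(uint64_t nr_lpis,
                                 const unsigned int *dev_events, size_t nr_devs)
{
    uint64_t total = 0;
    for (size_t i = 0; i < nr_devs; i++)
        total += dev_events[i];
    return total <= nr_lpis;
}

/* Boot-time bump allocator handing out contiguous pLPI ranges. */
struct lpi_allocator {
    uint64_t next; /* next free LPI ID */
    uint64_t end;  /* one past the last usable LPI ID */
};

static void lpi_allocator_init(struct lpi_allocator *a, uint64_t nr_lpis)
{
    assert(nr_lpis <= (1ull << 32) - LPI_BASE);
    a->next = LPI_BASE;
    a->end = LPI_BASE + nr_lpis;
}

/* Returns the first LPI of a block of `count`, or 0 if exhausted
 * (0 is never an LPI ID, so it doubles as the error value). */
static uint32_t lpi_alloc(struct lpi_allocator *a, unsigned int count)
{
    if (count == 0 || a->next + count > a->end)
        return 0;
    uint32_t first = (uint32_t)a->next;
    a->next += count;
    return first;
}
```

Nothing is ever freed, in line with the draft's goal of avoiding run-time allocation. This keeps the allocator trivial, but it relies entirely on the up-front sufficiency check.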
>
> ## Per Device Information
>
> Each physical device in the system which can be used together with an
> ITS (whether using passthrough or not) will have associated with it a
> data structure:
>
>     struct its_device {
>         struct pits *pits;
>         uintNN_t phys_device_id;
>         uintNN_t virt_device_id;
>         unsigned int *events;
>         unsigned int nr_events;
>         struct page_info *pitt;
>         unsigned int nr_pitt_pages;
>         /* Other fields relating to pITS maintenance but unrelated to vITS */
>     };
>
> Where:
>
> - `pits`: Pointer to the associated physical ITS.
> - `phys_device_id`: The physical device ID of the physical device
> - `virt_device_id`: The virtual device ID if the device is accessible
>   to a domain
> - `events`: An array mapping a per-device event number into a physical
>   LPI.
> - `nr_events`: The number of events which this device is able to
>   generate.
> - `pitt`, `nr_pitt_pages`: Records allocation of pages for physical
>   ITT (not directly accessible).
>
> During its lifetime this structure may be referenced by several
> different mappings (e.g. physical and virtual device id maps, virtual
> collection device id).
>
> ## Device Discovery/Registration and Configuration
>
> Per device information will be discovered based on firmware tables (DT
> or ACPI) and information provided by dom0 (e.g. reading associated PCI
> cfg space, registration via PHYSDEVOP_pci_device_add or new custom
> hypercalls).
>
> This information shall include at least:
>
> - The Device ID of the device.
> - The maximum number of Events which the device is capable of
>   generating.
>
> When a device is discovered/registered (i.e. when all necessary
> information is available) then:
>
> - `struct its_device` and the embedded `events` array will be
>   allocated (the latter with `nr_events` elements).
> - The `struct its_device` will be inserted into a mapping (possibly an
>   R-B tree) from its physical Device ID to the `struct its_device`.
> - `nr_events` physical LPIs will be allocated and recorded in the
>   `events` array.
> - An ITT table will be allocated for the device and the appropriate
>   `MAPD` command will be issued to the physical ITS. The location will
>   be recorded in `struct its_device.pitt`.
> - Each Event which the device may generate will be mapped to the
>   corresponding LPI in the `events` array and a collection, by issuing
>   a series of `MAPVI` commands. Events will be assigned to physical
>   collections in a round-robin fashion.
>
> This setup must occur for a given device before any ITS interrupts may
> be configured for the device and certainly before a device is passed
> through to a guest. This implies that dom0 cannot use MSIs on a PCI
> device before having called `PHYSDEVOP_pci_device_add`.
>
> # Device Assignment
>
> Each domain will have an associated mapping from virtual device ids
> into a data structure describing the physical device, including a
> reference to the relevant `struct its_device`.
>
> The number of possible device IDs may be large so a simple array or
> list is likely unsuitable. A tree (e.g. Red-Black) may be a suitable
> data structure. Currently we do not need to perform lookups in this
> tree on any hot paths.
>
> _Note_: In the context of virtualised device ids (especially for domU)
> it may be possible to arrange for the upper bound on the number of
> device IDs to be lower, allowing a more efficient data structure to be
> used. This is left for a future improvement.
>
> When a device is assigned to a domain (including to domain 0) the
> mapping for the new virtual device ID will be entered into the tree.
>
> During assignment all LPIs associated with the device will be routed
> to the guest (i.e. `route_irq_to_guest` will be called for each LPI in
> the `struct its_device.events` array) and the pLPI will be enabled in
> the physical LPI configuration table with a priority of `GIC_PRI_IRQ`
> (not any priority from the guest).
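The following minimal C sketch covers the registration steps listed under _Device Discovery/Registration and Configuration_. Event N of a device gets pLPI `first_lpi + N`, and the events are spread over the physical collections round-robin. The `struct its_device` here is a trimmed stand-in: `uintNN_t` is taken as `uint32_t`, the pITS/pITT fields are dropped, and the `collections` array is a hypothetical addition that records what each `MAPVI` would carry. Issuing the actual commands is only indicated by a comment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Trimmed stand-in for the draft's struct its_device; `collections` is a
 * hypothetical addition recording each event's physical collection. */
struct its_device {
    uint32_t phys_device_id;
    unsigned int *events;      /* event -> pLPI */
    unsigned int *collections; /* event -> physical collection (hypothetical) */
    unsigned int nr_events;
};

/* Map event N to pLPI first_lpi + N and spread the events over
 * nr_collections collections round-robin. *next_collection carries the
 * round-robin position from one device to the next. Returns 0 on
 * success, -1 if allocation fails. */
static int its_device_setup(struct its_device *dev, uint32_t devid,
                            unsigned int nr_events, unsigned int first_lpi,
                            unsigned int nr_collections,
                            unsigned int *next_collection)
{
    assert(nr_events > 0 && nr_collections > 0);
    dev->events = calloc(nr_events, sizeof(*dev->events));
    dev->collections = calloc(nr_events, sizeof(*dev->collections));
    if (!dev->events || !dev->collections) {
        free(dev->events);
        free(dev->collections);
        return -1;
    }
    dev->phys_device_id = devid;
    dev->nr_events = nr_events;
    for (unsigned int ev = 0; ev < nr_events; ev++) {
        dev->events[ev] = first_lpi + ev;
        dev->collections[ev] = *next_collection;
        *next_collection = (*next_collection + 1) % nr_collections;
        /* A real driver would issue MAPVI(devid, ev, events[ev],
         * collections[ev]) on the physical ITS here. */
    }
    return 0;
}

static void its_device_free(struct its_device *dev)
{
    free(dev->events);
    free(dev->collections);
    dev->events = dev->collections = NULL;
    dev->nr_events = 0;
}
```

Carrying the round-robin cursor across devices, rather than restarting at collection 0 for each device, avoids piling every device's event 0 onto the same pCPU.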
>
> # vITS
>
> A guest domain which is allowed to use ITS functionality (i.e. has
> been assigned pass-through devices which can generate MSIs) will be
> presented with a virtualised ITS.
>
> Accesses to the vITS registers will trap to Xen and be emulated, and a
> virtualised Command Queue will be provided.
>
> Commands entered onto the virtual Command Queue will be translated
> into physical commands, as described later in this document.
>
> There are other aspects to virtualising the ITS (LPI collection
> management, assignment of LPI ranges to guests, device
> management). However these are only considered here to the extent
> needed for describing the vITS emulation.
>
> ## Xen interaction with guest OS provisioned vITS memory
>
> Memory which the guest provisions to the vITS (ITT via `MAPD` or other
> tables via `GITS_BASERn`) needs careful handling in Xen.
>
> ### Trust
>
> Xen cannot trust data in data structures contained in such memory,
> since a guest can trample over it at will. Therefore Xen must either
> take great care when accessing data structures stored in such memory
> to validate the contents (e.g. not trust that values are within the
> required limits), or it must take steps to restrict guest access to
> the memory when it is provisioned. Since the data structures are
> simple and most accessors need to do bounds checks anyway, it is
> considered sufficient to simply do the necessary checks on access.
>
> **Any information read from memory which has been provisioned by the
> guest OS should not be trusted and must be carefully checked (e.g.
> ranges etc.) before use.**
>
> ### Mapping
>
> Most data structures stored in this shared memory are accessed on the
> hot interrupt injection path and must therefore be quickly accessible
> from within Xen.
> Since we have restricted vits support to 64-bit hosts
> only, `map_domain_page` is fast enough to be used on the fly and
> therefore we do not need to be concerned about unbounded amounts of
> permanently mapped memory consumed by each `MAPD` command.
>
> Although `map_domain_page` is fast, `p2m_lookup` (translation from IPA
> to PA) is not necessarily so. For now we accept this; as a future
> extension a sparse mapping of the guest device table in vmap space
> could be considered, with limits on the total amount of vmap space which
> we allow each domain to consume.
>
> The `GITS_BASERn` registers allow for the guest to specify cache
> attributes for the memory. For now we require that these have the same
> attributes as hypercall arguments in general (see `public/arch-arm.h`).
>
> In addition, while `GITS_BASERn` allows the Cacheability to be
> specified as `Device-nGnRnE`, we require that the tables provided be in
> normal guest RAM (not MMIO, not granted memory etc.); that is, it must
> have type `p2m_ram_rw`.
>
> ## vITS properties
>
> The vITS implementation shall have:
>
> - `GITS_TYPER.HCC == nr_vcpus + 1`.
> - `GITS_TYPER.PTA == 0`. Target addresses are linear processor numbers.
> - `GITS_TYPER.Devbits == See below`.
> - `GITS_TYPER.IDbits == See below`.
> - `GITS_TYPER.ITT Entry Size == 7`, meaning 8 bytes, which is the size
>   of `struct vitt` (defined below).
>
> `GITS_TYPER.Devbits` and `GITS_TYPER.IDbits` will need to be chosen to
> reflect the host and guest configurations (number of LPIs, maximum
> device ID etc).
>
> Other fields (not mentioned here) will be set to some sensible (or
> mandated) value.
>
> The `GITS_BASER0` will be set up to request sufficient memory for a
> device table consisting of entries of:
>
>     struct vdevice_table {
>         uint64_t vitt_ipa;
>         uint32_t vitt_size;
>         uint32_t padding;
>     };

How about adding a valid bit to know whether the entry is valid or not?
>     BUILD_BUG_ON(sizeof(struct vdevice_table) != 16);
>
> On write to `GITS_BASER0` the relevant details of the Device Table
> (IPA, size, cache attributes to use when mapping) will be recorded in
> `struct domain`.
>
> All other `GITS_BASERn.Valid == 0`.
>
> ## vITS to pITS mapping
>
> A physical system may have multiple physical ITSs.
>
> With the simplified vits command model presented here, only a single
> `vits` is required.
>
> In the future a more complex arrangement may be desired. Since the
> choice of model is internal to the hypervisor/tools and is
> communicated to the guest via firmware tables, we are not tied to this
> model as an ABI if we decide to change.
>
> When constructing dom0 it will therefore be necessary to rewrite any
> DTS properties which refer to an ITS to point to the single provided
> ITS, as well as dropping all ITS nodes and replacing them with a
> single node representing the vITS.
>
> ## Mapping from `vLPI` back to `pLPI`
>
> While we have arranged for a (`pDevice`,`pEvent`) to map to a single
> `pLPI`, we cannot guarantee that a given `vLPI` is mapped by a single
> (`vDevice`,`vEvent`) since the guest may set up multiple ITT tables
> such that this is not the case. Enforcing that this is the case is
> prohibitively expensive.
>
> Therefore it is not in general possible to associate a `vLPI` with a
> `pLPI`.
>
> ## Per-domain `struct pending_irq` for `vLPI`s
>
> Internally Xen uses a `struct pending_irq` to track the status of any
> pending virtual IRQ, including a virtual LPI.
>
> Upon domain creation an array of such `struct pending_irq`s will be
> allocated to cover the range `8192..nr_lpis` (for the number of LPIs
> which the guest is configured with) and a pointer to this array will be
> stored in the `struct domain`. The function `irq_to_pending` will be
> modified to look up interrupts in the LPI range in this array.
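A minimal C sketch of the per-domain `pending_irq` lookup, using the same two-array split the draft describes for `irq_to_desc`. The existing fixed array serves the SGI/PPI/SPI range, a separately allocated array serves the guest's LPIs, and the unused `1024..8191` hole is rejected. `struct domain_lpis` and this two-argument `irq_to_pending` are simplified, hypothetical stand-ins. They are not Xen's actual structures or signatures, and Xen's `struct pending_irq` is trimmed to two fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_STATIC_IRQS 1024u /* SGI/PPI/SPI range */
#define LPI_BASE       8192u

/* Heavily trimmed stand-in for Xen's struct pending_irq. */
struct pending_irq {
    unsigned int irq;
    uint8_t priority;
};

/* Hypothetical per-domain fields: the existing fixed array plus a
 * dynamically sized array covering LPI_BASE .. LPI_BASE + nr_lpis - 1. */
struct domain_lpis {
    struct pending_irq static_irqs[NR_STATIC_IRQS];
    struct pending_irq *lpi_pending;
    unsigned int nr_lpis;
};

/* Done once at domain creation, sized to the guest's configured LPIs. */
static int domain_lpis_init(struct domain_lpis *d, unsigned int nr_lpis)
{
    assert(nr_lpis > 0);
    d->lpi_pending = calloc(nr_lpis, sizeof(*d->lpi_pending));
    if (!d->lpi_pending)
        return -1;
    d->nr_lpis = nr_lpis;
    for (unsigned int i = 0; i < nr_lpis; i++)
        d->lpi_pending[i].irq = LPI_BASE + i;
    return 0;
}

/* Simplified irq_to_pending: pick the array from the interrupt number and
 * reject the unused 1024..8191 hole and LPIs beyond the guest's range. */
static struct pending_irq *irq_to_pending(struct domain_lpis *d, unsigned int irq)
{
    if (irq < NR_STATIC_IRQS)
        return &d->static_irqs[irq];
    if (irq >= LPI_BASE && irq - LPI_BASE < d->nr_lpis)
        return &d->lpi_pending[irq - LPI_BASE];
    return NULL;
}
```

Returning `NULL` for the hole means callers must handle a failed lookup, which matters because LPI numbers reach this path from guest-written tables.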
>
> ## Handling of unrouted/spurious LPIs
>
> Since there is no 1:1 link between a `vLPI` and `pLPI`, enabling and
> disabling of physical LPIs cannot be driven from the state of an
> associated vLPI.
>
> Each `pLPI` is routed and enabled during device assignment; therefore
> it is possible to receive a physical LPI which has yet to be routed
> (via a `vITS`) to a `vLPI`.

Why do we need to enable LPIs during device assignment? Can't we do it
only on LPI configuration update, which is trapped in Xen as mentioned in
7.8 (## Enabling and disabling LPIs)?

>
> Similarly, if a guest routes multiple Events to a single `vLPI`, the
> interrupt may already be pending when we attempt to deliver it.
>
> Such `pLPI`s shall be ignored and left in the priority dropped state
> (per the read from `GICC_IAR`). They will not be `EOI`-d in order to
> avoid a possible interrupt storm.
>
> On device deassignment (including as part of domain destroy), after
> resetting the device, it will be necessary to EOI any interrupts in
> such a state by walking over all events in the corresponding `struct
> its_device`.
>
> ## Enabling and disabling LPIs
>
> Two new functions `vgic_enable_lpi` and `vgic_disable_lpi` will be
> provided which are analogous to `vgic_enable_irqs` and
> `vgic_disable_irqs` but work for the LPI interface. (Alternatively,
> refactoring the existing functions to work for all cases would be
> acceptable too.)
>
> A `vLPI` which has not yet been enabled will automatically be queued, by
> the existing vgic injection machinery, until a call to
> `vgic_enable_lpi` is made (in response to a trapped access to the
> virtual cfg table).
>
> ## LPI Configuration Table Virtualisation
>
> A guest's write accesses to its LPI Configuration Table (which is just
> an area of guest RAM which the guest has nominated) will be trapped to
> the hypervisor, using stage 2 MMU permissions, in order for changes to
> be propagated into the host interrupt configuration.
>
> On write, `bit[0]` of the written byte is the enable/disable state for
> the irq and is handled thus, for each byte in the written value:
>
>     lpi = lpi corresponding to byte offset (addr - table_base);
>
>     pending_irq = irq_to_pending(lpi);
>     pending_irq->priority = byte & 0xfc; /* XXX: or byte >> 2 */
>
>     if ( byte & 0x1 )
>         vgic_enable_lpi(current, lpi);
>     else
>         vgic_disable_lpi(current, lpi);
>
> Note that physical interrupts are always configured with a priority of
> `GIC_PRI_IRQ`, regardless of the priority of any virtual interrupt.
>
> ## LPI Pending Table Virtualisation
>
> According to GIC spec 4.8.5 this table is not necessarily in sync and
> the mechanism to force a sync is `IMPLEMENTATION DEFINED`, hence we
> don't need to do anything.
>
> ## Device Table Virtualisation
>
> The IPA, size and cacheability attributes of the guest device table
> will be recorded in `struct domain` upon write to `GITS_BASER0`.
>
> In order to look up an entry for `device`:
>
>     define {get,set}_vdevice_entry(domain, device, struct vdevice_table *entry):
>         offset = device*sizeof(struct vdevice_table)
>         if offset + sizeof(struct vdevice_table) > <DT size>: error
>
>         dt_entry = <DT base IPA> + device*sizeof(struct vdevice_table)
>         paddr = p2m_lookup(domain, dt_entry, p2m_ram)
>         page = get_page_from_gfn(current->domain, paddr>>PAGE_SHIFT, &p2mt,
>                                  P2M_ALLOC);
>         if !page: error
>         if !page_is_ram(p2mt): put_page(page); error;
>
>         dt_mapping = map_domain_page(page)
>
>         if (set)
>             dt_mapping[<appropriate page offset from device>] = *entry;
>         else
>             *entry = dt_mapping[<appropriate page offset>];
>
>         unmap_domain_page(dt_mapping)
>         put_page(page)
>
> Since everything is based upon IPA (guest addresses), a malicious guest
> can only reference its own RAM here.
>
> ## ITT Virtualisation
>
> The location of a vITT will have been recorded in the domain Device
> Table by a `MAPD` command and is looked up as above.
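The bounds check and page arithmetic in `{get,set}_vdevice_entry` can be made concrete with a minimal C sketch that stops before any Xen call. It rejects entries that would run past the guest-provided Device Table, then computes the guest frame holding the entry and the entry's slot within that page. It assumes 4KB pages and a page-aligned table base (as a `GITS_BASERn` base is), so a 16-byte entry never straddles two pages. `vdevice_entry_location` and `struct dt_location` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

struct vdevice_table {
    uint64_t vitt_ipa;
    uint32_t vitt_size;
    uint32_t padding;
};
static_assert(sizeof(struct vdevice_table) == 16, "entry must be 16 bytes");

/* Where one Device Table entry lives in guest memory (hypothetical). */
struct dt_location {
    uint64_t gfn;           /* guest frame holding the entry */
    uint32_t index_in_page; /* entry slot within that frame */
};

/* Bounds-check `device` against the Device Table recorded at the
 * GITS_BASER0 write, then locate its entry. The whole entry must fit,
 * not just its first byte. */
static bool vdevice_entry_location(uint64_t dt_base_ipa, uint64_t dt_size,
                                   uint32_t device, struct dt_location *loc)
{
    uint64_t offset = (uint64_t)device * sizeof(struct vdevice_table);
    if (offset + sizeof(struct vdevice_table) > dt_size)
        return false;
    uint64_t ipa = dt_base_ipa + offset;
    loc->gfn = ipa >> PAGE_SHIFT;
    loc->index_in_page = (uint32_t)((ipa & (PAGE_SIZE - 1)) /
                                    sizeof(struct vdevice_table));
    return true;
}
```

The resulting `gfn` is what a caller would hand to a gfn-based page lookup, and `index_in_page` is the "appropriate page offset" in the pseudocode. The same arithmetic applies to `struct vitt` entries with an 8-byte stride.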
>
> The `vitt` is a `struct vitt`:
>
>     struct vitt {
>         uint16_t valid:1;
>         uint16_t pad:15;
>         uint16_t collection;
>         uint32_t vlpi;
>     };
>     BUILD_BUG_ON(sizeof(struct vitt) != 8);
>
> A lookup occurs similarly to that for a device table; the offset is
> range-checked against the `vitt_size` from the device table. To look up
> `event` on `device`:
>
>     define {get,set}_vitt_entry(domain, device, event, struct vitt *entry):
>         get_vdevice_entry(domain, device, &dt)
>
>         offset = event*sizeof(struct vitt);
>         if offset + sizeof(struct vitt) > dt.vitt_size: error
>
>         vitt_entry = dt.vitt_ipa + event*sizeof(struct vitt)
>         paddr = p2m_lookup(domain, vitt_entry, p2m_ram)
>         page = get_page_from_gfn(current->domain, paddr>>PAGE_SHIFT, &p2mt,
>                                  P2M_ALLOC);
>         if !page: error
>         if !page_is_ram(p2mt): put_page(page); error;
>
>         vitt_mapping = map_domain_page(page)
>
>         if (set)
>             vitt_mapping[<appropriate page offset from event>] = *entry;
>         else
>             *entry = vitt_mapping[<appropriate page offset>];
>
>         unmap_domain_page(vitt_mapping)
>         put_page(page)
>
> Again, since this is IPA-based, a malicious guest can only point things
> to its own RAM.
>
> ## Collection Table Virtualisation
>
> A pointer to a dynamically allocated array `its_collections` mapping
> collection ID to vcpu ID will be added to `struct domain`. The array
> shall have `nr_vcpus + 1` entries and resets to ~0 (or another
> explicitly invalid vcpu nr).
>
> ## Virtual LPI injection
>
> As discussed above, the `vgic_vcpu_inject_irq` functionality will need
> to be extended to cover this new case, most likely via a new
> `vgic_vcpu_inject_lpi` frontend function. `vgic_vcpu_inject_irq` will
> also require some refactoring to allow the priority to be passed in
> from the caller (since `LPI` priority comes from the `LPI` CFG table,
> while `SPI` and `PPI` priority is configured via other means).
>
> `vgic_vcpu_inject_lpi` receives a `struct domain *` and a virtual
> interrupt number (corresponding to a vLPI) and needs to figure out
> which vcpu this should map to.
>
> To do this it must look up the Collection ID associated (via the vITS)
> with that LPI.
>
> Proposal: Add a new `its_device` field to `struct irq_guest`, a
> pointer to the associated `struct its_device`. The existing `struct
> irq_guest.virq` field contains the event ID (perhaps use a `union`
> to give a more appropriate name) and _not_ the virtual LPI. Injection
> then consists of:
>
>     d = irq_guest->domain
>     virq = irq_guest->virq
>     its_device = irq_guest->its_device
>
>     get_vitt_entry(d, its_device->virt_device_id, virq, &vitt)
>     if !vitt.valid: error
>     if vitt.collection > nr_vcpus: error
>     vcpu = d->its_collections[vitt.collection]
>     if vcpu >= nr_vcpus: error
>
>     if !is_valid_lpi(vitt.vlpi): error
>
>     vgic_vcpu_inject_lpi(&d->vcpus[vcpu], vitt.vlpi)
>
> If the LPI is currently disabled then it will be queued by
> `vgic_vcpu_inject_lpi` and injected in response to a subsequent
> `vgic_enable_lpi` call.
>
> ## Command Queue Virtualisation
>
> The command translation/emulation in this design has been arranged to
> be as cheap as possible (e.g. in many cases the actions are NOPs),
> avoiding previous concerns about the length of time which an emulated
> write to a `CWRITER` register may block the vcpu.
>
> The vits will simply track its reader and writer pointers. On write
> to `CWRITER` it will immediately and synchronously process all
> commands in the queue and update its state accordingly.
>
> It might be possible to implement a rudimentary form of preemption by
> periodically (as determined by `hypercall_preempt_check()`) returning
> to the guest without incrementing PC but with updated internal
> `CREADR` state, meaning it will re-execute the write to `CWRITER` and
> we can pick up where we left off for another iteration. This at least
> lets us schedule other vcpus etc. and prevents a monopoly.
>
> ## ITS Command Translation
>
> This section is based on section 5.13 of the GICv3 specification
> (PRD03-GENC-010745 24.0) and provides concrete ideas of how this can
> be interpreted for Xen.
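The following minimal C sketch shows the synchronous `CWRITER` handling with preemption described under _Command Queue Virtualisation_. It processes commands from the emulated `CREADR` up to the new `CWRITER`. It stops early once a per-call budget is used up and returns `false`, which is the caller's cue to return to the guest without advancing PC so that the write is replayed. The `budget` parameter is an assumption standing in for `hypercall_preempt_check()`. `struct vits_cmdq`, `vits_handle_cwriter` and `count_cmd` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ITS_CMD_SIZE 32u

/* Hypothetical per-domain vITS command-queue state. */
struct vits_cmdq {
    uint32_t size;    /* bytes, from the guest's GITS_CBASER */
    uint32_t creadr;  /* emulated GITS_CREADR */
    uint32_t cwriter; /* emulated GITS_CWRITER */
};

/* Handle a trapped write to CWRITER. Returns true once the queue is
 * drained; false means "preempted": return to the guest without advancing
 * PC so the write is replayed and processing resumes from CREADR. */
static bool vits_handle_cwriter(struct vits_cmdq *q, uint32_t new_cwriter,
                                unsigned int budget,
                                void (*emulate)(uint32_t offset, void *ctx),
                                void *ctx)
{
    assert(q->size != 0 && q->size % ITS_CMD_SIZE == 0 && budget > 0);
    /* The offset is 32-byte aligned; dropping the low bits also
     * guarantees that CREADR eventually meets CWRITER. */
    q->cwriter = (new_cwriter & ~(ITS_CMD_SIZE - 1)) % q->size;
    unsigned int done = 0;
    while (q->creadr != q->cwriter) {
        if (done == budget)
            return false; /* stand-in for hypercall_preempt_check() */
        emulate(q->creadr, ctx);
        q->creadr = (q->creadr + ITS_CMD_SIZE) % q->size;
        done++;
    }
    return true;
}

/* Example emulation callback that only counts commands. */
static void count_cmd(uint32_t offset, void *ctx)
{
    (void)offset;
    (*(unsigned int *)ctx)++;
}
```

With a budget of 3 and five queued commands, the first call emulates three and reports preemption. The replayed write then emulates the remaining two.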
>
> The ITS provides 12 commands in order to manage interrupt collections,
> devices and interrupts. Possible command parameters are:
>
> - Device ID (`Device`) (called `device` in the spec).
> - Event ID (`Event`) (called `ID` in the spec). This is an index into
>   a device's `ITT`.
> - Collection ID (`Collection`) (called `collection` in the spec)
> - LPI ID (`LPI`) (called `pID` in the spec)
> - Target Address (`TA`) (called `TA` in the spec)
>
> These parameters need to be validated and translated from Virtual (`v`
> prefix) to Physical (`p` prefix).
>
> Note that we differ from the naming in the GIC spec for clarity; in
> particular we use `Event` not `ID` and `LPI` not `pID` to reduce
> confusion, especially when `v` and `p` prefixes are used due to
> virtualisation.
>
> ### Parameter Validation / Translation
>
> Each command contains parameters that need to be validated before any
> usage in Xen or passing to the hardware.
>
> #### Device ID (`Device`)
>
> The corresponding ITT is obtained by looking it up as described above.
>
> The physical `struct its_device` can be found by looking up in the
> domain's device map.
>
> If lookup fails or the resulting device table entry is invalid, then
> the Device is invalid.
>
> #### Event ID (`Event`)
>
> Validated against emulated `GITS_TYPER.IDbits`.
>
> It is not necessary to translate a `vEvent`.
>
> #### LPI (`LPI`)
>
> Validated against emulated `GITS_TYPER.IDbits`.
>
> It is not necessary to translate a `vLPI` into a `pLPI` since the
> tables all contain `vLPI`. (Translation from `pLPI` to `vLPI` happens
> via `struct irq_guest` when we receive the IRQ.)
>
> #### Interrupt Collection (`Collection`)
>
> The `Collection` is validated against the size of the per-domain
> `its_collections` array (i.e. nr_vcpus + 1) and then translated by a
> simple lookup in that array.
>
>     vcpu_nr = d->its_collections[Collection]
>
> A result >= `nr_vcpus` is invalid.
>
> #### Target Address (`TA`)
>
> This parameter is used in commands which manage collections. It is a
> unique identifier per processor.
>
> We have chosen to implement `GITS_TYPER.PTA` as 0, hence `vTA` simply
> corresponds to the `vcpu_id`, so only needs bounds checking against
> `nr_vcpus`.
>
> ### Commands
>
> To be read with reference to the spec for each command (which includes
> error checks etc. that are omitted here).
>
> It is assumed that inputs will be bounds and validity checked as
> described above, thus error handling is omitted for brevity (i.e. if
> get and/or set fail then so be it). In general invalid commands are
> simply ignored.
>
> #### `MAPD`: Map a physical device to an ITT.
>
> _Format_: `MAPD Device, Valid, ITT Address, ITT Size`.
>
> _Spec_: 5.13.11
>
> `MAPD` is sent with the `Valid` bit set if the mapping is to be added
> and clear when the mapping is removed.
>
> When the `Valid` bit is set then the range `ITT Address` to
> `ITT Address` + `ITT Size` need not be validated; this is done in
> `{get,set}_vdevice_entry` when calling the `p2m_lookup`
> function. Validating the memory at `MAPD` time would serve no purpose
> since the guest could subsequently balloon it out or grant map over it etc.
>
> The domain's device table is updated with the provided information.
>
> The `vitt_mapd` field is set according to the `Valid` flag in the
> command:
>
>     dt_entry.vitt_mapd = Valid
>     dt_entry.vitt_ipa = ITT Address
>     dt_entry.vitt_size = ITT Size
>     set_vdevice_entry(current->domain, Device, &dt_entry)
>
> #### `MAPC`: Map an interrupt collection to a target processor
>
> _Format_: `MAPC Collection, TA`
>
> _Spec_: 5.13.12
>
> The updated `vTA` (a vcpu number) is recorded in the `its_collections`
> array of the domain struct:
>
>     d->its_collections[Collection] = TA
>
> #### `MAPI`: Map an interrupt to an interrupt collection.
>
> _Format_: `MAPI Device, LPI, Collection`
>
> _Spec_: 5.13.13
>
> After validation:
>
>     vitt.valid = True
>     vitt.collection = Collection
>     vitt.vlpi = LPI
>     set_vitt_entry(current->domain, Device, LPI, &vitt)
>
> #### `MAPVI`: Map an input identifier to a physical interrupt and an interrupt collection.
>
> _Format_: `MAPVI Device, Event, LPI, Collection`
>
>     vitt.valid = True
>     vitt.collection = Collection
>     vitt.vlpi = LPI
>     set_vitt_entry(current->domain, Device, Event, &vitt)
>
> #### `MOVI`: Redirect interrupt to an interrupt collection
>
> _Format_: `MOVI Device, Event, Collection`
>
> _Spec_: 5.13.15
>
>     get_vitt_entry(current->domain, Device, Event, &vitt)
>     vitt.collection = Collection
>     set_vitt_entry(current->domain, Device, Event, &vitt)
>
> XXX consider helper which sets field without mapping/unmapping
> twice.
>
> This command is supposed to move any pending interrupts associated
> with `Event` to the vcpu implied by the new `Collection`, which is
> tricky. For now we ignore this requirement (as we do for
> `GICD_IROUTERn` and `GICD_TARGETRn` for other interrupt types).
>
> #### `DISCARD`: Discard interrupt requests
>
> _Format_: `DISCARD Device, Event`
>
> _Spec_: 5.13.16
>
>     get_vitt_entry(current->domain, Device, Event, &vitt)
>     vitt.valid = False
>     set_vitt_entry(current->domain, Device, Event, &vitt)
>
> XXX consider helper which sets field without mapping/unmapping
> twice.
>
> This command is supposed to clear the pending state of any associated
> interrupt. This requirement is ignored (the guest may see a spurious
> interrupt).
>
> #### `INV`: Clean any caches associated with an interrupt
>
> _Format_: `INV Device, Event`
>
> _Spec_: 5.13.17
>
> Since LPI Configuration table updates are not trapped and the config
> is read on use, there is nothing to do here.
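One possible shape for the "helper which sets field without mapping/unmapping twice" noted under `MOVI` and `DISCARD`, sketched in C under simplifying assumptions. The vITT is modelled as a plain in-memory array, and `vitt_map_entry()` stands in for mapping the guest page that holds the entry (which in the design goes through `p2m_lookup`). All type and function names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vITT entry, mirroring the fields used in the pseudocode. */
struct vitt_entry {
    bool valid;
    uint32_t collection;
    uint32_t vlpi;
};

/* Hypothetical per-device vITT; in Xen this would live in guest memory. */
struct vitt {
    struct vitt_entry *entries;
    size_t nr_events;
};

/* Stand-in for mapping the entry once; NULL means "ignore the command". */
static struct vitt_entry *vitt_map_entry(struct vitt *t, uint32_t event)
{
    return event < t->nr_events ? &t->entries[event] : NULL;
}

/* MAPVI writes the whole entry (MAPI is the same with Event == LPI). */
static void vits_mapvi(struct vitt *t, uint32_t event, uint32_t lpi,
                       uint32_t collection)
{
    struct vitt_entry *e = vitt_map_entry(t, event);
    if (e) {
        e->valid = true;
        e->collection = collection;
        e->vlpi = lpi;
    }
}

/* MOVI: one mapping, one field update (pending state is not moved). */
static void vits_movi(struct vitt *t, uint32_t event, uint32_t collection)
{
    struct vitt_entry *e = vitt_map_entry(t, event);
    if (e)
        e->collection = collection;
}

/* DISCARD: one mapping, invalidate (pending state is not cleared). */
static void vits_discard(struct vitt *t, uint32_t event)
{
    struct vitt_entry *e = vitt_map_entry(t, event);
    if (e)
        e->valid = false;
}
```

Each command maps the entry once, touches only the fields it owns, and silently ignores an out-of-range `Event`, in line with the "invalid commands are simply ignored" policy.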
>
> #### `INVALL`: Clean any caches associated with an interrupt collection
>
> _Format_: `INVALL Collection`
>
> _Spec_: 5.13.19
>
> Since LPI Configuration table updates are not trapped and the config
> is read on use, there is nothing to do here.
>
> #### `INT`: Generate an interrupt
>
> _Format_: `INT Device, Event`
>
> _Spec_: 5.13.20
>
> The `vitt` entry corresponding to `Device, Event` is looked up and the
> vLPI is injected into the vcpu that its collection maps to:
>
>     get_vitt_entry(current->domain, Device, Event, &vitt)
>     vcpu = current->domain->its_collections[vitt.collection]
>     vgic_vcpu_inject_lpi(&current->domain->vcpus[vcpu], vitt.vlpi)
>
> _Note_: Where (Device, Event) is real, this may need consideration of
> interactions with real LPIs being delivered: Julien had concerns about
> Xen's internal IRQ state tracking. If this is a problem then we may
> need changes to IRQ state tracking, or to inject as a real IRQ and let
> physical IRQ injection handle it, or to write to `GICR_SETLPIR`?
>
> #### `CLEAR`: Clear the pending state of an interrupt
>
> _Format_: `CLEAR Device, Event`
>
> _Spec_: 5.13.21
>
> Should clear the pending state of the LPI. Ignored (the guest may see
> a spurious interrupt).
>
> #### `SYNC`: Wait for completion of any outstanding ITS actions for a collection
>
> _Format_: `SYNC TA`
>
> _Spec_: 5.13.22
>
> This command can be ignored.
>
> # GICv4 Direct Interrupt Injection
>
> GICv4 will directly mark the LPIs pending in the virtual pending table,
> which is per-redistributor (i.e. per-vCPU).
>
> LPIs will be received by the guest the same way as SPIs, i.e. trap in
> IRQ mode then read `ICC_IAR1_EL1` (for GICv3).
>
> Therefore GICv4 will not require one vITS per pITS.
>
> # Event Channels
>
> It has been proposed that it might be nice to inject event channels as
> LPIs in the future. Whether or not that would involve any sort of vITS
> is unclear, but if it did then it would likely be a separate emulation
> from the vITS emulation used with a pITS and as such is not considered
> further here.
>
> # Glossary
>
> * _MSI_: Message Signalled Interrupt
> * _ITS_: Interrupt Translation Service
> * _GIC_: Generic Interrupt Controller
> * _LPI_: Locality-specific Peripheral Interrupt
>
> # References
>
> "GIC Architecture Specification" PRD03-GENC-010745 24.0.
>
> "IO Remapping Table System Software on ARM Platforms" ARM DEN 0049A.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
