
Re: [Xen-devel] [PATCH v5 4/6] xen/arm: vgic: Optimize the way to store the target vCPU in the rank



On Mon, 9 Nov 2015, Julien Grall wrote:
> Xen currently stores the value of the GICD_ITARGETSR register
> (for GICv2) and GICD_IROUTER (for GICv3) directly in the rank. This
> makes the emulation of register accesses very simple, but makes the code
> to get the target vCPU for a given vIRQ more complex.
> 
> While the target vCPU of a vIRQ is retrieved every time a vIRQ is
> injected into the guest, the registers themselves are accessed far less often.
> 
> So the data structure should be optimized for the most common case
> rather than the rare one.
> 
> This patch introduces an array to store the target vCPU for every
> interrupt in the rank. This will make the code to get the target very
> quick. The emulation code will now have to generate the GICD_ITARGETSR
> and GICD_IROUTER registers on read access and split them on write to
> store the targets in a convenient way.
> 
> With the new way to store the target vCPU, the structure vgic_irq_rank
> shrinks from 320 bytes to 92 bytes. This saves 228 bytes of memory
> allocated separately per vCPU.

nice


> Note that with these changes, any read of those registers will list only
> the target vCPU used by Xen. As the spec is not clear whether this is a
> valid choice or not, OSes which have a different interpretation of the
> spec (i.e. OSes which perform read-modify-write operations on these
> registers) may no longer boot on Xen. However, I think this is a fair
> trade between memory usage in Xen (1KB less on a domain using 4 vCPUs
> with no SPIs) and a strict interpretation of the spec (though not all
> the cases are clearly defined).
> 
> Furthermore, the implementation of the get_target_vcpu callback is now
> exactly the same for GICv2 and GICv3. Consolidate the implementation in
> the common vGIC code and drop the callback.
> 
> Finally, take the opportunity to fix the coding style and replace "irq"
> with "virq" to make clear that we are dealing with virtual IRQs in the
> sections we are modifying.
> 
> Signed-off-by: Julien Grall <julien.grall@xxxxxxxxxx>
> 
> ---
>     The real reason is that I'd like to drop the vgic_byte_* helpers in
>     favor of more generic access helpers. However, that would make the
>     code to retrieve the priority more complex, so reworking the code to
>     get the target vCPU was the best solution.
> 
>     Changes in v5:
>         - NR_TARGET_PER_REG has been renamed to NR_TARGETS_PER_ITARGETSR
>         - NR_BIT_PER_TARGET has been renamed to NR_BITS_PER_TARGET
>         - Fix comments
>         - s/NR_BYTE_PER_IROUTER/NR_BYTES_PER_IROUTER/
>         - Remove the duplicate declaration of virq in vgic_store_itargetsr
> 
>     Changes in v4:
>         - Move the behavior changes in 2 separate patches:
>             * patch #1: Handle correctly byte write in ITARGETSR
>             * patch #2: Don't ignore a write in ITARGETSR if one field is 0
>         - Update commit message
> 
>     Changes in v3:
>         - Make clear in the commit message that we rely on a corner case
>         of the spec.
>         - Typos
> 
>     Changes in v2:
>         - Remove get_target_vcpu callback
>         - s/irq/virq/ to make clear we are using virtual IRQ
>         - Update the commit message
>         - Rename vgic_generate_* to vgic_fetch
>         - Improve code comment
>         - Move the introduction of vgic_rank_init in a separate patch
>         - Make use of rank->idx
> 
>     Changes in v1:
>         - Patch added
> ---
>  xen/arch/arm/vgic-v2.c     |  86 +++++++++++++-------------------
>  xen/arch/arm/vgic-v3.c     | 119 ++++++++++++++++++++++++---------------------
>  xen/arch/arm/vgic.c        |  45 ++++++++++++-----
>  xen/include/asm-arm/vgic.h |  18 +++----
>  4 files changed, 137 insertions(+), 131 deletions(-)
> 
> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> index b5249ff..4f976d4 100644
> --- a/xen/arch/arm/vgic-v3.c
> +++ b/xen/arch/arm/vgic-v3.c
> @@ -89,18 +89,72 @@ static struct vcpu *vgic_v3_irouter_to_vcpu(struct domain *d, uint64_t irouter)
>      return d->vcpu[vcpu_id];
>  }
>  
> -static struct vcpu *vgic_v3_get_target_vcpu(struct vcpu *v, unsigned int irq)
> -{
> -    struct vcpu *v_target;
> -    struct vgic_irq_rank *rank = vgic_rank_irq(v, irq);
> +#define NR_BYTES_PER_IROUTER 8U

Given the way this constant is used, it might be better to define it as
a bit shift of 3, removing the two division operations below.


> +/*
> + * Fetch an IROUTER register based on the offset from IROUTER0. Only one
> + * vCPU will be listed for a given vIRQ.
> + *
> + * Note the offset will be aligned to the appropriate boundary.
> + */
> +static uint64_t vgic_fetch_irouter(struct vgic_irq_rank *rank,
> +                                   unsigned int offset)
> +{
>      ASSERT(spin_is_locked(&rank->lock));
>  
> -    v_target = vgic_v3_irouter_to_vcpu(v->domain, rank->v3.irouter[irq % 32]);
> +    /* There is exactly 1 vIRQ per IROUTER */
> +    offset /= NR_BYTES_PER_IROUTER;
>  
> -    ASSERT(v_target != NULL);
> +    /* Get the index in the rank */
> +    offset &= INTERRUPT_RANK_MASK;
>  
> -    return v_target;
> +    return vcpuid_to_vaffinity(rank->vcpu[offset]);
> +}
> +
> +/*
> + * Store an IROUTER register in a convenient way and migrate the vIRQ
> + * if necessary. This function only deals with IROUTER32 and onwards.
> + *
> + * Note the offset will be aligned to the appropriate boundary.
> + */
> +static void vgic_store_irouter(struct domain *d, struct vgic_irq_rank *rank,
> +                               unsigned int offset, uint64_t irouter)
> +{
> +    struct vcpu *new_vcpu, *old_vcpu;
> +    unsigned int virq;
> +
> +    /* There is 1 vIRQ per IROUTER */
> +    virq = offset / NR_BYTES_PER_IROUTER;
> +
> +    /*
> +     * IROUTER0-31, used for SGIs/PPIs, are reserved; this function
> +     * should never be called for them.
> +     */
> +    ASSERT(virq >= 32);
> +
> +    /* Get the index in the rank */
> +    offset = virq & INTERRUPT_RANK_MASK;
> +
> +    new_vcpu = vgic_v3_irouter_to_vcpu(d, irouter);
> +    old_vcpu = d->vcpu[rank->vcpu[offset]];
> +
> +    /*
> +     * From the spec (see 8.9.13 in IHI 0069A), any write with an
> +     * invalid vCPU will lead to the interrupt being ignored.
> +     *
> +     * But the current code to inject an IRQ is not able to cope with
> +     * an invalid vCPU. So for now, just ignore the write.
> +     *
> +     * TODO: Respect the spec
> +     */
> +    if ( !new_vcpu )
> +        return;
> +
> +    /* Only migrate the IRQ if the target vCPU has changed */
> +    if ( new_vcpu != old_vcpu )
> +        vgic_migrate_irq(old_vcpu, new_vcpu, virq);
> +
> +    rank->vcpu[offset] = new_vcpu->vcpu_id;
>  }
>  
>  static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
> @@ -726,8 +780,7 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
>                                  DABT_DOUBLE_WORD);
>          if ( rank == NULL ) goto read_as_zero;
>          vgic_lock_rank(v, rank, flags);
> -        *r = rank->v3.irouter[REG_RANK_INDEX(64,
> -                              (gicd_reg - GICD_IROUTER), DABT_DOUBLE_WORD)];
> +        *r = vgic_fetch_irouter(rank, gicd_reg - GICD_IROUTER);
>          vgic_unlock_rank(v, rank, flags);
>          return 1;
>      case GICD_NSACR ... GICD_NSACRN:
> @@ -812,8 +865,6 @@ static int vgic_v3_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
>      struct hsr_dabt dabt = info->dabt;
>      struct vgic_irq_rank *rank;
>      unsigned long flags;
> -    uint64_t new_irouter, old_irouter;
> -    struct vcpu *old_vcpu, *new_vcpu;
>      int gicd_reg = (int)(info->gpa - v->domain->arch.vgic.dbase);
>  
>      perfc_incr(vgicd_writes);
> @@ -881,32 +932,8 @@ static int vgic_v3_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
>          rank = vgic_rank_offset(v, 64, gicd_reg - GICD_IROUTER,
>                                  DABT_DOUBLE_WORD);
>          if ( rank == NULL ) goto write_ignore;
> -        new_irouter = r;
>          vgic_lock_rank(v, rank, flags);
> -
> -        old_irouter = rank->v3.irouter[REG_RANK_INDEX(64,
> -                                       (gicd_reg - GICD_IROUTER),
> -                                       DABT_DOUBLE_WORD)];
> -        old_vcpu = vgic_v3_irouter_to_vcpu(v->domain, old_irouter);
> -        new_vcpu = vgic_v3_irouter_to_vcpu(v->domain, new_irouter);
> -
> -        if ( !new_vcpu )
> -        {
> -            printk(XENLOG_G_DEBUG
> -                   "%pv: vGICD: wrong irouter at offset %#08x val 
> %#"PRIregister,
> -                   v, gicd_reg, r);
> -            vgic_unlock_rank(v, rank, flags);
> -            /*
> -             * TODO: Don't inject a fault to the guest when the MPIDR is
> -             * not valid. From the spec, the interrupt should be
> -             * ignored.
> -             */
> -            return 0;
> -        }
> -        rank->v3.irouter[REG_RANK_INDEX(64, (gicd_reg - GICD_IROUTER),
> -                         DABT_DOUBLE_WORD)] = new_irouter;
> -        if ( old_vcpu != new_vcpu )
> -            vgic_migrate_irq(old_vcpu, new_vcpu, (gicd_reg - GICD_IROUTER)/8);
> +        vgic_store_irouter(v->domain, rank, gicd_reg - GICD_IROUTER, r);
>          vgic_unlock_rank(v, rank, flags);
>          return 1;
>      case GICD_NSACR ... GICD_NSACRN:
> @@ -1036,7 +1063,6 @@ static const struct mmio_handler_ops vgic_distr_mmio_handler = {
>  static int vgic_v3_vcpu_init(struct vcpu *v)
>  {
>      int i;
> -    uint64_t affinity;
>      paddr_t rdist_base;
>      struct vgic_rdist_region *region;
>      unsigned int last_cpu;
> @@ -1045,15 +1071,6 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
>      struct domain *d = v->domain;
>      uint32_t rdist_stride = d->arch.vgic.rdist_stride;
>  
> -    /* For SGI and PPI the target is always this CPU */
> -    affinity = (MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 3) << 32 |
> -                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 2) << 16 |
> -                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 1) << 8  |
> -                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 0));
> -
> -    for ( i = 0 ; i < 32 ; i++ )
> -        v->arch.vgic.private_irqs->v3.irouter[i] = affinity;
> -
>      /*
>       * Find the region where the re-distributor lives. For this purpose,
>       * we look one region ahead as we have only the first CPU in hand.
> @@ -1098,7 +1115,7 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
>  
>  static int vgic_v3_domain_init(struct domain *d)
>  {
> -    int i, idx;
> +    int i;
>  
>      /*
>       * Domain 0 gets the hardware address.
> @@ -1150,13 +1167,6 @@ static int vgic_v3_domain_init(struct domain *d)
>          d->arch.vgic.rdist_regions[0].first_cpu = 0;
>      }
>  
> -    /* By default deliver to CPU0 */
> -    for ( i = 0; i < DOMAIN_NR_RANKS(d); i++ )
> -    {
> -        for ( idx = 0; idx < 32; idx++ )
> -            d->arch.vgic.shared_irqs[i].v3.irouter[idx] = 0;
> -    }
> -
>      /* Register mmio handle for the Distributor */
>      register_mmio_handler(d, &vgic_distr_mmio_handler, d->arch.vgic.dbase,
>                            SZ_64K, NULL);
> @@ -1182,7 +1192,6 @@ static int vgic_v3_domain_init(struct domain *d)
>  static const struct vgic_ops v3_ops = {
>      .vcpu_init   = vgic_v3_vcpu_init,
>      .domain_init = vgic_v3_domain_init,
> -    .get_target_vcpu  = vgic_v3_get_target_vcpu,
>      .emulate_sysreg  = vgic_v3_emulate_sysreg,
>      /*
>       * We use both AFF1 and AFF0 in (v)MPIDR. Thus, the max number of CPU

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
