|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v4 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table
On 24.11.2023 02:47, Marek Marczykowski-Górecki wrote:
> Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> on the same page as MSI-X table. Device model (especially one in
> stubdomain) cannot really handle those, as direct writes to that page is
> refused (page is on the mmio_ro_ranges list). Instead, extend
> msixtbl_mmio_ops to handle such accesses too.
>
> Doing this, requires correlating read/write location with guest
> of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> for PV would need to be done separately.
>
> This will be also used to read Pending Bit Array, if it lives on the same
> page, making QEMU not needing /dev/mem access at all (especially helpful
> with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> map it to the guest directly.
> If PBA lives on the same page, discard writes and log a message.
> Technically, writes outside of PBA could be allowed, but at this moment
> the precise location of PBA isn't saved, and also no known device abuses
> the spec in this way (at least yet).
>
> To access those registers, msixtbl_mmio_ops need the relevant page
> mapped. MSI handling already has infrastructure for that, using fixmap,
> so try to map first/last page of the MSI-X table (if necessary) and save
> their fixmap indexes. Note that msix_get_fixmap() does reference
> counting and reuses existing mapping, so just call it directly, even if
> the page was mapped before. Also, it uses a specific range of fixmap
> indexes which doesn't include 0, so use 0 as default ("not mapped")
> value - which simplifies code a bit.
>
> GCC gets confused about 'desc' variable:
>
> arch/x86/hvm/vmsi.c: In function ‘msixtbl_range’:
> arch/x86/hvm/vmsi.c:553:8: error: ‘desc’ may be used uninitialized
> [-Werror=maybe-uninitialized]
> 553 | if ( desc )
> | ^
> arch/x86/hvm/vmsi.c:537:28: note: ‘desc’ was declared here
> 537 | const struct msi_desc *desc;
> | ^~~~
This could do with also indicating the gcc version. Issues like this
tend to get fixed over time.
> It's conditional initialization is actually correct (in the case where
> it isn't initialized, function returns early), but to avoid
> build failure initialize it explicitly to NULL anyway.
>
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
Logic looks okay to me now (albeit I'm still not overly happy that we need
to gain such code), but there are a couple of cosmetic issues.
> @@ -213,6 +217,131 @@ static struct msi_desc *msixtbl_addr_to_desc(
> return NULL;
> }
>
> +/*
> + * Returns:
> + * - UINT_MAX if no handling should be done
> + * - UINT_MAX-1 if write should be discarded
> + * - a fixmap idx to use for handling
> + */
> +#define ADJACENT_DONT_HANDLE UINT_MAX
> +#define ADJACENT_DISCARD_WRITE (UINT_MAX - 1)
The comment would imo better talk in terms of the two constants you
define here.
> +static unsigned int adjacent_handle(
> + const struct msixtbl_entry *entry, unsigned long addr, bool write)
> +{
> + unsigned int adj_type;
> + const struct arch_msix *msix;
> +
> + if ( !entry || !entry->pdev )
> + return ADJACENT_DONT_HANDLE;
> +
> + if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable) && addr < entry->gtable )
> + adj_type = ADJ_IDX_FIRST;
> + else if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable + entry->table_len -
> 1) &&
> + addr >= entry->gtable + entry->table_len )
> + adj_type = ADJ_IDX_LAST;
> + else
> + return ADJACENT_DONT_HANDLE;
> +
> + msix = entry->pdev->msix;
> + ASSERT(msix);
> +
> + if ( !msix->adj_access_idx[adj_type] )
> + {
> + gprintk(XENLOG_WARNING,
> + "Page for adjacent(%d) MSI-X table access not initialized
> for %pp (addr %#lx, gtable %#lx\n",
> + adj_type, &entry->pdev->sbdf, addr, entry->gtable);
> +
> + return ADJACENT_DONT_HANDLE;
> + }
> +
> + /* If PBA lives on the same page too, discard writes. */
> + if ( write &&
> + ((adj_type == ADJ_IDX_LAST &&
> + msix->table.last == msix->pba.first) ||
> + (adj_type == ADJ_IDX_FIRST &&
> + msix->table.first == msix->pba.last)) )
> + {
> + gprintk(XENLOG_WARNING,
> + "MSI-X table and PBA of %pp live on the same page, "
> + "writing to other registers there is not implemented\n",
> + &entry->pdev->sbdf);
Here and above I think verbosity needs limiting to the first instance per
device per domain.
> + return ADJACENT_DISCARD_WRITE;
> + }
> +
> + return msix->adj_access_idx[adj_type];
> +}
> +
> +static int adjacent_read(
> + unsigned int fixmap_idx,
> + paddr_t address, unsigned int len, uint64_t *pval)
> +{
> + const void __iomem *hwaddr;
> +
> + *pval = ~0UL;
> +
> + ASSERT(fixmap_idx != ADJACENT_DISCARD_WRITE);
> +
> + hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> +
> + switch ( len )
> + {
> + case 1:
> + *pval = readb(hwaddr);
> + break;
> +
> + case 2:
> + *pval = readw(hwaddr);
> + break;
> +
> + case 4:
> + *pval = readl(hwaddr);
> + break;
> +
> + case 8:
> + *pval = readq(hwaddr);
> + break;
> +
> + default:
> + ASSERT_UNREACHABLE();
> + }
> + return X86EMUL_OKAY;
> +}
> +
> +static int adjacent_write(
> + unsigned int fixmap_idx,
> + uint64_t address, uint32_t len, uint64_t val)
This uses indentation different from the two cases further up. Types
used also don't match adjacent_read()'s.
> @@ -220,16 +349,31 @@ static int cf_check msixtbl_read(
> unsigned long offset;
> struct msixtbl_entry *entry;
> unsigned int nr_entry, index;
> + unsigned int adjacent_fixmap;
> int r = X86EMUL_UNHANDLEABLE;
>
> - if ( (len != 4 && len != 8) || (address & (len - 1)) )
> + if ( !IS_ALIGNED(address, len) )
> return r;
>
> rcu_read_lock(&msixtbl_rcu_lock);
> -
> entry = msixtbl_find_entry(current, address);
> if ( !entry )
> goto out;
> +
> + adjacent_fixmap = adjacent_handle(entry, address, false);
> + if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
> + {
> + r = adjacent_read(adjacent_fixmap, address, len, pval);
> + goto out;
> + }
> +
> + if ( address < entry->gtable ||
> + address >= entry->gtable + entry->table_len )
> + goto out;
> +
> + if ( len != 4 && len != 8 )
> + goto out;
> +
> offset = address & (PCI_MSIX_ENTRY_SIZE - 1);
>
> if ( offset != PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET )
> @@ -282,6 +426,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long
> address,
> int r = X86EMUL_UNHANDLEABLE;
> unsigned long flags;
> struct irq_desc *desc;
> + unsigned int adjacent_fixmap;
>
> if ( !IS_ALIGNED(address, len) )
> return X86EMUL_OKAY;
> @@ -291,6 +436,19 @@ static int msixtbl_write(struct vcpu *v, unsigned long
> address,
> entry = msixtbl_find_entry(v, address);
> if ( !entry )
> goto out;
> +
> + adjacent_fixmap = adjacent_handle(entry, address, true);
> + if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
> + {
> + r = adjacent_write(adjacent_fixmap, address, len, val);
> + goto out;
> + }
> + if ( address < entry->gtable ||
> + address >= entry->gtable + entry->table_len )
> + goto out;
> + if ( len != 4 && len != 8 )
> + goto out;
> +
Can this please follow the read side as far as use of blank lines goes?
> @@ -622,12 +788,15 @@ void msix_write_completion(struct vcpu *v)
> v->arch.hvm.hvm_io.msix_snoop_gpa )
> {
> unsigned int token = hvmemul_cache_disable(v);
> - const struct msi_desc *desc;
> + const struct msi_desc *desc = NULL;
> + const struct msixtbl_entry *entry;
> uint32_t data;
>
> rcu_read_lock(&msixtbl_rcu_lock);
> - desc = msixtbl_addr_to_desc(msixtbl_find_entry(v, snoop_addr),
> - snoop_addr);
> + entry = msixtbl_find_entry(v, snoop_addr);
> + if ( entry && snoop_addr >= entry->gtable &&
> + snoop_addr < entry->gtable + entry->table_len )
Nit: Too deep indentation.
> --- a/xen/arch/x86/include/asm/msi.h
> +++ b/xen/arch/x86/include/asm/msi.h
> @@ -207,6 +207,10 @@ struct msg_address {
> PCI_MSIX_ENTRY_SIZE + \
> (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
>
> +/* indexes in adj_access_idx[] below */
Nit: Comment style again.
> @@ -1078,6 +1108,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
> WARN();
> msix->table.first = 0;
> msix->table.last = 0;
> + if ( msix->adj_access_idx[ADJ_IDX_FIRST] )
> + {
> + msix_put_fixmap(msix, msix->adj_access_idx[ADJ_IDX_FIRST]);
> + msix->adj_access_idx[ADJ_IDX_FIRST] = 0;
> + }
> + if ( msix->adj_access_idx[ADJ_IDX_LAST] )
> + {
> + msix_put_fixmap(msix, msix->adj_access_idx[ADJ_IDX_LAST]);
> + msix->adj_access_idx[ADJ_IDX_LAST] = 0;
> + }
>
> if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
> msix->pba.last) )
This could probably do with another blank line at the head of the
addition.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |