[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space



>>> Roger Pau Monne <roger.pau@xxxxxxxxxx> 06/30/17 5:01 PM >>>
> --- /dev/null
> +++ b/tools/tests/vpci/Makefile
> @@ -0,0 +1,40 @@
> +
> +XEN_ROOT=$(CURDIR)/../../..
> +include $(XEN_ROOT)/tools/Rules.mk
> +
> +TARGET := test_vpci
> +
> +.PHONY: all
> +all: $(TARGET)
> +
> +.PHONY: run
> +run: $(TARGET)
> +    ./$(TARGET) > $(TARGET).out

Is this a good way to run a test? Aiui it'll result in there not being
anything visible immediately; one has to go look at the produced file.
I'd suggest to leave it to the person invoking "make run" whether to
redirect output.

> +$(TARGET): vpci.c vpci.h list.h
> +    $(HOSTCC) -g -o $@ vpci.c main.c

If you compile main.c, why is there no dependency on it? And how about
emul.h?

> +.PHONY: clean
> +clean:
> +    rm -rf $(TARGET) $(TARGET).out *.o *~ vpci.h vpci.c list.h
> +
> +.PHONY: distclean
> +distclean: clean
> +
> +.PHONY: install
> +install:
> +
> +vpci.h: $(XEN_ROOT)/xen/include/xen/vpci.h
> +    sed -e '/#include/d' <$< >$@

Couldn't you combine this and list.h's rule into a pattern one?

> --- /dev/null
> +++ b/tools/tests/vpci/emul.h
> @@ -0,0 +1,117 @@
> +/*
> + * Unit tests for the generic vPCI handler code.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see 
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _TEST_VPCI_
> +#define _TEST_VPCI_
> +
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <errno.h>
> +#include <assert.h>
> +
> +#define container_of(ptr, type, member) ({                      \
> +        typeof(((type *)0)->member) *__mptr = (ptr);            \
> +        (type *)((char *)__mptr - offsetof(type, member));      \

I don't know what tools maintainers think about such name space
violations; in hypervisor code I'd ask you to avoid leading underscores
in macro local variables (same in min()/max() and elsewhere then).

> +/* Read a 32b register using all possible sizes. */
> +void multiread4(unsigned int reg, uint32_t val)
> +{
> +    unsigned int i;
> +
> +    /* Read using bytes. */
> +    for ( i = 0; i < 4; i++ )
> +        VPCI_READ_CHECK(reg + i, 1, (val >> (i * 8)) & UINT8_MAX);
> +
> +    /* Read using 2bytes. */
> +    for ( i = 0; i < 2; i++ )
> +        VPCI_READ_CHECK(reg + i * 2, 2, (val >> (i * 2 * 8)) & UINT16_MAX);
> +
> +    VPCI_READ_CHECK(reg, 4, val);
> +}
> +
> +void multiwrite4_check(unsigned int reg, uint32_t val)

Naming question again: Why the _check suffix here, but not on the read
function above?

> +{
> +    unsigned int i;
> +
> +    /* Write using bytes. */
> +    for ( i = 0; i < 4; i++ )
> +        VPCI_WRITE_CHECK(reg + i, 1, (val >> (i * 8)) & UINT8_MAX);
> +    multiread4(reg, val);
> +
> +    /* Write using 2bytes. */
> +    for ( i = 0; i < 2; i++ )
> +        VPCI_WRITE_CHECK(reg + i * 2, 2, (val >> (i * 2 * 8)) & UINT16_MAX);
> +    multiread4(reg, val);
> +
> +    VPCI_WRITE_CHECK(reg, 4, val);
> +    multiread4(reg, val);
> +}

Wouldn't it be better to vary the value written between the individual
sizes? Perhaps move the 32-bit write between the two loops, using ~val?
Otherwise you won't know whether what you read back is a result of the
writes you actually mean to test or earlier ones?

> +int
> +main(int argc, char **argv)
> +{
> +    /* Index storage by offset. */
> +    uint32_t r0 = 0xdeadbeef;
> +    uint8> +    uint16_t r20[2] = { 0 };

Just { } will suffice.

> +    uint32_t r24 = 0;
> +    uint8_t r28, r30;
> +    unsigned int i;
> +    int rc;
> +
> +    INIT_LIST_HEAD(&vpci.handlers);
> +
> +    VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
> +    VPCI_READ_CHECK(0, 4, 0xdeadbeef);

Why aren't you using r0 here?

> +    VPCI_WRITE_CHECK(0, 4, 0xbcbcbcbc);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 5, 1, r5);
> +    VPCI_READ_CHECK(5, 1, 0xef);
> +    VPCI_WRITE_CHECK(5, 1, 0xba);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 6, 1, r6);
> +    VPCI_READ_CHECK(6, 1, 0xbe);
> +    VPCI_WRITE_CHECK(6, 1, 0xba);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 7, 1, r7);
> +    VPCI_READ_CHECK(7, 1, 0xef);
> +    VPCI_WRITE_CHECK(7, 1, 0xbd);
> +
> +    VPCI_ADD_REG(vpci_read16, vpci_write16, 12, 2, r12);
> +    VPCI_READ_CHECK(12, 2, 0x8696);
> +    VPCI_READ_CHECK(12, 4, 0xffff8696);
> +
> +    /*
> +     * At this point we have the following layout:
> +     *
> +     * 32    24    16     8     0
> +     *  +-----+-----+-----+-----+
> +     *  |          r0           | 0
> +     *  +-----+-----+-----+-----+
> +     *  | r7  |  r6 |  r5 |/////| 32
> +     *  +-----+-----+-----+-----|

This is misleading (especially for readers of the code following this
comment), as you've written different values by now.

> +     *  |///////////////////////| 64
> +     *  +-----------+-----------+
> +     *  |///////////|    r12    | 96
> +     *  +-----------+-----------+
> +     *             ...
> +     *  / = empty.
> +     */
> +
> +    /* Try to add an overlapping register handler. */
> +    VPCI_ADD_INVALID_REG(vpci_read32, vpci_write32, 4, 4);
> +
> +    /* Try to add a non-aligned register. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 15, 2);
> +
> +    /* Try to add a register with wrong size. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 8, 3);
> +
> +    /* Try to add a register with missing handlers. */
> +    VPCI_ADD_INVALID_REG(NULL, NULL, 8, 2);
> +
> +    /* Read/write of unset register. */
> +    VPCI_READ_CHECK(8, 4, 0xffffffff);
> +    VPCI_READ_CHECK(8, 2, 0xffff);
> +    VPCI_READ_CHECK(8, 1, 0xff);
> +    VPCI_WRITE(10, 2, 0xbeef);
> +    VPCI_READ_CHECK(10, 2, 0xffff);
> +
> +    /* Read of multiple registers */
> +    VPCI_WRITE_CHECK(7, 1, 0xbd);
> +    VPCI_READ_CHECK(4, 4, 0xbdbabaff);
> +
> +    /* Partial read of a register. */
> +    VPCI_WRITE_CHECK(0, 4, 0x1a1b1c1d);
> +    VPCI_READ_CHECK(2, 1, 0x1b);
> +    VPCI_READ_CHECK(6, 2, 0xbdba);
> +
> +    /* Write of multiple registers. */
> +    VPCI_WRITE_CHECK(4, 4, 0xaabbccff);
> +
> +    /* Partial write of a register. */
> +    VPCI_WRITE_CHECK(2, 1, 0xfe);
> +    VPCI_WRITE_CHECK(6, 2, 0xfebc);
> +
> +    /*
> +     * Test all possible read/write size combinations.
> +     *
> +     * Populate 128bits (16B) with 1B registers, 160bits (20B) with 2B
> +     * registers, and finally 192bits (24B) with 4B registers.

I can't see how the numbers here are in line with the code this is
meant to describe. Perhaps this is a leftover from an earlier variant
of the code?

> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -41,6 +41,9 @@ SECTIONS
>  
>    . = ALIGN(PAGE_SIZE);
>    .rodata : {
> +       __start_vpci_array = .;
> +       *(.rodata.vpci)
> +       __end_vpci_array = .;

Do you really need this (unconditionally)?

> +static int vpci_access_check(unsigned int reg, unsigned int len)

The way you use it, this function want to return bool.

> +void hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
> +                         unsigned int *bus, unsigned int *slot,
> +                         unsigned int *func, unsigned int *reg)

Since you return nothing right now, how about avoid one of the
indirections? Best candidate would probably be the register value.

> +{
> +    unsigned long bdf;

Why long instead of int?

> +static bool vpci_portio_accept(const struct hv> +    return (p->addr == 
> 0xcf8 && p->size == 4) || (p->addr & 0xfffc) == 0xcfc;

Maybe better ~3 instead of 0xfffc (also likely to produce slightly
better code)?

> +static int vpci_portio_read(const struct hvm_io_handler *handler,
> +                            uint64_t addr, uint32_t size, uint64_t *data)
> +{
> +    struct domain *d = current->domain;
> +    unsigned int bus, slot, func, reg;
> +
> +    *data = ~(uint64_t)0;
> +
> +    vpci_lock(d);
> +    if ( addr == 0xcf8 )
> +    {
> +        ASSERT(size == 4);
> +        *data = d->arch.hvm_domain.pci_cf8;
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +    if ( !CF8_ENABLED(d->arch.hvm_domain.pci_cf8) )
> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    /* Decode the PCI address. */
> +    hvm_pci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &slot, &func,
> +                        ®);

With the function name I don't view a comment like the one here as very
useful.

> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -1178,18 +1178,16 @@ struct hvm_ioreq_server 
> *hvm_select_ioreq_server(struct domain *d,
>           CF8_ENABLED(cf8) )
>      {
>          uint32_t sbdf, x86_fam;
> +        unsigned int bus, slot, func, reg;
> +
> +        hvm_pci_decode_addr(cf8, p->addr, &bus, &slot, &func, ®);
>  
>          /* PCI config data cycle */
>  
> -        sbdf = XEN_DMOP_PCI_SBDF(0,
> -                                 PCI_BUS(CF8_BDF(cf8)),
> -                                 PCI_SLOT(CF8_BDF(cf8)),
> -                                 PCI_FUNC(CF8_BDF(cf8)));
> +        sbdf = XEN_DMOP_PCI_SBDF(0, bus, slot, func);
>  
>          type = XEN_DMOP_IO_RANGE_PCI;
> -        addr = ((uint64_t)sbdf << 32) |
> -               CF8_ADDR_LO(cf8) |
> -               (p->addr & 3);
> +        addr = ((uint64_t)sbdf << 32) | reg;
>          /* AMD extended configuration space access? */
>          if ( CF8_ADDR_HI(cf8) &&
>               d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&

This and the introduction of hvm_pci_decode_addr() would likely better
be broken out into a prereq patch, as this one is quite large even
without this effectively unrelated change.

> --- /dev/null
> +++ b/xen/drivers/vpci/vpci.c
> @@ -0,0 +1,405 @@
> +/*
> + * Generic functionality for handling accesses to the PCI configuration space
> + * from guests.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see 
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/vpci.h>
> +
> +extern const vpci_register_init_t __start_vpci_array[], __end_vpci_array[];
> +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> +
> +/* Internal struct to store the emulated PCI registers. */
> +struct vpci_register {
> +    vpci_read_t *read;
> +    vpci_write_t *write;
> +    unsigned int size;
> +    unsigned int offset;
> +    void *private;
> +    struct list_head node;
> +};
> +
> +int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)

As pointed out in reply to an earlier version, this lacks a prereq
change: setup_one_hwdom_device() needs to be marked __hwdom_init. And
then, now that you have the annotation here, the placement of the
array in the linker script should depend on whether __hwdom_init is an
alias of __init.

> +int vpci_add_register(const struct pci_dev *pdev, vpci_read_t r> +           
>            unsigned int size, void *data)
> +{
> +    struct list_head *head;
> +    struct vpci_register *r;
> +
> +    /* Some sanity checks. */
> +    if ( (size != 1 && size != 2 && size != 4) ||
> +         offset >= PCI_CFG_SPACE_EXP_SIZE || offset & (size - 1) ||

Please add parens around the operands of &.

> +         (read_handler == NULL && write_handler == NULL) )

Please be consistent with NULL checks - as they're shorter, I'd suggest
to always use ...

> +        return -EINVAL;
> +
> +    r = xmalloc(struct vpci_register);
> +    if ( !r )

... this style.

> +        return -ENOMEM;
> +
> +    r->read = read_handler ?: vpci_ignored_read;
> +    r->write = write_handler ?: vpci_ignored_write;
> +    r->size = size;
> +    r->offset = offset;
> +    r->private = data;
> +
> +    vpci_lock(pdev->domain);
> +
> +    /* The list of handlers must be keep sorted at all times. */

kept

> +    list_for_each ( head, &pdev->vpci->handlers )

"head" is not a good name for something that doesn't always point at
the head of whatever list. How about "prev"?

> +int vpci_remove_register(const struct pci_dev *pdev, unsigned int offset,
> +                         unsigned int size)
> +{
> +    const struct vpci_register r = { .offset = offset, .size = size };
> +    struct vpci_register *rm = NULL;

Pointless initializer afaict (there's none on the equivalent variable
in the add function).

> +    vpci_lock(pdev->domain);
> +
> +    list_for_each_entry ( rm, &pdev->vpci->handlers, node )
> +        if ( vpci_register_cmp(&r, rm) <= 0 )
> +            break;
> +
> +    if ( !rm || rm->offset != offset || rm->size != size )

Obviously the !rm check here isn't needed then either, which points out
that you have a problem here: You don't properly handle the case of not
coming through the "break" path above, i.e. when rm points at the list
head (which isn't a full struct vpci_register).

> +static uint32_t vpci_read_hw(unsigned int seg, unsigned int bus,
> +                             unsigned int slot, unsigned int func,
> +                             unsigned int reg, uint32_t size)
> +{
> +    uint32_t data;
> +
> +    switch ( size )
> +    {
> +    case 4:
> +        data = pci_conf_read32(seg, bus, slot, func, reg);
> +        break;
> +    case 2:
> +        data = pci_conf_read16(seg, bus, slot, func, reg);
> +        break;
> +    case 1:
> +        data = pci_conf_read8(seg, bus, slot, func, reg);
> +        break;
> +    default:
> +        BUG();

As long as this is Dom0-only, BUG()s like this are probably fine, but
if this ever gets extended to DomU-s, will we really remember to
convert them?

> +/*
> + * Merge new data into a partial result.
> + *
> + * Zero the bytes of 'data' from [offset, offset + size), and
> + * merge the value found in 'new' from [0, offset) left shifted
> + * by 'offset'.
> + */
> +uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,

static?

> +                      unsigned int offset)
> +{
> +    uint32_t mask = ((uint64_t)1 << (size * 8)) - 1;

No need to use 64-bit arithmetic here: 0xffffffff >> (32 - 8 * size).

> +uint32_t vpci_read(unsigned int seg, unsigned int bus, unsigned int slot,
> +                   unsigned int func, unsigned int reg, uint32_t size)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    unsigned int data_offset = 0;
> +    uint32_t data;
> +
> +    ASSERT(pcidevs_locked());
> +    ASSERT(vpci_locked(d));
> +
> +    /*
> +     * Read the hardware value.
> +     * NB: at the moment vPCI passthroughs everything (ie: permissive).

passes through

> +     */
> +    data = vpci_read_hw(seg, bus, slot, func, reg, size);

I continue to be worried of reads that have side effects here. Granted
we currently don't emulate any, but it would feel better if we didn't
do the read for no reason. I.e. do hw reads only to fill gaps between
emulated fields.

> +    /* Find the PCI dev matching the address. *> +    /* Replace any values 
> reported by the emulated registers. */
> +    list_for_each_entry ( r, &pdev->vpci->handlers, node )
> +    {
> +        const struct vpci_register emu = {
> +            .offset = reg + data_offset,
> +            .size = size - data_offset
> +        };
> +        int cmp = vpci_register_cmp(&emu, r);
> +        union vpci_val val = { .u32 = ~0 };
> +        unsigned int merge_size;
> +
> +        if ( cmp < 0 )
> +            break;
> +        if ( cmp > 0 )
> +            continue;
> +
> +        r->read(pdev, r->offset, &val, r->private);
> +
> +        /* Check if the read is in the middle of a register. */
> +        if ( r->offset < emu.offset )
> +            val.u32 >>= (emu.offset - r->offset) * 8;
> +
> +        data_offset = max(emu.offset, r->offset) - reg;
> +        /* Find the intersection size between the two sets. */
> +        merge_size = min(emu.offset + emu.size, r->offset + r->size) -
> +                     max(emu.offset, r->offset);
> +        /* Merge the emulated data into the native read value. */
> +        data = merge_result(data, val.u32, merge_size, data_offset);
> +        data_offset += merge_size;
> +        if ( data_offset == size )
> +            break;

ASSERT(data_offset < size) ?

> --- /dev/null
> +++ b/xen/include/xen/vpci.h
> @@ -0,0 +1,79 @@
> +#ifndef _VPCI_
> +#define _VPCI_
> +
> +#include <xen/pci.h>
> +#include <xen/types.h>
> +#include <xen/list.h>
> +
> +/*
> + * Helpers for locking/unlocking.
> + *
> + * NB: the recursive variants are used so that spin_is_locked
> + * returns whether the lock is hold by the current CPU (instead
> + * of just returning whether the lock is hold by any CPU).
> + */
> +#define vpci_lock(d) spin_lock_recursive(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_unlock(d) spin_unlock_recursive(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
> +
> +/* Value read or written by the handlers. */
> +union vpci_val {
> +    uint8_t u8;
> +    uint16_t u16;
> +    uint32_t u32;
> +};

I continue to be unconvinced that this union is a good way to handle
different sizes. Afaict Coverity (or similar tools) may recognize quite
a few possible uses of uninitialized data. Quite likely all of them
would be false positives, but anyway. Would it really be a big problem
to uniformly pass uint32_t values around?

> +/*
> + * The vPCI handlers will never be called concurrently for the same domain, 
> ii
> + * is guaranteed that the vpci domain lock will always be locked when calling
> + * any handler.
> + */
> +typedef void (vpci_read_t)(struct pci_dev *pdev, unsigned int reg,
> +                           union vpci_val *val, void *data);
> +
> +typedef void (vpci_write_t)(struct pci_dev *pdev, unsigned int reg,
> +                            union vpci_val val, void *data);

Stray parentheses around the type name being defined.

> +typedef int (*vpci_register_init_t)(struct pci_dev *dev);

This one is inconsistent with the other two in that it defines a
pointer type.

> +#define REGISTER_VPCI_INIT(x)                   \
> +  static const vpci_register_init_t x##_entry   \
> +               __used_section(".rodata.vpci") = x
> +
> +/* Add vPCI handlers to device. */
> +int __must_check vpci_add_handlers(struct pci_dev *dev);
> +
> +/* Add/remove a register handler. */
> +int __must_check vpci_add_register(const struct pci_dev *pdev,
> +                                   vpci_read_t read_handler,
> +                                   vpci_write_t write_handler,

I'm surprised this compiles without (at least) warnings - you appear to
be lacking *s here.

> +                                   unsigned int offset,
> +                                   unsigned int size, void *data);
> +int __must_check vpci_remove_register(const struct pci_dev *pdev,
> +                                      unsigned int offset,
> +                                      unsigned int size);
> > +uint32_t vpci_read(unsigned int seg, unsigned int bus, unsigned int slot,
> +                   unsigned int func, unsigned int reg, uint32_t size);
> +void vpci_write(unsigned int seg, unsigned int bus, unsigned int slot,
> +                unsigned int func, unsigned int reg, uint32_t size,
> +                uint32_t data);

I don't see why size needs to be of a fixed width type in both of these.

> +struct vpci {
> +    /* Root pointer for the tree of vPCI handlers. */
> +    struct list_head handlers;

The comment says "tree", but right now this really is just a list.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.