[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space



>>> On 27.04.17 at 16:35, <roger.pau@xxxxxxxxxx> wrote:
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -11,7 +11,7 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>      if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM) {
>          if (d_config->b_info.device_model_version !=
>              LIBXL_DEVICE_MODEL_VERSION_NONE) {
> -            xc_config->emulation_flags = XEN_X86_EMU_ALL;
> +            xc_config->emulation_flags = (XEN_X86_EMU_ALL & 
> ~XEN_X86_EMU_VPCI);

I can see why you need this, but I'm not sure this is a good model.
Ideally for ordinary HVM guests you'd never have to change this
line. Therefore perhaps it might be a better idea to use a "negative"
flag here.

> --- /dev/null
> +++ b/tools/tests/vpci/Makefile
> @@ -0,0 +1,45 @@
> +
> +XEN_ROOT=$(CURDIR)/../../..
> +include $(XEN_ROOT)/tools/Rules.mk
> +
> +TARGET := test_vpci
> +
> +.PHONY: all
> +all: $(TARGET)
> +
> +.PHONY: run
> +run: $(TARGET)
> +     ./$(TARGET) > $(TARGET).out
> +
> +$(TARGET): vpci.c vpci.h rbtree.c rbtree.h
> +     $(HOSTCC) -g -o $@ vpci.c main.c rbtree.c
> +
> +.PHONY: clean
> +clean:
> +     rm -rf $(TARGET) $(TARGET).out *.o *~ vpci.h vpci.c rbtree.c rbtree.h
> +
> +.PHONY: distclean
> +distclean: clean
> +
> +.PHONY: install
> +install:
> +
> +vpci.h: $(XEN_ROOT)/xen/include/xen/vpci.h
> +     sed -e '/#include/d' <$< >$@
> +
> +vpci.c: $(XEN_ROOT)/xen/drivers/vpci/vpci.c
> +     # Trick the compiler so it doesn't complain about missing symbols
> +     sed -e '/#include/d' \
> +         -e '1s;^;#include "emul.h"\
> +                  const vpci_register_init_t __start_vpci_array[1]\;\
> +                  const vpci_register_init_t __end_vpci_array[1]\;\
> +                  ;' <$< >$@
> +
> +rbtree.h: $(XEN_ROOT)/xen/include/xen/rbtree.h
> +     sed -e '/#include/d' <$< >$@
> +
> +rbtree.c: $(XEN_ROOT)/xen/common/rbtree.c
> +     sed -e "/#include/d" \
> +         -e '1s;^;#include "emul.h"\
> +                  ;' <$< >$@

Plain symlinking and __XEN__ conditionals in the files may be the
easier to follow variant. But I'm no heavily opposed to this one,
I'm merely afraid that further adjustments may end up becoming
necessary down the road, resulting in the rules here to become
more convoluted.

> --- /dev/null
> +++ b/tools/tests/vpci/emul.h
> @@ -0,0 +1,107 @@
> +/*
> + * Unit tests for the generic vPCI handler code.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see 
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _TEST_VPCI_
> +#define _TEST_VPCI_
> +
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <errno.h>
> +#include <assert.h>
> +
> +#define container_of(ptr, type, member) ({                      \
> +        typeof( ((type *)0)->member ) *__mptr = (ptr);          \
> +        (type *)( (char *)__mptr - offsetof(type,member) );})

There are a couple of stray blanks (immediately inside parentheses)
here, and a missing one after the comma in offsetof().

> +#include "rbtree.h"
> +
> +struct pci_dev {
> +    struct domain *domain;
> +    struct vpci *vpci;
> +};
> +
> +struct domain {
> +    struct pci_dev pdev;
> +};
> +
> +struct vcpu
> +{
> +    struct domain *domain;
> +};
> +
> +extern struct vcpu v;

This is odd. With ...

> +#define spin_lock(x)
> +#define spin_unlock(x)
> +#define spin_is_locked(x) true
> +
> +#define current (&v)

... this, why don't you simply have

extern struct vcpu *current;

keeping v (or however you mean to name it) static?

> +#define has_vpci(d) true
> +
> +#include "vpci.h"
> +
> +#define xzalloc(type) (type *)calloc(1, sizeof(type))

Missing an outer pair of parentheses.

> +#define xfree(p) free(p)
> +
> +#define EXPORT_SYMBOL(x)

I think we should rather get rid of them from rbtree.c.

> +#define pci_get_pdev_by_domain(d, ...) &(d)->pdev

Missing an outer pair of parentheses again, whereas ...

> +#define atomic_read(x) 1
> +
> +/* Dummy native helpers. Writes are ignored, reads return 1's. */
> +#define pci_conf_read8(...) (0xff)
> +#define pci_conf_read16(...) (0xffff)
> +#define pci_conf_read32(...) (0xffffffff)

... here they're pointless.

> +/* Dummy hooks, write stores data, read fetches it. */
> +static int vpci_read8(struct pci_dev *pdev, unsigned int reg,
> +                      union vpci_val *val, void *data)
> +{
> +    uint8_t *priv = data;
> +
> +    val->half_word = *priv;

Half word? Half a word is at least 16 bits on any reasonable
architecture nowadays. Using it for a byte is simply confusing. I'd
suggest naming the fields what they are - u8, u16, and u32.

> +#define VPCI_READ(reg, size, data) \
> +    assert(!xen_vpci_read(0, 0, 0, reg, size, data))
> +
> +#define VPCI_READ_CHECK(reg, size, expected) ({ \
> +    uint32_t val;                               \
> +    VPCI_READ(reg, size, &val);                 \
> +    assert(val == expected);                    \
> +    })

Bad indentation - either the }) needs to move left, of the body needs
to move right.

> +#define VPCI_WRITE(reg, size, data) \
> +    assert(!xen_vpci_write(0, 0, 0, reg, size, data))

You using fixed SBDF here, ...

> +#define VPCI_CHECK_REG(reg, size, data) ({      \
> +    VPCI_WRITE(reg, size, data);                \
> +    VPCI_READ_CHECK(reg, size, data);           \
> +    })
> +
> +#define VPCI_ADD_REG(fread, fwrite, off, size, store)                        
>  \
> +    assert(!xen_vpci_add_register(&d.pdev, fread, fwrite, off, size, 
> &store)) \

... why do you have this strange &d.pdev here? The assumption
that a (fake) domain has a single (fake) PCI device looks pretty odd
anyway - why can't you simply have a global (fake) PCI device?

> +#define VPCI_ADD_INVALID_REG(fread, fwrite, off, size)                      \
> +    assert(xen_vpci_add_register(&d.pdev, fread, fwrite, off, size, NULL))  \
> +
> +int
> +main(int argc, char **argv)
> +{
> +    /* Index storage by offset. */
> +    uint32_t r0 = 0xdeadbeef;
> +    uint8_t r5 = 0xef;
> +    uint8_t r6 = 0xbe;
> +    uint8_t r7 = 0xef;
> +    uint16_t r12 = 0x8696;
> +    int rc;
> +
> +    VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
> +    VPCI_READ_CHECK(0, 4, 0xdeadbeef);
> +    VPCI_CHECK_REG(0, 4, 0xbcbcbcbc);

In the context here the macro name is pretty confusing: I'd expect
it to check the register holds the specified value, without also doing
a write. How about VPCI_WRITE_CHECK()?

> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 5, 1, r5);
> +    VPCI_READ_CHECK(5, 1, 0xef);
> +    VPCI_CHECK_REG(5, 1, 0xba);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 6, 1, r6);
> +    VPCI_READ_CHECK(6, 1, 0xbe);
> +    VPCI_CHECK_REG(6, 1, 0xba);
> +
> +    VPCI_ADD_REG(vpci_read8, vpci_write8, 7, 1, r7);
> +    VPCI_READ_CHECK(7, 1, 0xef);
> +    VPCI_CHECK_REG(7, 1, 0xbd);
> +
> +    VPCI_ADD_REG(vpci_read16, vpci_write16, 12, 2, r12);
> +    VPCI_READ_CHECK(12, 2, 0x8696);
> +    VPCI_READ_CHECK(12, 4, 0xffff8696);
> +
> +    /*
> +     * At this point we have the following layout:
> +     *
> +     * 32    24    16     8     0
> +     *  +-----+-----+-----+-----+
> +     *  |          r0           | 0
> +     *  +-----+-----+-----+-----+
> +     *  | r7  |  r6 |  r5 |/////| 32
> +     *  +-----+-----+-----+-----|
> +     *  |///////////////////////| 64
> +     *  +-----------+-----------+
> +     *  |///////////|    r12    | 96
> +     *  +-----------+-----------+
> +     *             ...
> +     *  / = empty.
> +     */
> +
> +    /* Try to add an overlapping register handler. */
> +    VPCI_ADD_INVALID_REG(vpci_read32, vpci_write32, 4, 4);
> +
> +    /* Try to add a non-aligned register. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 15, 2);
> +
> +    /* Try to add a register with wrong size. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, vpci_write16, 8, 3);
> +
> +    /* Try to add a register with missing handlers. */
> +    VPCI_ADD_INVALID_REG(vpci_read16, NULL, 8, 2);
> +    VPCI_ADD_INVALID_REG(NULL, vpci_write16, 8, 2);

Is that something which really is wrong in all cases? What about e.g.
r/o registers?

> +    /* Read/write of unset register. */
> +    VPCI_READ_CHECK(8, 4, 0xffffffff);
> +    VPCI_READ_CHECK(8, 2, 0xffff);
> +    VPCI_READ_CHECK(8, 1, 0xff);
> +    VPCI_WRITE(10, 2, 0xbeef);
> +    VPCI_READ_CHECK(10, 2, 0xffff);
> +
> +    /* Read of multiple registers */
> +    VPCI_CHECK_REG(7, 1, 0xbd);
> +    VPCI_READ_CHECK(4, 4, 0xbdbabaff);

I think a variant accessing mixed size registers would also be
desirable here. Perhaps it would be best to exhaustively test
all possible variations (there aren't that many after all). Same
for writes and partial accesses (below) then.

> @@ -256,6 +257,152 @@ void register_g2m_portio_handler(struct domain *d)
>      handler->ops = &g2m_portio_ops;
>  }
>  
> +/* Do some sanity checks. */
> +static int vpci_access_check(unsigned int reg, unsigned int len)
> +{
> +    /* Check access size. */
> +    if ( len != 1 && len != 2 && len != 4 )
> +    {
> +        gdprintk(XENLOG_WARNING, "invalid length (reg: %#x, len: %u)\n",
> +                 reg, len);

I think many of such gdprintk()s want to go away before this series
gets committed.

> +/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
> +static bool_t vpci_portio_accept(const struct hvm_io_handler *handler,

Plain bool please.

> +                                 const ioreq_t *p)
> +{
> +    return (p->addr == 0xcf8 && p->size == 4) || (p->addr & 0xfffc) == 0xcfc;
> +}
> +
> +static int vpci_portio_read(const struct hvm_io_handler *handler,
> +                            uint64_t addr, uint32_t size, uint64_t *data)
> +{
> +    struct domain *d = current->domain;
> +    unsigned int bus, devfn, reg;
> +    uint32_t data32;
> +    int rc;
> +
> +    vpci_lock(d);
> +    if ( addr == 0xcf8 )
> +    {
> +        ASSERT(size == 4);
> +        *data = d->arch.hvm_domain.pci_cf8;
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;
> +    }
> +    else if ( !CF8_ENABLED(d->arch.hvm_domain.pci_cf8) )

Pointless "else".

> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_OKAY;

You need to write to *data here, or else you need to return
false from vpci_portio_accept() already in this case (but then
you'd need to follow the stdvga model and take the lock
there, releasing it in a .complete handler).

> +    }
> +
> +    /* Decode the PCI address. */
> +    hvm_pci_decode_addr(d->arch.hvm_domain.pci_cf8, addr, &bus, &devfn, 
> &reg);
> +
> +    if ( vpci_access_check(reg, size) || reg >= 0xff )

> 0xff or >= 0x100, but the check is pointless as
hvm_pci_decode_addr() wont return larger values.

> +    {
> +        vpci_unlock(d);
> +        return X86EMUL_UNHANDLEABLE;

I don't think this matches real hardware behavior. If this "fails"
at all, surely by returning all ones.

> +    }
> +
> +    rc = xen_vpci_read(0, bus, devfn, reg, size, &data32);
> +    if ( !rc )
> +        *data = data32;
> +    vpci_unlock(d);

Please set *data outside the locked region.

And since there's no best place to make this other remark - I'd
prefer if you either kept together SBDF in one value when passing
this as arguments to functions, or alternatively pass this as four
values rather than keeping devfn artificially together.

> +     return rc ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;
> +}

Again the question - what's the bare hardware equivalent of
returning X86EMUL_UNHANDLEABLE here?

> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -1177,6 +1177,9 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct 
> domain *d,
>           CF8_ENABLED(cf8) )
>      {
>          uint32_t sbdf, x86_fam;
> +        unsigned int bus, devfn, reg;
> +
> +        hvm_pci_decode_addr(cf8, p->addr, &bus, &devfn, &reg);
>  
>          /* PCI config data cycle */
>  
> @@ -1186,9 +1189,7 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct 
> domain *d,
>                                   PCI_FUNC(CF8_BDF(cf8)));

Any reason you don't use bus and devfn (really dev/slot and func)
in the expression the tail of which is visible here?

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -224,6 +224,9 @@ SECTIONS
>         __start_schedulers_array = .;
>         *(.data.schedulers)
>         __end_schedulers_array = .;
> +       __start_vpci_array = .;
> +       *(.data.vpci)
> +       __end_vpci_array = .;

With vpci.c declaring these const, they should go into .rodata.
With the type name further being vpci_register_init_t it may even
be next to .init.rodata where they belong.

> @@ -1041,6 +1042,8 @@ static void setup_one_hwdom_device(const struct 
> setup_hwdom *ctxt,
>          devfn += pdev->phantom_stride;
>      } while ( devfn != pdev->devfn &&
>                PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
> +
> +    xen_vpci_add_handlers(pdev);
> }

You're losing an error code here.

> --- /dev/null
> +++ b/xen/drivers/vpci/Makefile
> @@ -0,0 +1 @@
> +obj-y += vpci.o

Without having seen further patches it's not clear whether this really
needs its own directory.

> --- /dev/null
> +++ b/xen/drivers/vpci/vpci.c
> @@ -0,0 +1,469 @@
> +/*
> + * Generic functionality for handling accesses to the PCI configuration space
> + * from guests.
> + *
> + * Copyright (C) 2017 Citrix Systems R&D
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see 
> <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/vpci.h>
> +
> +extern const vpci_register_init_t __start_vpci_array[], __end_vpci_array[];
> +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> +#define vpci_init __start_vpci_array

What is this last one good for?

> +/* Internal struct to store the emulated PCI registers. */
> +struct vpci_register {
> +    vpci_read_t read;
> +    vpci_write_t write;

These two are pointers - please change the typedefs so that they're
visibly pointers here. That'll then also allow the typedef to be used to
declare actual handlers, should any such declarations be needed (e.g.
if the same handler can be used by two different source files).

> +    unsigned int size;
> +    unsigned int offset;
> +    void *priv_data;

"private" (shorter and hence easier to type)?

> +    struct rb_node node;
> +};
> +
> +int xen_vpci_add_handlers(struct pci_dev *pdev)

__hwdom_init (I notice setup_one_hwdom_device() wrongly isn't
annotated so).

> +{
> +    int i, rc = 0;

i wants to be unsigned.

> +    if ( !has_vpci(pdev->domain) )
> +        return 0;
> +
> +    pdev->vpci = xzalloc(struct vpci);
> +    if ( !pdev->vpci )
> +        return -ENOMEM;
> +
> +    pdev->vpci->handlers = RB_ROOT;
> +
> +    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        rc = vpci_init[i](pdev);
> +        if ( rc )
> +            break;
> +    }
> +
> +    if ( rc )
> +    {
> +        struct rb_node *node = rb_first(&pdev->vpci->handlers);
> +        struct vpci_register *r;

Please move this into the more narrow scope below.

> +        /* Iterate over the tree and cleanup. */
> +        while ( node != NULL )
> +        {
> +            r = container_of(node, struct vpci_register, node);
> +            node = rb_next(node);
> +            rb_erase(&r->node, &pdev->vpci->handlers);
> +            xfree(r);
> +        }
> +        xfree(pdev->vpci);
> +    }
> +
> +    return rc;
> +}
> +
> +static bool vpci_register_overlap(const struct vpci_register *r,
> +                                  unsigned int offset)
> +{
> +    if ( offset >= r->offset && offset < r->offset + r->size )
> +        return true;
> +
> +    return false;

This can be one single return statement.

> +}
> +
> +

Stray double blank lines.

> +static int vpci_register_cmp(const struct vpci_register *r1,
> +                             const struct vpci_register *r2)
> +{
> +    /* Make sure there's no overlap between registers. */
> +    if ( vpci_register_overlap(r1, r2->offset) ||
> +         vpci_register_overlap(r1, r2->offset + r2->size - 1) ||
> +         vpci_register_overlap(r2, r1->offset) ||
> +         vpci_register_overlap(r2, r1->offset + r1->size - 1) )

Overlap checks can generally be done with just two comparisons,
so I guess the parameters chosen for vpci_register_overlap()
aren't optimal. I guess you don't need the function at all, as you
could do all that's needed here:

    if ( r1->offset < r2->offset + r2->size &&
         r2->offset < r1->offset + r1->size )
        return 0;

The comment of course is somewhat misleading here too, as
returning zero isn't really an error indication.

> +        return 0;
> +
> +    if (r1->offset < r2->offset)
> +        return -1;
> +    else if (r1->offset > r2->offset)
> +        return 1;

Coding style.

> +    ASSERT_UNREACHABLE();
> +    return 0;
> +}
> +
> +static struct vpci_register *vpci_find_register(const struct pci_dev *pdev,
> +                                                const unsigned int reg,
> +                                                const unsigned int size)
> +{
> +    struct rb_node *node;

const

> +    struct vpci_register r = {
> +        .offset = reg,
> +        .size = size,
> +    };
> +
> +    ASSERT(vpci_locked(pdev->domain));
> +
> +    node = pdev->vpci->handlers.rb_node;
> +    while ( node )
> +    {
> +        struct vpci_register *t =

const

> +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t read_handler,
> +                          vpci_write_t write_handler, unsigned int offset,
> +                          unsigned int size, void *data)
> +{
> +    struct rb_node **new, *parent;
> +    struct vpci_register *r;
> +
> +    /* Some sanity checks. */
> +    if ( (size != 1 && size != 2 && size != 4) || offset >= 0xFFF ||

Off by one again in the offset check.

> +         offset & (size - 1) || read_handler == NULL || write_handler == 
> NULL )

As said, I'm not convinced either of the read or write handlers
being NULL is really a mistake. Both of them being NULL surely
is.

> +        return -EINVAL;
> +
> +    r = xzalloc(struct vpci_register);

Looks like xmalloc() would be fine here - you initialize all fields.

> +    if ( !r )
> +        return -ENOMEM;
> +
> +    r->read = read_handler;
> +    r->write = write_handler;
> +    r->size = size;
> +    r->offset = offset;
> +    r->priv_data = data;
> +
> +    vpci_lock(pdev->domain);
> +    new = &pdev->vpci->handlers.rb_node;
> +    parent = NULL;
> +
> +    while (*new) {

Coding style.

> +        struct vpci_register *this =

const

> +int xen_vpci_remove_register(struct pci_dev *pdev, unsigned int offset)
> +{
> +    struct vpci_register *r;
> +
> +    vpci_lock(pdev->domain);
> +    r = vpci_find_register(pdev, offset, 1 /* size doesn't matter here. */);

I'm not sure about this - is there anything wrong with the caller,
knowing the size, also passing it? You could then even refuse
requests to remove a register where (offset,size) doesn't match
the recorded values (as vpci_find_register() will return any
overlapping one).

> +    if ( !r )
> +    {
> +        vpci_unlock(pdev->domain);
> +        return -ENOENT;
> +    }
> +
> +    rb_erase(&r->node, &pdev->vpci->handlers);
> +    xfree(r);
> +    vpci_unlock(pdev->domain);

Please swap xfree() and unlock.

> +static void vpci_read_hw(unsigned int seg, unsigned int bus,
> +                         unsigned int devfn, unsigned int reg, uint32_t size,
> +                         uint32_t *data)

Instead of passing a pointer to the result, please consider returning
the value, as the function doesn't return anything at present.

> +{
> +    switch ( size )
> +    {
> +    case 4:
> +        *data = pci_conf_read32(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                                reg);
> +        break;
> +    case 3:
> +        /*
> +         * This is possible because a 4byte read can have 1byte trapped and
> +         * the rest passed-through.
> +         */
> +        *data = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                                reg + 1) << 8;
> +        *data |= pci_conf_read8(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                               reg);

Which of the two parts to read with read16() should depend on the
low bit of reg. Also for maximum compatibility I'd strongly suggest
reading the low part before the high one.

> +/* Helper macros for the read/write handlers. */
> +#define GENMASK_BYTES(e, s) GENMASK((e) * 8, (s) * 8)

What do e and s stand for here?

> +#define SHIFT_RIGHT_BYTES(d, o) d >>= (o) * 8

And at least o here?

> +#define ADD_RESULT(r, d, s, o) r |= ((d) & GENMASK_BYTES(s, 0)) << ((o) * 8)

And d, s, and o here?

Also I can't see what addition you would want to perform below.
All you ought to do are ANDs and ORs.

> +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int devfn,

The function being other than void, same question as earlier:
What's the bare hardware equivalent of this returning other
than zero?

> +                  unsigned int reg, uint32_t size, uint32_t *data)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    union vpci_val val = { .double_word = 0 };
> +    unsigned int data_rshift = 0, data_lshift = 0, data_size;
> +    uint32_t tmp_data;
> +    int rc;
> +
> +    ASSERT(vpci_locked(d));
> +
> +    *data = 0;
> +
> +    /* Find the PCI dev matching the address. */
> +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);

What about the global PCI devices lock here? While VT-d code,
perhaps wrongly, doesn't acquire that lock prior to calling the
function, all callers in passthrough/pci.c do or verify it is being
held.

> +    if ( !pdev )
> +        goto passthrough;
> +
> +    /* Find the vPCI register handler. */
> +    r = vpci_find_register(pdev, reg, size);

With the overlap handling in vpci_find_register() I can't see how
this would reliably return the correct (lowest) register when the
request spans multiple ones.

> +    if ( !r )
> +        goto passthrough;
> +
> +    if ( r->offset > reg )
> +    {
> +        /*
> +         * There's a heading gap into the emulated register.
> +         * NB: it's possible for this recursive call to have a size of 3.
> +         */
> +        rc = xen_vpci_read(seg, bus, devfn, reg, r->offset - reg, &tmp_data);

I'm not particularly happy to see recursion being used here, even if
that's not going to be very deep. Both qemu and pciback get away
without, iirc, and while it's not the neatest code I find qemu's easier
to follow than the apparently written from scratch variant here. Is
there a particular reason you didn't at least take what is there as a
basis?

> +        if ( rc )
> +            return rc;
> +
> +        /* Add the head read to the partial result. */
> +        ADD_RESULT(*data, tmp_data, r->offset - reg, 0);
> +        data_lshift = r->offset - reg;
> +
> +        /* Account for the read. */
> +        size -= data_lshift;
> +        reg += data_lshift;
> +    }
> +    else if ( r->offset < reg )
> +        /* There's an offset into the emulated register */
> +        data_rshift = reg - r->offset;

This could be a plain else, avoiding another conditional branch.

> +    ASSERT(data_lshift == 0 || data_rshift == 0);
> +    data_size = min(size, r->size - data_rshift);
> +    ASSERT(data_size != 0);
> +
> +    /* Perform the read of the register. */
> +    rc = r->read(pdev, r->offset, &val, r->priv_data);
> +    if ( rc )
> +        return rc;
> +
> +    val.double_word >>= data_rshift * 8;
> +    ADD_RESULT(*data, val.double_word, data_size, data_lshift);
> +
> +    /* Account for the read */
> +    size -= data_size;
> +    reg += data_size;
> +
> +    /* Read the remaining, if any. */
> +    if ( size > 0 )
> +    {
> +        /*
> +         * Read tailing data.

trailing?

> +static int vpci_write_helper(struct pci_dev *pdev,
> +                             const struct vpci_register *r, unsigned int 
> size,
> +                             unsigned int offset, uint32_t data)
> +{
> +    union vpci_val val = { .double_word = data };
> +    int rc;
> +
> +    ASSERT(size <= r->size);
> +    if ( size != r->size )
> +    {
> +        rc = r->read(pdev, r->offset, &val, r->priv_data);
> +        if ( rc )
> +            return rc;
> +        val.double_word &= ~GENMASK_BYTES(size + offset, offset);
> +        data &= GENMASK_BYTES(size, 0);
> +        val.double_word |= data << (offset * 8);
> +    }
> +
> +    return r->write(pdev, r->offset, val, r->priv_data);
> +}

I'm not sure that writing back the value read is correct in all cases
(think of write-only or rw1c registers or even offsets where reads
and writes access different registers altogether). I think the write
handlers will need to be made capable of dealing with partial writes.

> +int xen_vpci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                   unsigned int reg, uint32_t size, uint32_t data)
> +{
> +    struct domain *d = current->domain;
> +    struct pci_dev *pdev;
> +    const struct vpci_register *r;
> +    unsigned int data_size, data_offset = 0;
> +    int rc;
> +
> +    ASSERT(vpci_locked(d));
> +
> +    /* Find the PCI dev matching the address. */
> +    pdev = pci_get_pdev_by_domain(d, seg, bus, devfn);
> +    if ( !pdev )
> +        goto passthrough;
> +
> +    /* Find the vPCI register handler. */
> +    r = vpci_find_register(pdev, reg, size);
> +    if ( !r )
> +        goto passthrough;
> +
> +    else if ( r->offset > reg )

Pointless "else" again, even more so with the blank line in between.

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -13,6 +13,7 @@
>  #include <xen/irq.h>
>  #include <xen/pci_regs.h>
>  #include <xen/pfn.h>
> +#include <xen/rbtree.h>

Why? All you add to this file is ...

> @@ -88,6 +89,9 @@ struct pci_dev {
>  #define PT_FAULT_THRESHOLD 10
>      } fault;
>      u64 vf_rlen[6];
> +
> +    /* Data for vPCI. */
> +    struct vpci *vpci;

... this. I guess you really want to add the #include ...

> --- /dev/null
> +++ b/xen/include/xen/vpci.h
> @@ -0,0 +1,66 @@
> +#ifndef _VPCI_
> +#define _VPCI_
> +
> +#include <xen/pci.h>
> +#include <xen/types.h>

... here.

> +/* Helpers for locking/unlocking. */
> +#define vpci_lock(d) spin_lock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_unlock(d) spin_unlock(&(d)->arch.hvm_domain.vpci_lock)
> +#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)

While for the code layering you don't need recursive locks, did you
consider using them nevertheless so that spin_is_locked() return
values are actually meaningful for your purposes?

> +#define REGISTER_VPCI_INIT(x) \
> +  static const vpci_register_init_t x##_entry __used_section(".data.vpci") = 
> x

To match up with the type name and assuming "REGISTER" here
means the PCI register rather than "registration", I think this
would better be VPCI_REGISTER() (I don't really mind the _INIT
suffix, but I think it's relatively pointless).

> +/* Add vPCI handlers to device. */
> +int xen_vpci_add_handlers(struct pci_dev *dev);
> +
> +/* Add/remove a register handler. */
> +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t read_handler,
> +                          vpci_write_t write_handler, unsigned int offset,
> +                          unsigned int size, void *data);
> +int xen_vpci_remove_register(struct pci_dev *pdev, unsigned int offset);
> +
> +/* Generic read/write handlers for the PCI config space. */
> +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                  unsigned int reg, uint32_t size, uint32_t *data);
> +int xen_vpci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
> +                   unsigned int reg, uint32_t size, uint32_t data);

Along the lines of what I've said in a few places about return values,
please carefully consider where they're needed. Once you decide
they are really needed, the respective functions would likely want to
become __must_check.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.