[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 5/9] xen/vpci: add handlers to map the BARs



>>> On 27.04.17 at 16:35, <roger.pau@xxxxxxxxxx> wrote:
> +static int vpci_modify_bars(struct pci_dev *pdev, const bool map)
> +{
> +    struct vpci_header *header = &pdev->vpci->header;
> +    unsigned int i;
> +    int rc = 0;
> +
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        paddr_t gaddr = map ? header->bars[i].gaddr
> +                            : header->bars[i].mapped_addr;
> +        paddr_t paddr = header->bars[i].paddr;
> +
> +        if ( header->bars[i].type != VPCI_BAR_MEM &&
> +             header->bars[i].type != VPCI_BAR_MEM64_LO )
> +            continue;
> +
> +        rc = modify_mmio(pdev->domain, _gfn(PFN_DOWN(gaddr)),
> +                         _mfn(PFN_DOWN(paddr)), PFN_UP(header->bars[i].size),

The PFN_UP() indicates a problem: For sub-page BARs you can't
blindly map/unmap them without taking into consideration other
devices sharing the same page.

> +                         map);
> +        if ( rc )
> +            break;
> +
> +        header->bars[i].mapped_addr = map ? gaddr : 0;
> +    }
> +
> +    return rc;
> +}

Shouldn't this function somewhere honor the unset flags?

> +static int vpci_cmd_read(struct pci_dev *pdev, unsigned int reg,
> +                         union vpci_val *val, void *data)
> +{
> +    struct vpci_header *header = data;
> +
> +    val->word = header->command;

Rather than reading back and storing the value in the write handler,
I'd recommending doing an actual read here.

> +static int vpci_cmd_write(struct pci_dev *pdev, unsigned int reg,
> +                          union vpci_val val, void *data)
> +{
> +    struct vpci_header *header = data;
> +    uint16_t new_cmd, saved_cmd;
> +    uint8_t seg = pdev->seg, bus = pdev->bus;
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +    int rc;
> +
> +    new_cmd = val.word;
> +    saved_cmd = header->command;
> +
> +    if ( !((new_cmd ^ saved_cmd) & PCI_COMMAND_MEMORY) )
> +        goto out;
> +
> +    /* Memory space access change. */
> +    rc = vpci_modify_bars(pdev, new_cmd & PCI_COMMAND_MEMORY);
> +    if ( rc )
> +    {
> +        dprintk(XENLOG_ERR,
> +                "%04x:%02x:%02x.%u:unable to %smap BARs: %d\n",
> +                seg, bus, slot, func,
> +                new_cmd & PCI_COMMAND_MEMORY ? "" : "un", rc);
> +        return rc;

I guess you can guess the question already: What is the bare
hardware equivalent of this failure return?

> +    }
> +
> + out:

Please try to avoid goto-s and labels for other than error handling
(and even then only when code would otherwise end up pretty
convoluted).

> +static int vpci_bar_read(struct pci_dev *pdev, unsigned int reg,
> +                         union vpci_val *val, void *data)
> +{
> +    struct vpci_bar *bar = data;

const

> +    bool hi = false;
> +
> +    ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO ||
> +           bar->type == VPCI_BAR_MEM64_HI);
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg - PCI_BASE_ADDRESS_0 > 0);

reg > PCI_BASE_ADDRESS_0

> +        bar--;
> +        hi = true;
> +    }
> +
> +    if ( bar->sizing )
> +        val->double_word = ~(bar->size - 1) >> (hi ? 32 : 0);

There's also a comment further down - this is producing undefined
behavior on 32-bits arches.

> +static int vpci_bar_write(struct pci_dev *pdev, unsigned int reg,
> +                          union vpci_val val, void *data)
> +{
> +    struct vpci_bar *bar = data;
> +    uint32_t wdata = val.double_word;
> +    bool hi = false, unset = false;
> +
> +    ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO ||
> +           bar->type == VPCI_BAR_MEM64_HI);
> +
> +    if ( wdata == GENMASK(31, 0) )

I'm afraid this again doesn't match real hardware behavior: As the
low bits are r/o, writes with them having any value, but all other
bits being 1 should have the same effect. I notice that while I had
fixed this for the ROM BAR in Linux'es pciback, I should have also
fixed this for ordinary ones.

> +    {
> +        /* Next reads from this register are going to return the BAR size. */
> +        bar->sizing = true;
> +        return 0;
> +    }
> +
> +    /* End previous sizing cycle if any. */
> +    bar->sizing = false;
> +
> +    unset = bar->unset;
> +    if ( unset )
> +        bar->unset = false;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg - PCI_BASE_ADDRESS_0 > 0);
> +        bar--;
> +        hi = true;
> +    }
> +
> +    /* Update the relevant part of the BAR address. */
> +    bar->gaddr &= hi ? ~GENMASK(63, 32) : ~GENMASK(31, 0);
> +    wdata &= hi ? GENMASK(31, 0) : PCI_BASE_ADDRESS_MEM_MASK;

Perhaps easier to grok as

    if ( hi )
        wdata &= PCI_BASE_ADDRESS_MEM_MASK;

However, considering the dual use below, I'd prefer if you wrote
back the value you read to the low 4 bits. They're _supposed_ to
be r/o, yes, but anyway.

> +    bar->gaddr |= (uint64_t)wdata << (hi ? 32 : 0);
> +
> +    if ( unset )
> +    {
> +        bar->paddr = bar->gaddr;

So this deals with first time setting of the BAR by Dom0. If Dom0
later decides to move BARs around, how do you guarantee things
to continue to work fine if you allow paddr and gaddr to go out of
sync? Often the reason to do re-assignments is because the OS
recognized address conflicts. Or it needs to make room for SR-IOV
BARs.

> +        pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                         PCI_FUNC(pdev->devfn), reg, wdata);

pci_conf_write32()

> +    }
> +
> +    ASSERT(IS_ALIGNED(bar->gaddr, PAGE_SIZE));

Urgh.

> +static int vpci_init_bars(struct pci_dev *pdev)
> +{
> +    uint8_t seg = pdev->seg, bus = pdev->bus;
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +    uint8_t header_type;
> +    unsigned int i, num_bars;
> +    struct vpci_header *header = &pdev->vpci->header;
> +    struct vpci_bar *bars = header->bars;
> +    int rc;
> +
> +    header_type = pci_conf_read8(seg, bus, slot, func, PCI_HEADER_TYPE) & 
> 0x7f;
> +    if ( header_type == PCI_HEADER_TYPE_NORMAL )
> +        num_bars = 6;
> +    else if ( header_type == PCI_HEADER_TYPE_BRIDGE )
> +        num_bars = 2;
> +    else
> +        return -ENOSYS;

-EOPNOTSUPP

> +    /* Setup a handler for the control register. */
> +    header->command = pci_conf_read16(seg, bus, slot, func, PCI_COMMAND);

As the code says, the register is the Command Register, so your
comment shouldn't say "control".

> +    rc = xen_vpci_add_register(pdev, vpci_cmd_read, vpci_cmd_write,
> +                               PCI_COMMAND, 2, header);
> +    if ( rc )
> +    {
> +        dprintk(XENLOG_ERR,
> +                "%04x:%02x:%02x.%u: failed to add handler register %#x: 
> %d\n",
> +                seg, bus, slot, func, PCI_COMMAND, rc);
> +        return rc;
> +    }
> +
> +    for ( i = 0; i < num_bars; i++ )
> +    {
> +        uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
> +        uint32_t val = pci_conf_read32(seg, bus, slot, func, reg);
> +        uint64_t addr, size;
> +        unsigned int index;
> +
> +        if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
> +        {
> +            bars[i].type = VPCI_BAR_MEM64_HI;
> +            bars[i].unset = bars[i - 1].unset;
> +            continue;

Neither here nor below you install a handler for this upper half.

> +        }
> +        else if ( (val & PCI_BASE_ADDRESS_SPACE) == 
> PCI_BASE_ADDRESS_SPACE_IO )
> +        {
> +            bars[i].type = VPCI_BAR_IO;
> +            continue;
> +        }
> +        else if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==

Pointless "else" (twice).

> +                  PCI_BASE_ADDRESS_MEM_TYPE_64 )
> +            bars[i].type = VPCI_BAR_MEM64_LO;
> +        else
> +            bars[i].type = VPCI_BAR_MEM;
> +
> +        /* Size the BAR and map it. */
> +        index = i;
> +        rc = pci_size_bar(seg, bus, slot, func, PCI_BASE_ADDRESS_0, num_bars,
> +                          &index, &addr, &size);
> +        if ( rc )
> +        {
> +            dprintk(XENLOG_ERR,
> +                    "%04x:%02x:%02x.%u: unable to size BAR#%u: %d\n",
> +                    seg, bus, slot, func, i, rc);
> +            return rc;
> +        }
> +
> +        if ( size == 0 )
> +        {
> +            bars[i].type = VPCI_BAR_EMPTY;
> +            continue;
> +        }
> +
> +        if ( (bars[i].type == VPCI_BAR_MEM && addr == GENMASK(31, 12)) ||
> +             addr == GENMASK(63, 26) )

Where is this 26 coming from?

Perhaps

    if ( addr == GENMASK(bars[i].type == VPCI_BAR_MEM ? 31 : 63, 12) )

? Albeit I'm unconvinced GENMASK() is useful to be used here anyway
(see also below).

> +        {
> +            /* BAR is not positioned. */

I can't find anything in the standard saying that all-ones upper
address bits indicate an unassigned BAR. As long as the memory
decode bit is off, all BARs are to be considered unassigned afaik.
Furthermore you can't possibly read e.g. 0xfffff000 from a
32-bit BAR covering more than 4k.

> +            bars[i].unset = true;
> +            ASSERT(is_hardware_domain(pdev->domain));
> +            ASSERT(!(header->command & PCI_COMMAND_MEMORY));

You're asserting guest controlled state here (even if it's Dom0).

> +        }
> +
> +        ASSERT(IS_ALIGNED(addr, PAGE_SIZE));

Urgh (again).

> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -50,6 +50,34 @@ int xen_vpci_write(unsigned int seg, unsigned int bus, 
> unsigned int devfn,
>  struct vpci {
>      /* Root pointer for the tree of vPCI handlers. */
>      struct rb_root handlers;
> +
> +#ifdef __XEN__
> +    /* Hide the rest of the vpci struct from the user-space test harness. */
> +    struct vpci_header {
> +        /* Cached value of the command register. */
> +        uint16_t command;
> +        /* Information about the PCI BARs of this device. */
> +        struct vpci_bar {
> +            enum {
> +                VPCI_BAR_EMPTY,
> +                VPCI_BAR_IO,
> +                VPCI_BAR_MEM,

MEM32?

> +                VPCI_BAR_MEM64_LO,
> +                VPCI_BAR_MEM64_HI,
> +            } type;
> +            /* Hardware address. */
> +            paddr_t paddr;
> +            /* Guest address where the BAR should be mapped. */
> +            paddr_t gaddr;
> +            /* Current guest address where the BAR is mapped. */
> +            paddr_t mapped_addr;

Why do you need to track both "should be" and "is" addresses? Also
I think all three would more naturally be frame numbers.

> +            size_t size;

Is this enough for e.g. ARM32 (remember this is a common
header)?

> +            unsigned int attributes:4;

???

> +            bool sizing;
> +            bool unset;

Isn't this redundant with e.g. gaddr (or as per above gfn) being
INVALID_PADDR (INVALID_GFN)?

> +        } bars[6];

What about the ROM and SR-IOV ones?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.