[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v3 1/9] xen/vpci: introduce basic handlers to trap accesses to the PCI config space



On Mon, May 29, 2017 at 08:16:41AM -0600, Jan Beulich wrote:
> >>> On 29.05.17 at 14:57, <roger.pau@xxxxxxxxxx> wrote:
> > On Fri, May 19, 2017 at 05:35:47AM -0600, Jan Beulich wrote:
> >> >>> On 27.04.17 at 16:35, <roger.pau@xxxxxxxxxx> wrote:
> >> > +    /* Read/write of unset register. */
> >> > +    VPCI_READ_CHECK(8, 4, 0xffffffff);
> >> > +    VPCI_READ_CHECK(8, 2, 0xffff);
> >> > +    VPCI_READ_CHECK(8, 1, 0xff);
> >> > +    VPCI_WRITE(10, 2, 0xbeef);
> >> > +    VPCI_READ_CHECK(10, 2, 0xffff);
> >> > +
> >> > +    /* Read of multiple registers */
> >> > +    VPCI_CHECK_REG(7, 1, 0xbd);
> >> > +    VPCI_READ_CHECK(4, 4, 0xbdbabaff);
> >> 
> >> I think a variant accessing mixed size registers would also be
> >> desirable here. Perhaps it would be best to exhaustively test
> >> all possible variations (there aren't that many after all). Same
> >> for writes and partial accesses (below) then.
> > 
> > So you mean to scan the whole space (from 0 to 128 on this test) using
> > all possible register sizes for both read and write? That would indeed
> > be feasible.
> 
> Not sure what the "from 0 to 128" is meant to apply to. What I mean
> is to test all combinations of (mix of) register sizes and access sizes.
> I.e. all combinations making up a single 4-byte field ((1,1,1,1),
> (1,1,2), (2,1,1), (2,2), (4)) and all four 1-byte accesses, both 2-byte
> ones, and the sole possible 4-byte one.

OK, thanks for the clarification, now I get it.

> >> > @@ -256,6 +257,152 @@ void register_g2m_portio_handler(struct domain *d)
> >> >      handler->ops = &g2m_portio_ops;
> >> >  }
> >> >  
> >> > +/* Do some sanity checks. */
> >> > +static int vpci_access_check(unsigned int reg, unsigned int len)
> >> > +{
> >> > +    /* Check access size. */
> >> > +    if ( len != 1 && len != 2 && len != 4 )
> >> > +    {
> >> > +        gdprintk(XENLOG_WARNING, "invalid length (reg: %#x, len: %u)\n",
> >> > +                 reg, len);
> >> 
> >> I think many of such gdprintk()s want to go away before this series
> >> gets committed.
> > 
> > OK, I've found them useful while developing, but I guess they are not
> > really useful outside from that context. I guess there's no way to
> > leave them in place, maybe a Kconfig option?
> 
> That seems overkill. I wouldn't so much mind the messages (they
> get compiled out for non-debug builds anyway), but the clutter
> they introduce to code: In some cases half of your functions are
> the invocation of gdprintk() ...

Let me try to prune some of them.

> >> > +/* vPCI config space IO ports handlers (0xcf8/0xcfc). */
> >> > +static bool_t vpci_portio_accept(const struct hvm_io_handler *handler,
> >> 
> >> Plain bool please.
> > 
> > Sadly struct hvm_io_ops (which is where this function is used) expects
> > a bool_t as return.
> 
> I don't follow - bool_t is simply a typedef of bool nowadays, and
> typedefs are all equivalent as far as the C type system goes.

Clearly my bad, I assumed they where actually different types.

> >> Again the question - what's the bare hardware equivalent of
> >> returning X86EMUL_UNHANDLEABLE here?
> > 
> > All 1's I assume (or other random garbage). Would you be OK with me
> > adding a "fail" label that sets data to all 1's and returns X86EMUL_OKAY?
> 
> That would probably be okay (despite my dislike of labels and goto-s).

Maybe the goto is not needed after all if vpci_read returns the data
filled with 1's and no error code.

> >> > +#include <xen/sched.h>
> >> > +#include <xen/vpci.h>
> >> > +
> >> > +extern const vpci_register_init_t __start_vpci_array[], 
> >> > __end_vpci_array[];
> >> > +#define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> >> > +#define vpci_init __start_vpci_array
> >> 
> >> What is this last one good for?
> > 
> > It's used by xen_vpci_add_handlers in order to call the init
> > functions, I can drop it and call __start_vpci_array[i](...) if that's
> > better.
> 
> Well, if there were several users, I could see the benefit of an
> abbreviating #define. But for a single user the #define adds
> more code / clutter than is being saved on the use site.

Ack.

> >> > +    struct rb_node node;
> >> > +};
> >> > +
> >> > +int xen_vpci_add_handlers(struct pci_dev *pdev)
> >> 
> >> __hwdom_init (I notice setup_one_hwdom_device() wrongly isn't
> >> annotated so).
> > 
> > OK, so you really want the init handlers to be inside of the
> > .init.rodata section then.
> 
> Only if that's correct, and it is correct as long as all possible call trees
> root in a __hwdom_init function. (To avoid misunderstanding: This
> clearly can't be .init.rodata uniformly, as __hwdom_init isn't always
> an alias of __init).

OK.

> >> > +int xen_vpci_add_register(struct pci_dev *pdev, vpci_read_t 
> >> > read_handler,
> 
> Btw., I only now notice this further strange xen_ prefix here.

I assume this should be vpci_*, (dropping the xen_ prefix uniformly).

> >> > +                          vpci_write_t write_handler, unsigned int 
> >> > offset,
> >> > +                          unsigned int size, void *data)
> >> > +{
> >> > +    struct rb_node **new, *parent;
> >> > +    struct vpci_register *r;
> >> > +
> >> > +    /* Some sanity checks. */
> >> > +    if ( (size != 1 && size != 2 && size != 4) || offset >= 0xFFF ||
> >> 
> >> Off by one again in the offset check.
> > 
> > Fixed to be > 0xfff. Should this maybe be added to pci_regs.h as
> > PCI_MAX_REGISTER? 
> 
> Could be done, but then we need one for base and one for
> extended config space. May want to check whether Linux has
> invented some good names for these by now.

pci_regs.h from Linux now has:

/*
 * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of
 * configuration space.  PCI-X Mode 2 and PCIe devices have 4096 bytes of
 * configuration space.
 */
#define PCI_CFG_SPACE_SIZE      256
#define PCI_CFG_SPACE_EXP_SIZE  4096

At the top. Do you think those defines are fine for importing? (this
was introduced by cc10385b6fde3, but I don't think importing this in a
more formal way makes sense).

> >> > +/* Helper macros for the read/write handlers. */
> >> > +#define GENMASK_BYTES(e, s) GENMASK((e) * 8, (s) * 8)
> >> 
> >> What do e and s stand for here?
> > 
> > e = end, s = start (in bytes).
> 
> And where do you start counting. Having seen the rest of the
> series I'm actually rather unconvinced use these macros results
> in better code - to me, plain hex numbers are quite a bit easier
> to read as long as the number of digits doesn't go meaningfully
> beyond 10 or so.
> 
> >> > +#define SHIFT_RIGHT_BYTES(d, o) d >>= (o) * 8
> >> 
> >> And at least o here?
> > 
> > o = offset (in bytes)
> 
> I think simply writhing the shift expression is once again more
> clear to the reader than using a macro which is longer to read
> and type and which has semantics which aren't immediately
> clear from its name.
> 
> > I can rename ADD_RESULT to APPEND_RESULT or something more descriptive
> > if you think it's going to make it easier to understand.
> 
> I'd prefer if the name "merge" appeared in that name - I don't see
> this being usable strictly only to append to either side of a value.

OK MERGE_RESULT or MERGE_REGISTER maybe? (and the rest removed).

> >> > +int xen_vpci_read(unsigned int seg, unsigned int bus, unsigned int 
> >> > devfn,
> >> 
> >> The function being other than void, same question as earlier:
> >> What's the bare hardware equivalent of this returning other
> >> than zero?
> > 
> > I though it would be useful to have some more fine-grained error
> > reporting if that's ever needed, although as you say, from a hardware
> > point of view any errors will be simply reported as the value obtained
> > being all 1's.
> > 
> > The question is, should this already return all 1's, or should the
> > called translate failures into all 1's?
> 
> If you leave this to the callers, overall code would likely become
> less readable, so I'd prefer this to be done in a central place.

I will change the prototypes/code of vpci_{read/write} so no error
code is returned (and in the read case the value on error is going to
be 1's).

> >> > +    /* Find the vPCI register handler. */
> >> > +    r = vpci_find_register(pdev, reg, size);
> >> 
> >> With the overlap handling in vpci_find_register() I can't see how
> >> this would reliably return the correct (lowest) register when the
> >> request spans multiple ones.
> > 
> > It doesn't need to, if there's a lower register it will be found by
> > the recursive call to xen_vpci_read done below (before calling into
> > the handler pointed by r).
> 
> Since that's quite non-obvious, to me this is another argument
> against using recursion here.

Yes, will switch to a sorted list and no recursion.

> >> > +static int vpci_write_helper(struct pci_dev *pdev,
> >> > +                             const struct vpci_register *r, unsigned 
> >> > int size,
> >> > +                             unsigned int offset, uint32_t data)
> >> > +{
> >> > +    union vpci_val val = { .double_word = data };
> >> > +    int rc;
> >> > +
> >> > +    ASSERT(size <= r->size);
> >> > +    if ( size != r->size )
> >> > +    {
> >> > +        rc = r->read(pdev, r->offset, &val, r->priv_data);
> >> > +        if ( rc )
> >> > +            return rc;
> >> > +        val.double_word &= ~GENMASK_BYTES(size + offset, offset);
> >> > +        data &= GENMASK_BYTES(size, 0);
> >> > +        val.double_word |= data << (offset * 8);
> >> > +    }
> >> > +
> >> > +    return r->write(pdev, r->offset, val, r->priv_data);
> >> > +}
> >> 
> >> I'm not sure that writing back the value read is correct in all cases
> >> (think of write-only or rw1c registers or even offsets where reads
> >> and writes access different registers altogether). I think the write
> >> handlers will need to be made capable of dealing with partial writes.
> > 
> > That seems to be what pciback does fro writes: read, merge value,
> > write back (drivers/xen/xen-pciback/conf_space.c
> > xen_pcibk_config_write):
> > 
> > err = conf_space_read(dev, cfg_entry, field_start,
> >                   &tmp_val);
> > if (err)
> >     break;
> > 
> > tmp_val = merge_value(tmp_val, value, get_mask(size),
> >                   offset - field_start);
> > 
> > err = conf_space_write(dev, cfg_entry, field_start,
> >                    tmp_val);
> > 
> > I would prefer to do it this way in order to avoid adding more
> > complexity to the handlers themselves. So far I haven't found any such
> > registers (rw1c) in the PCI config space, do you have references to
> > any of them?
> 
> The status register.

But the status register is not going to be trapped, not by Dom0 or
DomUs? None of the registers that I've emulated for the header, or the
capabilities behave this way, and adding such and offset would
greatly increase the complexity of each handler IMHO.

Maybe it would be easier to add a flag to mark rw1c registers as such,
if that's ever needed? (and avoid the read in that case)

> >> > +/* Helpers for locking/unlocking. */
> >> > +#define vpci_lock(d) spin_lock(&(d)->arch.hvm_domain.vpci_lock)
> >> > +#define vpci_unlock(d) spin_unlock(&(d)->arch.hvm_domain.vpci_lock)
> >> > +#define vpci_locked(d) spin_is_locked(&(d)->arch.hvm_domain.vpci_lock)
> >> 
> >> While for the code layering you don't need recursive locks, did you
> >> consider using them nevertheless so that spin_is_locked() return
> >> values are actually meaningful for your purposes?
> > 
> > I'm not sure I follow, spin_is_locked already return meaningful values
> > for my purpose AFAICT.
> 
> For non-recursive locks this tells you whether _any_ CPU holds
> the lock, yet normally you want to know whether the CPU you
> run on does.

Indeed, so if I make the lock recursive spin_is_locked is going to
return whether the current CPU holds the lock. That's kind of
counter-intuitive.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.