Re: [Xen-devel] [PATCH v3 07/15] argo: implement the register op
On Mon, Jan 14, 2019 at 6:19 AM Jan Beulich <JBeulich@xxxxxxxx> wrote:
>
> >>> On 07.01.19 at 08:42, <christopher.w.clark@xxxxxxxxx> wrote:
> > --- a/xen/common/argo.c
> > +++ b/xen/common/argo.c
> > @@ -23,16 +23,41 @@
> > #include <xen/event.h>
> > #include <xen/domain_page.h>
> > #include <xen/guest_access.h>
> > +#include <xen/lib.h>
> > +#include <xen/nospec.h>
> > #include <xen/time.h>
> > #include <public/argo.h>
> >
> > +#define MAX_RINGS_PER_DOMAIN 128U
> > +
> > +/* All messages on the ring are padded to a multiple of the slot size. */
> > +#define ROUNDUP_MESSAGE(a) (ROUNDUP((a), XEN_ARGO_MSG_SLOT_SIZE))
>
> Pointless outermost pair of parentheses.
ack, removed
>
> > @@ -198,6 +223,31 @@ static DEFINE_RWLOCK(argo_lock); /* L1 */
> > #define argo_dprintk(format, ... ) ((void)0)
> > #endif
> >
> > +/*
> > + * This hash function is used to distribute rings within the per-domain
> > + * hash tables (d->argo->ring_hash and d->argo->send_hash). The hash table
> > + * will provide a struct if a match is found with an 'argo_ring_id' key:
> > + * ie. the key is a (domain id, port, partner domain id) tuple.
> > + * Since port number varies the most in expected use, and the Linux driver
> > + * allocates at both the high and low ends, incorporate high and low bits to
> > + * help with distribution.
> > + * Apply array_index_nospec as a defensive measure since this operates
> > + * on user-supplied input and the array size that it indexes into is known.
> > + */
> > +static unsigned int
> > +hash_index(const struct argo_ring_id *id)
> > +{
> > + unsigned int hash;
> > +
> > + hash = (uint16_t)(id->port >> 16);
> > + hash ^= (uint16_t)id->port;
>
> I may have asked this before, but are the casts really needed
> with ...
>
> > + hash ^= id->domain_id;
> > + hash ^= id->partner_id;
> > + hash &= (ARGO_HTABLE_SIZE - 1);
>
> ... the masking done here?
>
> > + return array_index_nospec(hash, ARGO_HTABLE_SIZE);
>
> With the masking above - is this really needed?
>
> And then the question is whether the quality of the hash is
> sufficient: There won't be more set bits in the result than
> are in any of the three input values, so if they're all small,
> higher hash table entries won't be used at all. I would
> assume the goal to be that by the time 32 entities appear,
> chances be good that at least about 30 of the hash table
> entries are in use.
ok, I'll replace this function and address the above.
I'm out of time today so have added a FIXME for today's series posting
and will get it done tomorrow.
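(For reference, one common way to address the distribution concern is a multiplicative mixing step, so that even small keys reach all buckets before masking. A standalone sketch, illustrative only -- the constant and function name are mine, not the eventual patch code:)

```c
#include <stdint.h>

#define ARGO_HTABLE_SIZE 32

/*
 * Illustrative replacement hash: multiply by the 32-bit golden-ratio
 * constant (Fibonacci hashing) and fold the high bits down, so small
 * inputs still spread across all ARGO_HTABLE_SIZE buckets. Names and
 * constant are placeholders, not from the patch.
 */
static unsigned int
hash_index_sketch(uint16_t domain_id, uint32_t port, uint16_t partner_id)
{
    uint32_t h = port ^ ((uint32_t)domain_id << 16) ^ partner_id;

    h *= 0x9e3779b1u;   /* multiplicative mix: spreads entropy upward */
    h ^= h >> 16;       /* fold high bits back into the low bits */

    return h & (ARGO_HTABLE_SIZE - 1);
}
```

With this shape, consecutive port numbers (the common allocation pattern) land in well-separated buckets instead of clustering in the low table entries.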
>
> > @@ -219,6 +269,78 @@ ring_unmap(struct argo_ring_info *ring_info)
> > }
> > }
> >
> > +static int
> > +ring_map_page(struct argo_ring_info *ring_info, unsigned int i, void **out_ptr)
> > +{
> > + if ( i >= ring_info->nmfns )
> > + {
> > + gprintk(XENLOG_ERR,
> > + "argo: ring (vm%u:%x vm%d) %p attempted to map page %u of %u\n",
>
> ring_info->id.{domain,partner}_id look to be of the same type -
> why once %u and once %d? Same elsewhere.
Fixed across the series to use %u for domid_t output.
>
> > + ring_info->id.domain_id, ring_info->id.port,
> > + ring_info->id.partner_id, ring_info, i, ring_info->nmfns);
> > + return -ENOMEM;
> > + }
>
> i = array_index_nospec(i, ring_info->nmfns);
>
> considering the array indexes here? Of course at this point only
> zero can be passed in, but I assume this changes in later patches
> and the index is at least indirectly guest controlled.
Added, thanks.
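(The pattern here, for anyone following along, is bounds check first, then clamp. A portable approximation of the array_index_nospec idea -- the real Xen macro uses arch-specific masking, this only illustrates the clamping behaviour:)

```c
#include <stddef.h>

/*
 * Simplified stand-in for array_index_nospec(): after the architectural
 * bounds check has rejected out-of-range values, clamp the index with a
 * branchless mask so a speculatively-executed out-of-bounds access reads
 * element 0 rather than attacker-chosen memory. The real macro differs
 * per architecture; this sketch shows only the net effect.
 */
static size_t index_nospec_sketch(size_t idx, size_t size)
{
    size_t mask = (size_t)0 - (size_t)(idx < size); /* all-ones iff in range */

    return idx & mask;
}
```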
>
> > @@ -371,6 +493,418 @@ partner_rings_remove(struct domain *src_d)
> > }
> > }
> >
> > +static int
> > +find_ring_mfn(struct domain *d, gfn_t gfn, mfn_t *mfn)
>
> So you have find_ring_mfn(), find_ring_mfns(), and ring_find_info().
> Any chance you could use a consistent ordering of "ring" and "find"?
> Or is there a reason behind the apparent mismatch?
I've renamed them to use 'find_' as the common prefix. Looks cleaner to
me, thanks.
>
> > +{
> > + p2m_type_t p2mt;
> > + int ret = 0;
> > +
> > +#ifdef CONFIG_X86
> > + *mfn = get_gfn_unshare(d, gfn_x(gfn), &p2mt);
> > +#else
> > + *mfn = p2m_lookup(d, gfn, &p2mt);
> > +#endif
> > +
> > + if ( !mfn_valid(*mfn) )
> > + ret = -EINVAL;
> > +#ifdef CONFIG_X86
> > + else if ( p2m_is_paging(p2mt) || (p2mt == p2m_ram_logdirty) )
> > + ret = -EAGAIN;
> > +#endif
> > + else if ( (p2mt != p2m_ram_rw) ||
> > + !get_page_and_type(mfn_to_page(*mfn), d, PGT_writable_page) )
> > + ret = -EINVAL;
> > +
> > +#ifdef CONFIG_X86
> > + put_gfn(d, gfn_x(gfn));
> > +#endif
> > +
> > + return ret;
> > +}
>
> Please check whether you could leverage check_get_page_from_gfn()
> here. If you can't, please at least take inspiration as to e.g. the
> #ifdef-s from that function.
Have added a temporary FIXME for this and will do this tomorrow.
>
> > +static int
> > +find_ring_mfns(struct domain *d, struct argo_ring_info *ring_info,
> > + uint32_t npage,
> > + XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > + uint32_t len)
>
> Noticing it here, but perhaps still an issue elsewhere as well: Didn't
> we agree on removing unnecessary use of fixed width types? Or
> was that in the context on an earlier patch of v3?
These are fixed, and hopefully all the other unnecessary fixed-width
types are also gone in v4.
>
> > +{
> > + unsigned int i;
> > + int ret = 0;
> > + mfn_t *mfns;
> > + uint8_t **mfn_mapping;
> > +
> > + /*
> > + * first bounds check on npage here also serves as an overflow check
> > + * before left shifting it
> > + */
> > + if ( (unlikely(npage > (XEN_ARGO_MAX_RING_SIZE >> PAGE_SHIFT))) ||
>
> Isn't this redundant with the check in do_argo_op()?
>
> > + ((npage << PAGE_SHIFT) < len) )
> > + return -EINVAL;
Answering your point inline above: Yes - do_argo_op does the bounds
checking, so I've removed the entire check above.
> > +
> > + if ( ring_info->mfns )
> > + {
> > + /* Ring already existed: drop the previous mapping. */
> > + gprintk(XENLOG_INFO,
> > + "argo: vm%u re-register existing ring (vm%u:%x vm%d) clears mapping\n",
>
> Indentation (also elsewhere).
Ack, fixed here and elsewhere.
>
> > + d->domain_id, ring_info->id.domain_id,
> > + ring_info->id.port, ring_info->id.partner_id);
> > +
> > + ring_remove_mfns(d, ring_info);
> > + ASSERT(!ring_info->mfns);
> > + }
> > +
> > + mfns = xmalloc_array(mfn_t, npage);
> > + if ( !mfns )
> > + return -ENOMEM;
> > +
> > + for ( i = 0; i < npage; i++ )
> > + mfns[i] = INVALID_MFN;
> > +
> > + mfn_mapping = xzalloc_array(uint8_t *, npage);
> > + if ( !mfn_mapping )
> > + {
> > + xfree(mfns);
> > + return -ENOMEM;
> > + }
> > +
> > + ring_info->npage = npage;
> > + ring_info->mfns = mfns;
> > + ring_info->mfn_mapping = mfn_mapping;
>
> As the inverse to the cleanup sequence in an earlier patch: Please
> set ->npage last here even if it doesn't strictly matter.
npage is now gone after implementing Roger's feedback to only keep "len".
>
> > + ASSERT(ring_info->npage == npage);
>
> What is this trying to make sure, seeing the assignment just a
> few lines up?
removed
>
> > + if ( ring_info->nmfns == ring_info->npage )
> > + return 0;
>
> Can this happen with the ring_remove_mfns() call above?
No, not any more, you're right.
>
> > + for ( i = ring_info->nmfns; i < ring_info->npage; i++ )
>
> And hence can i start from other than zero here? And why not
> use the (possibly cheaper to access) function argument "npage"
> as the loop upper bound? The other similar loop a few lines up
> is coded that simpler way.
Yes, thanks, done.
>
> > +static long
> > +register_ring(struct domain *currd,
> > + XEN_GUEST_HANDLE_PARAM(xen_argo_register_ring_t) reg_hnd,
> > + XEN_GUEST_HANDLE_PARAM(xen_argo_page_descr_t) pg_descr_hnd,
> > + uint32_t npage, bool fail_exist)
> > +{
> > + xen_argo_register_ring_t reg;
> > + struct argo_ring_id ring_id;
> > + void *map_ringp;
> > + xen_argo_ring_t *ringp;
> > + struct argo_ring_info *ring_info;
> > + struct argo_send_info *send_info = NULL;
> > + struct domain *dst_d = NULL;
> > + int ret = 0;
> > + uint32_t private_tx_ptr;
> > +
> > + if ( copy_from_guest(&reg, reg_hnd, 1) )
> > + {
> > + ret = -EFAULT;
> > + goto out;
> > + }
> > +
> > + /*
> > + * A ring must be large enough to transmit messages, so requires space for:
> > + * * 1 message header, plus
> > + * * 1 payload slot (payload is always rounded to a multiple of 16 bytes)
> > + *   for the message payload to be written into, plus
> > + * * 1 more slot, so that the ring cannot be filled to capacity with a
> > + *   single message -- see the logic in ringbuf_insert -- allowing for this
> > + *   ensures that there can be space remaining when a message is present.
> > + * The above determines the minimum acceptable ring size.
> > + */
> > + if ( (reg.len < (sizeof(struct xen_argo_ring_message_header)
> > + + ROUNDUP_MESSAGE(1) + ROUNDUP_MESSAGE(1))) ||
>
> These two summands don't look to fulfill the "cannot be filled to
> capacity" constraint the comment describes, as (aiui) messages
> can be larger than 16 bytes. What's the deal?
This is intended as a bounds check of reg.len against a minimum size:
the smallest ring that a message can fit onto, as determined by the
logic in ringbuf_insert. The smallest message you can send is:
sizeof(struct xen_argo_ring_message_header) + ROUNDUP_MESSAGE(1)
and then ringbuf_insert won't accept a message unless there is (at
least) ROUNDUP_MESSAGE(1) space remaining, so that, plus the smallest
message size, is the smallest viable ring. There's no point accepting
registration of a ring smaller than that.
You're right that messages can be larger than 16 bytes, but they can
only be sent to rings that are larger than that minimum - on a minimum
sized ring, they'll be rejected by sendv.
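(Spelling that arithmetic out, with struct layouts mirroring the public header -- the names below are stand-ins for illustration, not the patch's identifiers:)

```c
#include <stdint.h>

#define XEN_ARGO_MSG_SLOT_SIZE 0x10

/* Equivalent to ROUNDUP(a, XEN_ARGO_MSG_SLOT_SIZE) for a power-of-2 slot. */
#define ROUNDUP_MESSAGE(a) \
    (((a) + XEN_ARGO_MSG_SLOT_SIZE - 1) & ~(XEN_ARGO_MSG_SLOT_SIZE - 1))

/* Stand-in for xen_argo_addr_t: uint32_t port + domid_t + pad = 8 bytes. */
struct addr_sketch {
    uint32_t aport;
    uint16_t domain_id;
    uint16_t pad;
};

/* Stand-in for xen_argo_ring_message_header: 16 bytes before data[]. */
struct msg_header_sketch {
    uint32_t len;
    struct addr_sketch source;
    uint32_t message_type;
};

/* Smallest sendable message: header plus one rounded-up payload slot. */
#define MIN_MESSAGE_SIZE \
    ((uint32_t)sizeof(struct msg_header_sketch) + ROUNDUP_MESSAGE(1))

/*
 * Smallest viable ring: the minimum message plus the one extra slot that
 * ringbuf_insert insists must remain free -- 48 bytes with these layouts.
 */
#define MIN_RING_SIZE (MIN_MESSAGE_SIZE + ROUNDUP_MESSAGE(1))
```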
>
> > + (reg.len > XEN_ARGO_MAX_RING_SIZE) ||
> > + (reg.len != ROUNDUP_MESSAGE(reg.len)) ||
> > + (reg.pad != 0) )
> > + {
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + ring_id.partner_id = reg.partner_id;
> > + ring_id.port = reg.port;
> > + ring_id.domain_id = currd->domain_id;
> > +
> > + read_lock(&argo_lock);
>
> From here to ...
>
> > + if ( !currd->argo )
> > + {
> > + ret = -ENODEV;
> > + goto out_unlock;
> > + }
> > +
> > + if ( reg.partner_id == XEN_ARGO_DOMID_ANY )
> > + {
> > + if ( opt_argo_mac_enforcing )
> > + {
> > + ret = -EPERM;
> > + goto out_unlock;
> > + }
> > + }
> > + else
> > + {
> > + dst_d = get_domain_by_id(reg.partner_id);
> > + if ( !dst_d )
> > + {
> > + argo_dprintk("!dst_d, ESRCH\n");
> > + ret = -ESRCH;
> > + goto out_unlock;
> > + }
> > +
> > + if ( !dst_d->argo )
> > + {
> > + argo_dprintk("!dst_d->argo, ECONNREFUSED\n");
> > + ret = -ECONNREFUSED;
> > + put_domain(dst_d);
> > + goto out_unlock;
> > + }
> > +
> > + send_info = xzalloc(struct argo_send_info);
> > + if ( !send_info )
> > + {
> > + ret = -ENOMEM;
> > + put_domain(dst_d);
> > + goto out_unlock;
> > + }
> > + send_info->id = ring_id;
> > + }
>
> ... here, what exactly is it that requires the global read lock
> to be held ...
>
> > + write_lock(&currd->argo->lock);
>
> ... prior to this? Holding locks around allocations is not
> forbidden, but should be avoided whenever possible.
>
> And then further why does the global read lock need
> continued holding until the end of the function?
I've added FIXME to review this tomorrow. I understand the
argo-internal locking protocols and this is adhering to what they
state, in that accesses to the argo structs of currd and dst_d are
protected by the global read lock here, but at the moment I'm less
clear on what the expectations are for standard Xen domain locks,
references and lifecycle.
>
> > + if ( currd->argo->ring_count >= MAX_RINGS_PER_DOMAIN )
> > + {
> > + ret = -ENOSPC;
> > + goto out_unlock2;
> > + }
> > +
> > + ring_info = ring_find_info(currd, &ring_id);
> > + if ( !ring_info )
> > + {
> > + ring_info = xzalloc(struct argo_ring_info);
> > + if ( !ring_info )
> > + {
> > + ret = -ENOMEM;
> > + goto out_unlock2;
> > + }
> > +
> > + spin_lock_init(&ring_info->lock);
> > +
> > + ring_info->id = ring_id;
> > + INIT_HLIST_HEAD(&ring_info->pending);
> > +
> > + hlist_add_head(&ring_info->node,
> > + &currd->argo->ring_hash[hash_index(&ring_info->id)]);
> > +
> > + gprintk(XENLOG_DEBUG, "argo: vm%u registering ring (vm%u:%x vm%d)\n",
> > + currd->domain_id, ring_id.domain_id, ring_id.port,
> > + ring_id.partner_id);
> > + }
> > + else
> > + {
> > + if ( ring_info->len )
> > + {
>
> Please fold into "else if ( )", removing a level of indentation.
ack, done
>
> > + /*
> > + * If the caller specified that the ring must not already exist,
> > + * fail at attempt to add a completed ring which already exists.
> > + */
> > + if ( fail_exist )
> > + {
> > + argo_dprintk("disallowed reregistration of existing ring\n");
> > + ret = -EEXIST;
> > + goto out_unlock2;
> > + }
> > +
> > + if ( ring_info->len != reg.len )
> > + {
> > + /*
> > + * Change of ring size could result in entries on the pending
> > + * notifications list that will never trigger.
> > + * Simple blunt solution: disallow ring resize for now.
> > + * TODO: investigate enabling ring resize.
> > + */
> > + gprintk(XENLOG_ERR,
> > + "argo: vm%u attempted to change ring size(vm%u:%x vm%d)\n",
> > + currd->domain_id, ring_id.domain_id, ring_id.port,
> > + ring_id.partner_id);
> > + /*
> > + * Could return EINVAL here, but if the ring didn't already
> > + * exist then the arguments would have been valid, so: EEXIST.
> > + */
> > + ret = -EEXIST;
> > + goto out_unlock2;
> > + }
> > +
> > + gprintk(XENLOG_DEBUG,
> > + "argo: vm%u re-registering existing ring (vm%u:%x vm%d)\n",
> > + currd->domain_id, ring_id.domain_id, ring_id.port,
> > + ring_id.partner_id);
> > + }
> > + }
> > +
> > + ret = find_ring_mfns(currd, ring_info, npage, pg_descr_hnd, reg.len);
> > + if ( ret )
> > + {
> > + gprintk(XENLOG_ERR,
> > + "argo: vm%u failed to find ring mfns (vm%u:%x vm%d)\n",
> > + currd->domain_id, ring_id.domain_id, ring_id.port,
> > + ring_id.partner_id);
> > +
> > + ring_remove_info(currd, ring_info);
> > + goto out_unlock2;
> > + }
> > +
> > + /*
> > + * The first page of the memory supplied for the ring has the xen_argo_ring
> > + * structure at its head, which is where the ring indexes reside.
> > + */
> > + ret = ring_map_page(ring_info, 0, &map_ringp);
> > + if ( ret )
> > + {
> > + gprintk(XENLOG_ERR,
> > + "argo: vm%u failed to map ring mfn 0 (vm%u:%x vm%d)\n",
> > + currd->domain_id, ring_id.domain_id, ring_id.port,
> > + ring_id.partner_id);
> > +
> > + ring_remove_info(currd, ring_info);
> > + goto out_unlock2;
> > + }
> > + ringp = map_ringp;
> > +
> > + private_tx_ptr = read_atomic(&ringp->tx_ptr);
> > +
> > + if ( (private_tx_ptr >= reg.len) ||
> > + (ROUNDUP_MESSAGE(private_tx_ptr) != private_tx_ptr) )
> > + {
> > + /*
> > + * Since the ring is a mess, attempt to flush the contents of it
> > + * here by setting the tx_ptr to the next aligned message slot past
> > + * the latest rx_ptr we have observed. Handle ring wrap correctly.
> > + */
> > + private_tx_ptr = ROUNDUP_MESSAGE(read_atomic(&ringp->rx_ptr));
> > +
> > + if ( private_tx_ptr >= reg.len )
> > + private_tx_ptr = 0;
> > +
> > + update_tx_ptr(ring_info, private_tx_ptr);
> > + }
> > +
> > + ring_info->tx_ptr = private_tx_ptr;
> > + ring_info->len = reg.len;
> > + currd->argo->ring_count++;
> > +
> > + if ( send_info )
> > + {
> > + spin_lock(&dst_d->argo->send_lock);
> > +
> > + hlist_add_head(&send_info->node,
> > + &dst_d->argo->send_hash[hash_index(&send_info->id)]);
> > +
> > + spin_unlock(&dst_d->argo->send_lock);
> > + }
> > +
> > + out_unlock2:
> > + if ( !ret && send_info )
> > + xfree(send_info);
> > +
> > + if ( dst_d )
> > + put_domain(dst_d);
> > +
> > + write_unlock(&currd->argo->lock);
>
> Surely you can drop the lock before the other two cleanup
> actions? That would then allow you to add another label to
> absorb the two separate put_domain() calls on error paths.
That looks correct. Added a FIXME note now and will fix tomorrow. thanks.
>
> > --- a/xen/include/asm-arm/guest_access.h
> > +++ b/xen/include/asm-arm/guest_access.h
> > @@ -29,6 +29,8 @@ int access_guest_memory_by_ipa(struct domain *d, paddr_t ipa, void *buf,
> > /* Is the guest handle a NULL reference? */
> > #define guest_handle_is_null(hnd) ((hnd).p == NULL)
> >
> > +#define guest_handle_is_aligned(hnd, mask) (!((uintptr_t)(hnd).p & (mask)))
>
> This is unused throughout the patch afaics.
Removed.
>
> > --- a/xen/include/public/argo.h
> > +++ b/xen/include/public/argo.h
> > @@ -31,6 +31,26 @@
> >
> > #include "xen.h"
> >
> > +#define XEN_ARGO_DOMID_ANY DOMID_INVALID
> > +
> > +/*
> > + * The maximum size of an Argo ring is defined to be: 16MB
> > + * -- which is 0x1000000 bytes.
> > + * A byte index into the ring is at most 24 bits.
> > + */
> > +#define XEN_ARGO_MAX_RING_SIZE (0x1000000ULL)
> > +
> > +/*
> > + * Page descriptor: encoding both page address and size in a 64-bit value.
> > + * Intended to allow ABI to support use of different granularity pages.
> > + * example of how to populate:
> > + * xen_argo_page_descr_t pg_desc =
> > + * (physaddr & PAGE_MASK) | XEN_ARGO_PAGE_DESCR_SIZE_4K;
> > + */
> > +typedef uint64_t xen_argo_page_descr_t;
> > +#define XEN_ARGO_PAGE_DESCR_SIZE_MASK 0x0000000000000fffULL
> > +#define XEN_ARGO_PAGE_DESCR_SIZE_4K 0
>
> Are the _DESCR_ infixes here really useful?
These are now gone, with Julien's approval for the change back to use
the gfn-using interfaces.
>
> > @@ -56,4 +76,56 @@ typedef struct xen_argo_ring
> > #endif
> > } xen_argo_ring_t;
> >
> > +typedef struct xen_argo_register_ring
> > +{
> > + uint32_t port;
> > + domid_t partner_id;
> > + uint16_t pad;
> > + uint32_t len;
> > +} xen_argo_register_ring_t;
> > +
> > +/* Messages on the ring are padded to a multiple of this size. */
> > +#define XEN_ARGO_MSG_SLOT_SIZE 0x10
> > +
> > +struct xen_argo_ring_message_header
> > +{
> > + uint32_t len;
> > + xen_argo_addr_t source;
> > + uint32_t message_type;
> > +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> > + uint8_t data[];
> > +#elif defined(__GNUC__)
> > + uint8_t data[0];
> > +#endif
> > +};
> > +
> > +/*
> > + * Hypercall operations
> > + */
> > +
> > +/*
> > + * XEN_ARGO_OP_register_ring
> > + *
> > + * Register a ring using the indicated memory.
> > + * Also used to reregister an existing ring (eg. after resume from hibernate).
> > + *
> > + * arg1: XEN_GUEST_HANDLE(xen_argo_register_ring_t)
> > + * arg2: XEN_GUEST_HANDLE(xen_argo_page_descr_t)
> > + * arg3: unsigned long npages
> > + * arg4: unsigned long flags
>
> The "unsigned long"-s here are not necessarily compatible with
> compat mode. At the very least flags above bit 31 won't be
> usable by compat mode guests. Hence I also question ...
>
> > + */
> > +#define XEN_ARGO_OP_register_ring 1
> > +
> > +/* Register op flags */
> > +/*
> > + * Fail exist:
> > + * If set, reject attempts to (re)register an existing established ring.
> > + * If clear, reregistration occurs if the ring exists, with the new ring
> > + * taking the place of the old, preserving tx_ptr if it remains valid.
> > + */
> > +#define XEN_ARGO_REGISTER_FLAG_FAIL_EXIST 0x1
> > +
> > +/* Mask for all defined flags. unsigned long type so ok for both 32/64-bit */
> > +#define XEN_ARGO_REGISTER_FLAG_MASK 0x1UL
>
> ... the UL suffix here. Also this last item should not be exposed
> (perhaps framed by "#ifdef __XEN__") and would perhaps anyway
> better be defined in terms of the other
> XEN_ARGO_REGISTER_FLAG_*.
Notes added in place for each of the above; will work on these tomorrow.
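(Sketching the flag-mask rework Jan asked for, under the assumption the mask ends up Xen-internal -- details may well differ in v4:)

```c
/* Flag bits accepted by XEN_ARGO_OP_register_ring (from the patch). */
#define XEN_ARGO_REGISTER_FLAG_FAIL_EXIST 0x1

/*
 * Jan's suggestion, sketched: define the all-flags mask in terms of the
 * individual flags, and keep it out of the public ABI (the real header
 * would frame this with #ifdef __XEN__).
 */
#define XEN_ARGO_REGISTER_FLAG_MASK XEN_ARGO_REGISTER_FLAG_FAIL_EXIST

/* Reject a register op carrying any undefined flag bit. */
static int check_register_flags(unsigned long flags)
{
    return (flags & ~(unsigned long)XEN_ARGO_REGISTER_FLAG_MASK)
           ? -22 /* -EINVAL */ : 0;
}
```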
thanks
Christopher
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel