
Re: [Xen-devel] [PATCH v4 04/14] argo: init, destroy and soft-reset, with enable command line opt



On Tue, Jan 15, 2019 at 4:29 AM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>
> On Tue, Jan 15, 2019 at 01:27:36AM -0800, Christopher Clark wrote:
> > Initialises basic data structures and performs teardown of argo state
> > for domain shutdown.

> > +
> > +/*
> > + * The value of the argo element in a struct domain is
> > + * protected by L1_global_argo_rwlock
> > + */
> > +#define ARGO_HTABLE_SIZE 32
> > +struct argo_domain
> > +{
> > +    /* rings_L2 */
> > +    rwlock_t rings_L2_rwlock;
> > +    /*
> > +     * Hash table of argo_ring_info about rings this domain has registered.
> > +     * Protected by rings_L2.
> > +     */
> > +    struct hlist_head ring_hash[ARGO_HTABLE_SIZE];
> > +    /* Counter of rings registered by this domain. Protected by rings_L2. */
> > +    unsigned int ring_count;
> > +
> > +    /* send_L2 */
> > +    spinlock_t send_L2_lock;
>
> Other locks are rw locks, while this is a spinlock, I guess that's
> because there aren't many concurrent read-only accesses to
> send_hash?

Yes, that's correct. The only places that take this lock need to take it
exclusively, in order to update the protected data structure, so there's
no call for, or benefit from, a rw lock here.

>
> > +    /*
> > +     * Hash table of argo_send_info about rings other domains have registered
> > +     * for this domain to send to. Single partner, non-wildcard rings.
> > +     * Protected by send_L2.
> > +     */
> > +    struct hlist_head send_hash[ARGO_HTABLE_SIZE];
> > +
> > +    /* wildcard_L2 */
> > +    spinlock_t wildcard_L2_lock;
> > +    /*
> > +     * List of pending space-available signals for this domain about wildcard
> > +     * rings registered by other domains. Protected by wildcard_L2.
> > +     */
> > +    struct hlist_head wildcard_pend_list;
> > +};
> > +
> > +/*
> > + * Locking is organized as follows:
> > + *
> > + * Terminology: R(<lock>) means taking a read lock on the specified lock;
> > + *              W(<lock>) means taking a write lock on it.
> > + *
> > + * == L1 : The global read/write lock: L1_global_argo_rwlock
> > + * Protects the argo elements of all struct domain *d in the system.
> > + * It does not protect any of the elements of d->argo, only their
> > + * addresses.
>
> But if you W(L1), you can basically modify anything, in all d->argo
> structs, so it does seem to protect the elements of d->argo when
> write-locked?

Ack, that's correct, and the comment isn't clear enough about what it's
supposed to say, so I've rewritten it. Pasting the new version of the
comment about the L1 lock here:

 * == L1 : The global read/write lock: L1_global_argo_rwlock
 * Protects the argo elements of all struct domain *d in the system.
 *
 * R(L1) does not protect any of the elements of d->argo; it protects their
 * addresses. W(L1) protects those and more since it implies W on all the lower
 * level locks - see the notes on those locks below.
 *
 * The destruction of an argo-enabled domain, which must have a non-NULL d->argo
 * pointer, will need to free that d->argo pointer, which requires W(L1).
 * Since holding R(L1) will block acquiring W(L1), it ensures that
 * no domain pointers that argo is interested in become invalid while either
 * W(L1) or R(L1) is held.

>
> > + * By extension since the destruction of a domain with a non-NULL
> > + * d->argo will need to free the d->argo pointer, holding W(L1)
> > + * guarantees that no domains pointers that argo is interested in
> > + * become invalid whilst this lock is held.
>
> AFAICT holding W(L1) guarantees not only that pointers doesn't change,
> but that there are no changes at all in any of the d->argo contained
> data.

Ack. In the terminology used in the comments: W(L1) implies
W(rings_L2), W(send_L2) and W(wildcard_L2) on all domains, and implies
L3 on all rings. i.e. holding W(L1) grants the same access as holding
all of those lower level locks, which is effectively access to all
Argo data structures.

> > +/*
> > + * Lock state validations macros
> > + *
> > + * These macros encode the logic to verify that the locking has adhered to the
> > + * locking discipline above.
> > + * eg. On entry to logic that requires holding at least R(rings_L2), this:
> > + *      ASSERT(LOCKING_Read_rings_L2(d));
> > + *
> > + * checks that the lock state is sufficient, validating that one of the
> > + * following must be true when executed:       R(rings_L2) && R(L1)
> > + *                                        or:  W(rings_L2) && R(L1)
> > + *                                        or:  W(L1)
> > + */
> > +
> > +/* RAW macros here are only used to assist defining the other macros below */
> > +#define RAW_LOCKING_Read_L1 (rw_is_locked(&L1_global_argo_rwlock))
>
> Not sure whether it's relevant or not, but this macro would return
> true as long as the lock is taken, regardless of whether it's read or
> write locked. If you want to make sure it's only read-locked then you
> will have to use:
>
> rw_is_locked(&L1_global_argo_rwlock) &&
> !rw_is_write_locked(&L1_global_argo_rwlock)
>
> AFAICT.

Thanks - you're right, and in practice the macros don't need that
distinction about only-read locking, which is helpful...

> > +#define RAW_LOCKING_Read_rings_L2(d) \
> > +    (rw_is_locked(&d->argo->rings_L2_rwlock) && RAW_LOCKING_Read_L1)
> > +
> > +/* The LOCKING macros defined below here are for use at verification points */
> > +#define LOCKING_Write_L1 (rw_is_write_locked(&L1_global_argo_rwlock))
> > +#define LOCKING_Read_L1 (RAW_LOCKING_Read_L1 || LOCKING_Write_L1)
>
> You can drop the LOCKING_Write_L1 here, since with the current macros
> RAW_LOCKING_Read_L1 will return true regardless of whether the lock is
> read or write locked.
>
> > +
> > +#define LOCKING_Write_rings_L2(d) \
> > +    ((RAW_LOCKING_Read_L1 && rw_is_write_locked(&d->argo->rings_L2_rwlock)) || \
>
> For safety you need parentheses around d here:
>
> rw_is_write_locked(&(d)->argo->rings_L2_rwlock)
>
> And also in the macros below, same applies to r.

Ack to all the above, so I've rewritten the macros -- and handily, the
RAW ones are just not needed. New versions (hopefully my mail client
doesn't shred the pasting here; I have reflowed to shorten lines a bit):

 * The LOCKING macros defined below here are for use at verification points.
 */
#define LOCKING_Write_L1 (rw_is_write_locked(&L1_global_argo_rwlock))
/*
 * While LOCKING_Read_L1 will return true even if the lock is write-locked,
 * that's OK because everywhere that a Read lock is needed with these
 * macros, holding a Write lock there instead is OK too: we're checking that
 * _at least_ the specified level of locks are held.
 */
#define LOCKING_Read_L1 (rw_is_locked(&L1_global_argo_rwlock))

#define LOCKING_Write_rings_L2(d) \
    ((LOCKING_Read_L1 && \
        rw_is_write_locked(&(d)->argo->rings_L2_rwlock)) || \
     LOCKING_Write_L1)
/*
 * Skip checking LOCKING_Write_rings_L2(d) within this LOCKING_Read_rings_L2
 * definition because the first clause that is testing R(L1) && R(L2) will
 * also return true if R(L1) && W(L2) is true, because of the way that
 * rw_is_locked behaves. This results in a slightly shorter and faster
 * implementation.
 */
#define LOCKING_Read_rings_L2(d) \
    ((LOCKING_Read_L1 && rw_is_locked(&(d)->argo->rings_L2_rwlock)) || \
     LOCKING_Write_L1)
/*
 * Skip checking LOCKING_Write_L1 within this LOCKING_L3 definition because
 * LOCKING_Write_rings_L2(d) will return true for that condition.
 */
#define LOCKING_L3(d, r) \
    ((LOCKING_Read_L1 && rw_is_locked(&(d)->argo->rings_L2_rwlock) \
      && spin_is_locked(&(r)->L3_lock)) || LOCKING_Write_rings_L2(d))

#define LOCKING_send_L2(d) \
    ((LOCKING_Read_L1 && spin_is_locked(&(d)->argo->send_L2_lock)) || \
     LOCKING_Write_L1)
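(Editor's note: a portable user-space model can make the rw_is_locked
subtlety above concrete. The sketch below is hypothetical: it replaces
Xen's rwlock state queries with a simple struct tracking a reader count
and a writer flag, and re-implements the LOCKING checks on top of it.
All names here are stand-ins, not Xen's real types.)

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for Xen's rwlock state queries. */
struct mock_rwlock {
    int readers;      /* number of read holders */
    bool write_held;  /* true if write-locked */
};

static bool mock_rw_is_locked(const struct mock_rwlock *l)
{
    /* True whether the lock is read- OR write-locked: this is why
     * LOCKING_Read_L1 needs no explicit LOCKING_Write_L1 clause. */
    return l->readers > 0 || l->write_held;
}

static bool mock_rw_is_write_locked(const struct mock_rwlock *l)
{
    return l->write_held;
}

static struct mock_rwlock L1, rings_L2;  /* zero-initialised: unlocked */

/* Mirrors of the macros under discussion. */
static bool LOCKING_Write_L1(void) { return mock_rw_is_write_locked(&L1); }
static bool LOCKING_Read_L1(void)  { return mock_rw_is_locked(&L1); }

static bool LOCKING_Write_rings_L2(void)
{
    return (LOCKING_Read_L1() && mock_rw_is_write_locked(&rings_L2)) ||
           LOCKING_Write_L1();
}

static bool LOCKING_Read_rings_L2(void)
{
    /* No LOCKING_Write_rings_L2 clause needed: R(L1) && W(L2) already
     * passes via mock_rw_is_locked, and W(L1) passes via the second term. */
    return (LOCKING_Read_L1() && mock_rw_is_locked(&rings_L2)) ||
           LOCKING_Write_L1();
}
```

With this model it is straightforward to check that W(L1) alone
satisfies every check, and that R(L1) plus W(rings_L2) satisfies
LOCKING_Read_rings_L2 without a separate write clause.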

> >
> > +/*
> > + * FIXME for 4.12:
> > + *  * Replace this hash function to get better distribution across buckets.
> > + *  * Don't use casts in the replacement function.
> > + *  * Drop the use of array_index_nospec.
> > + */
> > +/*
> > + * This hash function is used to distribute rings within the per-domain
> > + * hash tables (d->argo->ring_hash and d->argo->send_hash). The hash table
> > + * will provide a struct if a match is found with a 'argo_ring_id' key:
> > + * ie. the key is a (domain id, argo port, partner domain id) tuple.
> > + * Since argo port number varies the most in expected use, and the Linux driver
> > + * allocates at both the high and low ends, incorporate high and low bits to
> > + * help with distribution.
> > + * Apply array_index_nospec as a defensive measure since this operates
> > + * on user-supplied input and the array size that it indexes into is known.
> > + */
> > +static unsigned int
> > +hash_index(const struct argo_ring_id *id)
> > +{
> > +    unsigned int hash;
> > +
> > +    hash = (uint16_t)(id->aport >> 16);
> > +    hash ^= (uint16_t)id->aport;
> > +    hash ^= id->domain_id;
> > +    hash ^= id->partner_id;
> > +    hash &= (ARGO_HTABLE_SIZE - 1);
> > +
> > +    return array_index_nospec(hash, ARGO_HTABLE_SIZE);
> > +}
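(Editor's note: the hash computation itself is easy to lift out and
sanity-check in isolation. Below is a hypothetical standalone
transcription, without array_index_nospec since that needs Xen's
headers, keeping the same bit-mixing steps.)

```c
#include <stdint.h>

#define ARGO_HTABLE_SIZE 32

/* Hypothetical standalone version of hash_index: mix the high and low
 * 16 bits of the 32-bit port with the two domain ids, then mask down
 * to the table size (which must be a power of two for the mask). */
static unsigned int
mock_hash_index(uint32_t aport, uint16_t domain_id, uint16_t partner_id)
{
    unsigned int hash;

    hash = (uint16_t)(aport >> 16);
    hash ^= (uint16_t)aport;
    hash ^= domain_id;
    hash ^= partner_id;
    hash &= (ARGO_HTABLE_SIZE - 1);

    return hash;
}
```

The final mask guarantees the result is always a valid bucket index,
and identical high/low port halves cancel each other out under XOR,
which is one of the distribution weaknesses the FIXME refers to.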
> > +
> > +static struct argo_ring_info *
> > +find_ring_info(const struct domain *d, const struct argo_ring_id *id)
> > +{
> > +    unsigned int ring_hash_index;
> > +    struct hlist_node *node;
> > +    struct argo_ring_info *ring_info;
> > +
> > +    ASSERT(LOCKING_Read_rings_L2(d));
> > +
> > +    ring_hash_index = hash_index(id);
> > +
> > +    argo_dprintk("d->argo=%p, d->argo->ring_hash[%u]=%p id=%p\n",
> > +                 d->argo, ring_hash_index,
> > +                 d->argo->ring_hash[ring_hash_index].first, id);
> > +    argo_dprintk("id.aport=%x id.domain=vm%u id.partner_id=vm%u\n",
> > +                 id->aport, id->domain_id, id->partner_id);
> > +
> > +    hlist_for_each_entry(ring_info, node, &d->argo->ring_hash[ring_hash_index],
> > +                         node)
> > +    {
> > +        const struct argo_ring_id *cmpid = &ring_info->id;
> > +
> > +        if ( cmpid->aport == id->aport &&
> > +             cmpid->domain_id == id->domain_id &&
> > +             cmpid->partner_id == id->partner_id )
> > +        {
> > +            argo_dprintk("ring_info=%p\n", ring_info);
> > +            return ring_info;
> > +        }
> > +    }
> > +    argo_dprintk("no ring_info found\n");
> > +
> > +    return NULL;
> > +}
> > +
> > +static void
> > +ring_unmap(const struct domain *d, struct argo_ring_info *ring_info)
> > +{
> > +    unsigned int i;
> > +
> > +    ASSERT(LOCKING_L3(d, ring_info));
> > +
> > +    if ( !ring_info->mfn_mapping )
> > +        return;
> > +
> > +    for ( i = 0; i < ring_info->nmfns; i++ )
> > +    {
> > +        if ( !ring_info->mfn_mapping[i] )
> > +            continue;
> > +        if ( ring_info->mfns )
> > +            argo_dprintk(XENLOG_ERR "argo: unmapping page %"PRI_mfn" from %p\n",
> > +                         mfn_x(ring_info->mfns[i]),
> > +                         ring_info->mfn_mapping[i]);
>
> Is it actually possible to have a mapped page without a matching mfn
> stored in the mfns array? That would imply there's no reference
> taken on such mapped page, which could be dangerous? I think you might
> want to add an ASSERT(ring_info->mfns) instead of the current if
> condition?
>
> (Maybe I'm missing something here).

I don't think you've missed anything - it looks right to me, so I've
dropped the if, made the printk unconditional and added two ASSERTs:
One is before the loop:
    ASSERT(!ring_info->nmfns || ring_info->mfns);

and another within the loop, after the continue:
        ASSERT(!mfn_eq(ring_info->mfns[i], INVALID_MFN));
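(Editor's note: the invariant those two ASSERTs encode can be modelled
as a standalone predicate. The sketch below is hypothetical: the struct
is a cut-down stand-in for argo_ring_info with only the fields the
unmap loop touches, and INVALID_MFN is mocked as an all-ones value.)

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_INVALID_MFN UINT64_MAX  /* stand-in for Xen's INVALID_MFN */

/* Hypothetical cut-down ring state: just what the unmap loop touches. */
struct mock_ring_info {
    unsigned int nmfns;     /* number of frames backing the ring */
    const uint64_t *mfns;   /* frame numbers, NULL if none recorded */
    void *const *mfn_mapping; /* per-frame mappings, NULL entry if unmapped */
};

/* Returns true iff the state satisfies the invariants the two new
 * ASSERTs encode: non-zero nmfns implies a valid mfns array, and
 * every mapped frame has a valid (non-INVALID_MFN) frame number. */
static bool ring_mfns_consistent(const struct mock_ring_info *ri)
{
    unsigned int i;

    if ( ri->nmfns && !ri->mfns )
        return false;

    for ( i = 0; i < ri->nmfns; i++ )
    {
        if ( ri->mfn_mapping && ri->mfn_mapping[i] &&
             ri->mfns[i] == MOCK_INVALID_MFN )
            return false;
    }

    return true;
}
```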

>
> >  long
> >  do_argo_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg1,
> >             XEN_GUEST_HANDLE_PARAM(void) arg2, unsigned long arg3,
> >             unsigned long arg4)
> >  {
> > -    return -ENOSYS;
> > +    long rc = -EFAULT;
> > +
> > +    argo_dprintk("->do_argo_op(%u,%p,%p,%lu,0x%lx)\n", cmd,
> > +                 (void *)arg1.p, (void *)arg2.p, arg3, arg4);
> > +
> > +    if ( unlikely(!opt_argo_enabled) )
> > +        return -EOPNOTSUPP;
>
> I think this should return -ENOSYS, an hypervisor built with
> CONFIG_ARGO but without argo enabled on the command line shouldn't
> behave differently than an hypervisor build without CONFIG_ARGO.

I've left this unchanged as -EOPNOTSUPP after the later discussion in
this thread.

>
> > +
> > +    switch (cmd)
> > +    {
> > +    default:
> > +        rc = -EOPNOTSUPP;
> > +        break;
> > +    }
> > +
> > +    argo_dprintk("<-do_argo_op(%u)=%ld\n", cmd, rc);
> > +
> > +    return rc;
> > +}
> > +
> > +static void
> > +argo_domain_init(struct argo_domain *argo)
> > +{
> > +    unsigned int i;
> > +
> > +    rwlock_init(&argo->rings_L2_rwlock);
> > +    spin_lock_init(&argo->send_L2_lock);
> > +    spin_lock_init(&argo->wildcard_L2_lock);
> > +    argo->ring_count = 0;
>
> No need to set ring_count to 0, since you allocate the struct with
> xzalloc it's going to be zeroed already.
>
> In the argo_soft_reset case domain_rings_remove_all should have
> already set ring_count to 0.

ack, done, thanks

Christopher
