[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH 1/5] xen/common: introduce a new framework for save/restore of 'domain' context



> -----Original Message-----
[snip]
> > +
> > +#include <xen/save.h>
> > +
> > +struct domain_context {
> > +    bool log;
> > +    struct domain_save_descriptor desc;
> > +    domain_copy_entry copy;
> 
> As your new framework is technically an extension of existing one, it
> would be good to explain why we diverge in the definitions.
> 

I don't follow. What is diverging? I explain in the commit comment that this is 
a parallel framework. Do I need to justify why it is not a carbon copy of the 
HVM one?

> > +    void *priv;
> > +};
> > +
> > +static struct {
> > +    const char *name;
> > +    bool per_vcpu;
> > +    domain_save_handler save;
> > +    domain_load_handler load;
> > +} handlers[DOMAIN_SAVE_CODE_MAX + 1];
> > +
> > +void __init domain_register_save_type(unsigned int tc, const char *name,
> > +                                      bool per_vcpu,
> > +                                      domain_save_handler save,
> > +                                      domain_load_handler load)
> > +{
> > +    BUG_ON(tc > ARRAY_SIZE(handlers));
> > +
> > +    ASSERT(!handlers[tc].save);
> > +    ASSERT(!handlers[tc].load);
> > +
> > +    handlers[tc].name = name;
> > +    handlers[tc].per_vcpu = per_vcpu;
> > +    handlers[tc].save = save;
> > +    handlers[tc].load = load;
> > +}
> > +
> > +int domain_save_entry(struct domain_context *c, unsigned int tc,
> > +                      const char *name, const struct vcpu *v, void *src,
> 
> I realize that 'src' is not const because how you define copy, however I
> would rather prefer if we rework the interface to avoid to keep the
> const in the definition. This may mean to have to define two callback
> rather than one.

That seems a bit ugly for the sake of a const but I guess I could create a 
union with a copy_in and copy_out. I'll see how that looks.

> 
> > +                      size_t src_len)
> 
> On 64-bit architecture, size_t would be 64-bit. But the record is only
> storing 32-bit. Would it make sense to add some sanity check in the code?
> 

True. Given this is new I think I'll just bump the length to 64-bits.

> > +{
> > +    int rc;
> > +
> > +    if ( c->log && tc != DOMAIN_SAVE_CODE(HEADER) &&
> > +         tc != DOMAIN_SAVE_CODE(END) )
> > +        gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name, src_len);
> > +
> > +    if ( !IS_ALIGNED(src_len, 8) )
> 
> Why not adding an implicit padding? This would avoid to chase error
> during saving because the len wasn't a multiple of 8.
> 

I don't think implicit padding is worth it. This error should only happen if a 
badly defined save record type is added to the code so perhaps I ought to add 
an ASSERT here as well as deal with the error.

> > +        return -EINVAL;
> > +
> > +    BUG_ON(tc != c->desc.typecode);
> > +    BUG_ON(v->vcpu_id != c->desc.instance);
> 
> Does the descriptor really need to be filled by domain_save()? Can't it
> be done here, so we can avoid the two BUG_ON()s?
> 

Yes it can but this serves as a sanity check that the save code is actually 
doing what it should. Hence why these are BUG_ON()s and not error exits.

> > +    c->desc.length = src_len;
> > +
> > +    rc = c->copy(c->priv, &c->desc, sizeof(c->desc));
> > +    if ( rc )
> > +        return rc;
> > +
> > +    return c->copy(c->priv, src, src_len);
> > +}
> > +
> > +int domain_save(struct domain *d, domain_copy_entry copy, void *priv,
> > +                unsigned long mask, bool dry_run)
> > +{
> > +    struct domain_context c = {
> > +        .copy = copy,
> > +        .priv = priv,
> > +        .log = !dry_run,
> > +    };
> > +    struct domain_save_header h = {
> > +        .magic = DOMAIN_SAVE_MAGIC,
> > +        .version = DOMAIN_SAVE_VERSION,
> > +    };
> > +    struct domain_save_header e;
> > +    unsigned int i;
> > +    int rc;
> > +
> > +    ASSERT(d != current->domain);
> > +
> > +    if ( d->is_dying )
> > +        return -EINVAL;
> > +
> > +    domain_pause(d);
> > +
> > +    c.desc.typecode = DOMAIN_SAVE_CODE(HEADER);
> > +
> > +    rc = DOMAIN_SAVE_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h));
> > +    if ( rc )
> > +        goto out;
> > +
> > +    for ( i = 0; i < ARRAY_SIZE(handlers); i++ )
> 
> AFAIU, with this solution, if there are dependency between records, the
> dependencies will have to a lower "index". I think we may have some
> dependency with guest transparent migration such as we need to restore
> the event ABI before restoring the event channels.
> 

Is that just a downstream implementation detail though? I would hope that there 
are no ordering dependencies between records.

> Should we rely on the index for the dependency?
>

If we do need ordering dependencies then I guess it would need to be explicit.
 
> In any case, I think we want to document it.
>

I can certainly document that save handlers are invoked in code order.

> > +    {
> > +        domain_save_handler save = handlers[i].save;
> > +
> > +        if ( (mask && !test_bit(i, &mask)) || !save )
> 
> The type of mask suggests it is not possible to have more than 32
> different types of record if we wanted to be arch agnostic. Even if we
> focus on 64-bit arch, this is only 64 records.
> 
> This is not very future proof, but might be ok if this is not exposed
> outside of the hypervisor (I haven't looked at the rest of the series
> yet). However, we at least need a BUILD_BUG_ON() in place. So please
> make sure  DOMAIN_SAVE_CODE_MAX is always less than 64.
> 
> Also what if there is a bit set in the mask and the record is not
> existing? Shouldn't we return an error?
> 

TBH I think 32 will be enough... I would not expect this context space to grow 
very much, if at all, once transparent migration is working. I think I'll just 
drop the mask for now; it's not actually clear to me we'll make use of it... 
just seemed like a nice-to-have.

> > +            continue;
> > +
> > +        memset(&c.desc, 0, sizeof(c.desc));
> > +        c.desc.typecode = i;
> > +
> > +        if ( handlers[i].per_vcpu )
> > +        {
> > +            struct vcpu *v;
> > +
> > +            for_each_vcpu ( d, v )
> > +            {
> > +                c.desc.instance = v->vcpu_id;
> > +
> > +                rc = save(v, &c, dry_run);
> > +                if ( rc )
> > +                    goto out;
> > +            }
> > +        }
> > +        else
> > +        {
> > +            rc = save(d->vcpu[0], &c, dry_run);
> > +            if ( rc )
> > +                goto out;
> > +        }
> > +    }
> > +
> > +    memset(&c.desc, 0, sizeof(c.desc));
> > +    c.desc.typecode = DOMAIN_SAVE_CODE(END);
> > +
> > +    rc = DOMAIN_SAVE_ENTRY(END, &c, d->vcpu[0], &e, 0);
> > +
> > + out:
> > +    domain_unpause(d);
> > +
> > +    return rc;
> > +}
> > +
> > +int domain_load_entry(struct domain_context *c, unsigned int tc,
> > +                      const char *name, const struct vcpu *v, void *dst,
> > +                      size_t dst_len, bool exact)
> > +{
> > +    int rc;
> > +
> > +    if ( c->log && tc != DOMAIN_SAVE_CODE(HEADER) &&
> > +         tc != DOMAIN_SAVE_CODE(END) )
> > +        gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name, dst_len);
> > +
> > +    BUG_ON(tc != c->desc.typecode);
> > +    BUG_ON(v->vcpu_id != c->desc.instance);
> 
> Is it really warrant to crash the host? What would happen if the values
> were wrong?
> 

It would mean the code is fairly seriously buggy in that the load handler is 
trying to load a record other than the type it registered for, or for a vcpu 
other than the one it was passed.

> > +
> > +    if ( (exact ?
> > +          (dst_len != c->desc.length) : (dst_len < c->desc.length)) ||
> 
> Using ternary in if is really confusing. How about:
> 
> dst_len < c->desc.length || (exact && dst_len != c->desc.length) ||
> 
> I understand that there would be two check for the exact case but I
> think it is better than a ternary.

I'm going to re-work this I think.

> 
> However what is the purpose of the param 'exact'? If you set it to false
> how do you know the size you read?

The point of the exact parameter is that whether the caller can only consume a 
record of the correct length, or whether it can handle an undersize one which 
gets zero-padded. (It's the same as the zeroextend option in HVM records).

> 
> > +         !IS_ALIGNED(c->desc.length, 8) )
> > +        return -EINVAL;
> > +
> > +    rc = c->copy(c->priv, dst, c->desc.length);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( !exact )
> > +    {
> > +        dst += c->desc.length;
> > +        memset(dst, 0, dst_len - c->desc.length);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +int domain_load(struct domain *d,  domain_copy_entry copy, void *priv,
> > +                unsigned long mask)
> > +{
> > +    struct domain_context c = {
> > +        .copy = copy,
> > +        .priv = priv,
> > +        .log = true,
> > +    };
> > +    struct domain_save_header h, e;
> > +    int rc;
> > +
> > +    ASSERT(d != current->domain);
> > +
> > +    if ( d->is_dying )
> > +        return -EINVAL;
> 
> What does protect you against the doing dying right after this check?
> 

Nothing. It's just to avoid doing pointless work if we can.

> > +
> > +    rc = c.copy(c.priv, &c.desc, sizeof(c.desc));
> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( c.desc.typecode != DOMAIN_SAVE_CODE(HEADER) ||
> > +         c.desc.instance != 0 )
> > +        return -EINVAL;
> > +
> > +    rc = DOMAIN_LOAD_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h), true);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( h.magic != DOMAIN_SAVE_MAGIC || h.version != DOMAIN_SAVE_VERSION )
> > +        return -EINVAL;
> > +
> > +    domain_pause(d);
> > +
> > +    for (;;)
> > +    {
> > +        unsigned int i;
> > +        domain_load_handler load;
> > +        struct vcpu *v;
> > +
> > +        rc = c.copy(c.priv, &c.desc, sizeof(c.desc));
> > +        if ( rc )
> > +            break;
> > +
> > +        if ( c.desc.typecode == DOMAIN_SAVE_CODE(END) ) {
> > +            rc = DOMAIN_LOAD_ENTRY(END, &c, d->vcpu[0], &e, 0, true);
> > +            break;
> > +        }
> > +
> > +        rc = -EINVAL;
> > +        if ( c.desc.typecode >= ARRAY_SIZE(handlers) ||
> > +             c.desc.instance >= d->max_vcpus )
> 
> Nothing in the documention suggests that c.desc.instance should be 0
> when the record is for the domain.
> 

Ok. I'll put a comment somewhere.

> > +            break;
> > +
> > +        i = c.desc.typecode;
> > +        load = handlers[i].load;
> > +        v = d->vcpu[c.desc.instance];
> > +
> > +        if ( mask && !test_bit(i, &mask) )
> > +        {
> > +            /* Sink the data */
> > +            rc = c.copy(c.priv, NULL, c.desc.length);
> > +            if ( rc )
> > +                break;
> > +
> > +            continue;
> > +        }
> > +
> > +        rc = load ? load(v, &c) : -EOPNOTSUPP;
> > +        if ( rc )
> > +            break;
> > +    }
> > +
> > +    domain_unpause(d);
> > +
> > +    return rc;
> > +}
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * tab-width: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> > new file mode 100644
> > index 0000000000..84981cd0f6
> > --- /dev/null
> > +++ b/xen/include/public/save.h
> > @@ -0,0 +1,74 @@
> > +/*
> > + * save.h
> > + *
> > + * Structure definitions for common PV/HVM domain state that is held by
> > + * Xen and must be saved along with the domain's memory.
> > + *
> > + * Copyright Amazon.com Inc. or its affiliates.
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a 
> > copy
> > + * of this software and associated documentation files (the "Software"), to
> > + * deal in the Software without restriction, including without limitation 
> > the
> > + * rights to use, copy, modify, merge, publish, distribute, sublicense, 
> > and/or
> > + * sell copies of the Software, and to permit persons to whom the Software 
> > is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included 
> > in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
> > OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 
> > THE
> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + */
> > +
> > +#ifndef __XEN_PUBLIC_SAVE_H__
> > +#define __XEN_PUBLIC_SAVE_H__
> > +
> > +#include "xen.h"
> > +
> > +/* Each entry is preceded by a descriptor */
> > +struct domain_save_descriptor {
> > +    uint16_t typecode;
> > +    uint16_t instance;
> > +    /*
> > +     * Entry length not including this descriptor. Entries must be padded
> > +     * to a multiple of 8 bytes to make sure descriptors remain correctly
> > +     * aligned.
> > +     */
> > +    uint32_t length;
> > +};
> > +
> > +/*
> > + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
> > + * binds these things together.
> > + */
> > +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
> > +    struct __DOMAIN_SAVE_TYPE_##_x { _type t; char c[_code]; };
> > +
> > +#define DOMAIN_SAVE_TYPE(_x) \
> > +    typeof (((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->t)
> > +#define DOMAIN_SAVE_CODE(_x) \
> > +    (sizeof (((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->c))
> > +#define DOMAIN_SAVE_MASK(_x) (1u << DOMAIN_SAVE_CODE(_x))
> > +
> > +/* Terminating entry */
> > +struct domain_save_end {};
> > +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
> > +
> > +#define DOMAIN_SAVE_MAGIC   0x53415645
> > +#define DOMAIN_SAVE_VERSION 0x00000001
> > +
> > +/* Initial entry */
> > +struct domain_save_header {
> > +    uint32_t magic;             /* Must be DOMAIN_SAVE_MAGIC */
> > +    uint32_t version;           /* Save format version */
> 
> Would it make sense to have the version of Xen in the stream?
> 

Why? How would it help?

  Paul




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.