[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [PATCH 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
> -----Original Message----- [snip] > > + > > +#include <xen/save.h> > > + > > +struct domain_context { > > + bool log; > > + struct domain_save_descriptor desc; > > + domain_copy_entry copy; > > As your new framework is technically an extension of existing one, it > would be good to explain why we diverge in the definitions. > I don't follow. What is diverging? I explain in the commit comment that this is a parallel framework. Do I need to justify why it is not a carbon copy of the HVM one? > > + void *priv; > > +}; > > + > > +static struct { > > + const char *name; > > + bool per_vcpu; > > + domain_save_handler save; > > + domain_load_handler load; > > +} handlers[DOMAIN_SAVE_CODE_MAX + 1]; > > + > > +void __init domain_register_save_type(unsigned int tc, const char *name, > > + bool per_vcpu, > > + domain_save_handler save, > > + domain_load_handler load) > > +{ > > + BUG_ON(tc > ARRAY_SIZE(handlers)); > > + > > + ASSERT(!handlers[tc].save); > > + ASSERT(!handlers[tc].load); > > + > > + handlers[tc].name = name; > > + handlers[tc].per_vcpu = per_vcpu; > > + handlers[tc].save = save; > > + handlers[tc].load = load; > > +} > > + > > +int domain_save_entry(struct domain_context *c, unsigned int tc, > > + const char *name, const struct vcpu *v, void *src, > > I realize that 'src' is not const because how you define copy, however I > would rather prefer if we rework the interface to avoid to keep the > const in the definition. This may mean to have to define two callback > rather than one. That seems a bit ugly for the sake of a const but I guess I could create a union with a copy_in and copy_out. I'll see how that looks. > > > + size_t src_len) > > On 64-bit architecture, size_t would be 64-bit. But the record is only > storing 32-bit. Would it make sense to add some sanity check in the code? > True. Given this is new I think I'll just bump the length to 64-bits. > > +{ > > + int rc; > > + > > + if ( c->log && tc != DOMAIN_SAVE_CODE(HEADER) && > > + tc != DOMAIN_SAVE_CODE(END) ) > > + gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name, src_len); > > + > > + if ( !IS_ALIGNED(src_len, 8) ) > > Why not adding an implicit padding? This would avoid to chase error > during saving because the len wasn't a multiple of 8. > I don't think implicit padding is worth it. This error should only happen if a badly defined save record type is added to the code so perhaps I ought to add an ASSERT here as well as deal with the error. > > + return -EINVAL; > > + > > + BUG_ON(tc != c->desc.typecode); > > + BUG_ON(v->vcpu_id != c->desc.instance); > > Does the descriptor really need to be filled by domain_save()? Can't it > be done here, so we can avoid the two BUG_ON()s? > Yes it can but this serves as a sanity check that the save code is actually doing what it should. Hence why these are BUG_ON()s and not error exits. > > + c->desc.length = src_len; > > + > > + rc = c->copy(c->priv, &c->desc, sizeof(c->desc)); > > + if ( rc ) > > + return rc; > > + > > + return c->copy(c->priv, src, src_len); > > +} > > + > > +int domain_save(struct domain *d, domain_copy_entry copy, void *priv, > > + unsigned long mask, bool dry_run) > > +{ > > + struct domain_context c = { > > + .copy = copy, > > + .priv = priv, > > + .log = !dry_run, > > + }; > > + struct domain_save_header h = { > > + .magic = DOMAIN_SAVE_MAGIC, > > + .version = DOMAIN_SAVE_VERSION, > > + }; > > + struct domain_save_header e; > > + unsigned int i; > > + int rc; > > + > > + ASSERT(d != current->domain); > > + > > + if ( d->is_dying ) > > + return -EINVAL; > > + > > + domain_pause(d); > > + > > + c.desc.typecode = DOMAIN_SAVE_CODE(HEADER); > > + > > + rc = DOMAIN_SAVE_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h)); > > + if ( rc ) > > + goto out; > > + > > + for ( i = 0; i < ARRAY_SIZE(handlers); i++ ) > > AFAIU, with this solution, if there are dependency between records, the > dependencies will have to a lower "index". I think we may have some > dependency with guest transparent migration such as we need to restore > the event ABI before restoring the event channels. > Is that just a downstream implementation detail though? I would hope that there are no ordering dependencies between records. > Should we rely on the index for the dependency? > If we do need ordering dependencies then I guess it would need to be explicit. > In any case, I think we want to document it. > I can certainly document that save handlers are invoked in code order. > > + { > > + domain_save_handler save = handlers[i].save; > > + > > + if ( (mask && !test_bit(i, &mask)) || !save ) > > The type of mask suggests it is not possible to have more than 32 > different types of record if we wanted to be arch agnostic. Even if we > focus on 64-bit arch, this is only 64 records. > > This is not very future proof, but might be ok if this is not exposed > outside of the hypervisor (I haven't looked at the rest of the series > yet). However, we at least need a BUILD_BUG_ON() in place. So please > make sure DOMAIN_SAVE_CODE_MAX is always less than 64. > > Also what if there is a bit set in the mask and the record is not > existing? Shouldn't we return an error? > TBH I think 32 will be enough... I would not expect this context space to grow very much, if at all, once transparent migration is working. I think I'll just drop the mask for now; it's not actually clear to me we'll make use of it... just seemed like a nice-to-have. > > + continue; > > + > > + memset(&c.desc, 0, sizeof(c.desc)); > > + c.desc.typecode = i; > > + > > + if ( handlers[i].per_vcpu ) > > + { > > + struct vcpu *v; > > + > > + for_each_vcpu ( d, v ) > > + { > > + c.desc.instance = v->vcpu_id; > > + > > + rc = save(v, &c, dry_run); > > + if ( rc ) > > + goto out; > > + } > > + } > > + else > > + { > > + rc = save(d->vcpu[0], &c, dry_run); > > + if ( rc ) > > + goto out; > > + } > > + } > > + > > + memset(&c.desc, 0, sizeof(c.desc)); > > + c.desc.typecode = DOMAIN_SAVE_CODE(END); > > + > > + rc = DOMAIN_SAVE_ENTRY(END, &c, d->vcpu[0], &e, 0); > > + > > + out: > > + domain_unpause(d); > > + > > + return rc; > > +} > > + > > +int domain_load_entry(struct domain_context *c, unsigned int tc, > > + const char *name, const struct vcpu *v, void *dst, > > + size_t dst_len, bool exact) > > +{ > > + int rc; > > + > > + if ( c->log && tc != DOMAIN_SAVE_CODE(HEADER) && > > + tc != DOMAIN_SAVE_CODE(END) ) > > + gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name, dst_len); > > + > > + BUG_ON(tc != c->desc.typecode); > > + BUG_ON(v->vcpu_id != c->desc.instance); > > Is it really warrant to crash the host? What would happen if the values > were wrong? > It would mean the code is fairly seriously buggy in that the load handler is trying to load a record other than the type it registered for, or for a vcpu other than the one it was passed. > > + > > + if ( (exact ? > > + (dst_len != c->desc.length) : (dst_len < c->desc.length)) || > > Using ternary in if is really confusing. How about: > > dst_len < c->desc.length || (exact && dst_len != c->desc.length) || > > I understand that there would be two check for the exact case but I > think it is better than a ternary. I'm going to re-work this I think. > > However what is the purpose of the param 'exact'? If you set it to false > how do you know the size you read? The point of the exact parameter is that whether the caller can only consume a record of the correct length, or whether it can handle an undersize one which gets zero-padded. (It's the same as the zeroextend option in HVM records). > > > + !IS_ALIGNED(c->desc.length, 8) ) > > + return -EINVAL; > > + > > + rc = c->copy(c->priv, dst, c->desc.length); > > + if ( rc ) > > + return rc; > > + > > + if ( !exact ) > > + { > > + dst += c->desc.length; > > + memset(dst, 0, dst_len - c->desc.length); > > + } > > + > > + return 0; > > +} > > + > > +int domain_load(struct domain *d, domain_copy_entry copy, void *priv, > > + unsigned long mask) > > +{ > > + struct domain_context c = { > > + .copy = copy, > > + .priv = priv, > > + .log = true, > > + }; > > + struct domain_save_header h, e; > > + int rc; > > + > > + ASSERT(d != current->domain); > > + > > + if ( d->is_dying ) > > + return -EINVAL; > > What does protect you against the doing dying right after this check? > Nothing. It's just to avoid doing pointless work if we can. > > + > > + rc = c.copy(c.priv, &c.desc, sizeof(c.desc)); > > + if ( rc ) > > + return rc; > > + > > + if ( c.desc.typecode != DOMAIN_SAVE_CODE(HEADER) || > > + c.desc.instance != 0 ) > > + return -EINVAL; > > + > > + rc = DOMAIN_LOAD_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h), true); > > + if ( rc ) > > + return rc; > > + > > + if ( h.magic != DOMAIN_SAVE_MAGIC || h.version != DOMAIN_SAVE_VERSION ) > > + return -EINVAL; > > + > > + domain_pause(d); > > + > > + for (;;) > > + { > > + unsigned int i; > > + domain_load_handler load; > > + struct vcpu *v; > > + > > + rc = c.copy(c.priv, &c.desc, sizeof(c.desc)); > > + if ( rc ) > > + break; > > + > > + if ( c.desc.typecode == DOMAIN_SAVE_CODE(END) ) { > > + rc = DOMAIN_LOAD_ENTRY(END, &c, d->vcpu[0], &e, 0, true); > > + break; > > + } > > + > > + rc = -EINVAL; > > + if ( c.desc.typecode >= ARRAY_SIZE(handlers) || > > + c.desc.instance >= d->max_vcpus ) > > Nothing in the documention suggests that c.desc.instance should be 0 > when the record is for the domain. > Ok. I'll put a comment somewhere. > > + break; > > + > > + i = c.desc.typecode; > > + load = handlers[i].load; > > + v = d->vcpu[c.desc.instance]; > > + > > + if ( mask && !test_bit(i, &mask) ) > > + { > > + /* Sink the data */ > > + rc = c.copy(c.priv, NULL, c.desc.length); > > + if ( rc ) > > + break; > > + > > + continue; > > + } > > + > > + rc = load ? load(v, &c) : -EOPNOTSUPP; > > + if ( rc ) > > + break; > > + } > > + > > + domain_unpause(d); > > + > > + return rc; > > +} > > + > > +/* > > + * Local variables: > > + * mode: C > > + * c-file-style: "BSD" > > + * c-basic-offset: 4 > > + * tab-width: 4 > > + * indent-tabs-mode: nil > > + * End: > > + */ > > diff --git a/xen/include/public/save.h b/xen/include/public/save.h > > new file mode 100644 > > index 0000000000..84981cd0f6 > > --- /dev/null > > +++ b/xen/include/public/save.h > > @@ -0,0 +1,74 @@ > > +/* > > + * save.h > > + * > > + * Structure definitions for common PV/HVM domain state that is held by > > + * Xen and must be saved along with the domain's memory. > > + * > > + * Copyright Amazon.com Inc. or its affiliates. > > + * > > + * Permission is hereby granted, free of charge, to any person obtaining a > > copy > > + * of this software and associated documentation files (the "Software"), to > > + * deal in the Software without restriction, including without limitation > > the > > + * rights to use, copy, modify, merge, publish, distribute, sublicense, > > and/or > > + * sell copies of the Software, and to permit persons to whom the Software > > is > > + * furnished to do so, subject to the following conditions: > > + * > > + * The above copyright notice and this permission notice shall be included > > in > > + * all copies or substantial portions of the Software. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS > > OR > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > > THE > > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > > + * DEALINGS IN THE SOFTWARE. > > + */ > > + > > +#ifndef __XEN_PUBLIC_SAVE_H__ > > +#define __XEN_PUBLIC_SAVE_H__ > > + > > +#include "xen.h" > > + > > +/* Each entry is preceded by a descriptor */ > > +struct domain_save_descriptor { > > + uint16_t typecode; > > + uint16_t instance; > > + /* > > + * Entry length not including this descriptor. Entries must be padded > > + * to a multiple of 8 bytes to make sure descriptors remain correctly > > + * aligned. > > + */ > > + uint32_t length; > > +}; > > + > > +/* > > + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE > > + * binds these things together. > > + */ > > +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \ > > + struct __DOMAIN_SAVE_TYPE_##_x { _type t; char c[_code]; }; > > + > > +#define DOMAIN_SAVE_TYPE(_x) \ > > + typeof (((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->t) > > +#define DOMAIN_SAVE_CODE(_x) \ > > + (sizeof (((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->c)) > > +#define DOMAIN_SAVE_MASK(_x) (1u << DOMAIN_SAVE_CODE(_x)) > > + > > +/* Terminating entry */ > > +struct domain_save_end {}; > > +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end); > > + > > +#define DOMAIN_SAVE_MAGIC 0x53415645 > > +#define DOMAIN_SAVE_VERSION 0x00000001 > > + > > +/* Initial entry */ > > +struct domain_save_header { > > + uint32_t magic; /* Must be DOMAIN_SAVE_MAGIC */ > > + uint32_t version; /* Save format version */ > > Would it make sense to have the version of Xen in the stream? > Why? How would it help? Paul
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |