[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] API/ABIs: Re: [RFC PATCH v2 0/2] Add a new acquire resource to query vcpu statistics
A discussion about forward extensible APIs and ABIs here. First, its important to say that this should be "domain stats" and not just "vcpu stats". This is easy to account for now, but hard to retrofit later. For the shared memory, we have a minimum granularity of PAGE_SIZE (4k for now, but this will change in due course on non-x86 architectures), and a resource-agnostic way of determining the resource size (as a multiple of the page size). But there are other things we need to consider: 1) To be sensibly extendable, there needs to be some metadata, and the domain stats is clearly going to be a different shape to the vcpu stats. 2) We also want to give Xen some flexibility to allocate memory in a suitable manner for the system. 3) Xen and the userspace consuming this interface will likely be built from different versions of the header. This needs to inter-operate with the common subset of functionality. So what we want, at least to describe the shape, is something like this: struct dom_shmem_stats { uint32_t dom_size; // sizeof(dom) uint32_t vcpu_offs; uint32_t vcpu_size; // sizeof(vcpu) uint32_t vcpu_stride; ... }; struct vcpu_shmem_stats { ... }; where the data layout shall be that there is one dom structure starting at 0, and an array of vcpu_stride objects starting at vcpu_offset. Obviously, some invariants apply. vcpu_offs >= dom_size, and vcpu_stride >= vcpu_size. The total size of the mapping must be larger than vcpu_offs + vcpus * vcpu_stride (and no, I intentionally don't care about the rounding at the end because it doesn't change things in practice.) A very simple layout, packing the data as closely as reasonable, might be: vcpu_offs = roundup(sizeof(dom), cacheline_size) vcpu_stride = roundup(sizeof(vcpu), cacheline_size); but Xen might have reason to use some other rounding. As the dom or vcpu size approaches a page, then Xen will want to change allocation scheme to use page size for both, and not vmap the lot as one consecutive region. For the stats data itself, there wants to be some indication of what data Xen is producing, so userspace can know not to bother consuming. So something like: struct $foo_stats { ... #define STATS_A (1u << 0) #define STATS_B (1u << 2) uint32_t stats_active; struct $foo_stats_a { uint32_t single_field; ... // maybe other singleton fields }; struct $foo_stats_b { uint32_t seq; // "seq" is more common than "version" uint32_t _pad; uint64_t field1; uint64_t field2; ... }; }; Forward compatibility rules say that you can only ever append new fields. But as hinted at with the stats_active field, its fine to leave reserved fields around for future use, generally with a rule that anything reserved shall be 0. Importantly, this means that offsetof(dom, stats_b) is fixed, and will inter-operate just fine if e.g. userspace knows about a stats_c that Xen doesn't know about. But this does highlight some more invariants. Xen must not produce any data outside of [0, vcpu_offs) for dom data, and [base, vcpu_stride) for vcpu data. Furthermore, Xen should not produce any data not indicated by the stats_active field. That said, if Xen is compiled knowing about dom->stats_c, and userspace is not, then userspace will observe Xen advertising a stat it doesn't know, and producing data beyond userspace's sizeof(dom), but within dom->vcpu_offs. This is explicitly fine and expected, and why Xen writes it's sizeof() information into the dom header. This allows both sides to agree on the layout even when they're not compiled from identical copies of the header. A few closing thoughts. 1) It is wise to have a magic number at the head of each dom and vcpu struct. This helps sanity check that both sides have correctly agreed on the layout, but can also serve double duty as an ABI "version". If we screw up spectacularly, and decide that the best course of action is to break backwards compatibility, then we can change the magic and edit the structs in a non-forwards-compatible way. 2) We may get to a point of wanting arch specific stats. This can be accommodated in the model by having struct arch_{dom,vcpu}_stats at positions described by the layout at the start of dom. It would be wise to leave space (reserved fields) there to use if necessary. This is cleaner than deciding that we need to put some new layout fields after the latest $foo_stats_$N and before $foo_stats_$M. 3) It would be great if we could have something in tools/tests/ which can attach to a running VM and assess the correctness of the invariants given. Better yet if it could compile for each change of the ABI and assess the correctness for all. I hope this all makes sense. I know its not trivial, but there's also nothing in here which is rocket science, and with a bit of good design work up front, it will be a flexible interface that we never have to break backwards compatibility with. ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |