[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

API/ABIs: Re: [RFC PATCH v2 0/2] Add a new acquire resource to query vcpu statistics


  • To: Matias Ezequiel Vara Larsen <matiasevara@xxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 23 Feb 2023 19:56:28 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=zyOLyV+PKQqFWDsR2+Fbj6MZ0evlP7hc9G/W/vzn/24=; b=NTDln5azhqpnJARMol/+bSFTDnJJEaUGFqSguwa9U20fOG2vd/V4fwc919ddowGMXnoCGEJoVLpljkZaCxC783pTWL7dtVawRIabaVTbXvMJ4EbgWztBy9uT0lH4xA1SqNH/rW7MBJ1q1Ch+nYyftgsco+summcmxwNUTucR0qcmZ2lU9nNHuw/0c/dYpWTNdCma/WmsKwDllS4ddEFhOkOWzYovR3eb+GkBcKC5bPyylQ6gOOst2Y6UymMGraV+F5rWsrcpuH1H4vBnUPFgY3PwhxeSkVSVRr9jOpzgoM/3SMJ9dHUn4+bchd+FLng63ZG0MDRu9a/eAFzZGL93Mw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=gYTHbAJNfomuzTLeaqTFqn46WDy8vynuYNNGmes7V3EbXoSfd8WizsPSa3rop2dqt6VlEn+z9yae8TDW9BmBgL7SKI5xxY8NRgTHKOr7pmPnUIHjY3kJTMa60qISl6U/YxBlMOHNeMvwoipdAqCSfflFFCpor4SKOctSk/HEgx+sDo0FXry2DSu+2nstEByBmMdiNzejSZ86g6WIjN3ToMrw7WGx+n6GRzcAX6fekBfKfYrXXU5gOzYepITcZmTc2IoYw6B5ZNCMrA/XKWn9bXouRR8o+vOBIC+TjVRKlIFzneAHqv/nZgfbRIutQ1lrt6b6hxNDAATF0R5+2RGvqw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Matias Ezequiel Vara Larsen <matias.vara@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>
  • Delivery-date: Thu, 23 Feb 2023 19:56:56 +0000
  • Ironport-data: A9a23:n6GZX6xTlM++++e2st56t+e9xyrEfRIJ4+MujC+fZmUNrF6WrkUDm jdJDTqHPfzeYjDzc9kjPYWwo0MO7MCDn4VgSQo4qyAxQypGp/SeCIXCJC8cHc8wwu7rFxs7s ppEOrEsCOhuExcwcz/0auCJQUFUjP3OHfykTrafYEidfCc8IA85kxVvhuUltYBhhNm9Emult Mj75sbSIzdJ4RYtWo4vw//F+UwHUMja4mtC5QRkP6sT5zcyqlFOZH4hDfDpR5fHatE88t6SH 47r0Ly/92XFyBYhYvvNfmHTKxBirhb6ZGBiu1IOM0SQqkEqSh8ai87XAME0e0ZP4whlqvgqo Dl7WT5cfi9yVkHEsLx1vxC1iEiSN4UekFPMCSDXXcB+UyQq2pYjqhljJBheAGEWxgp4KWEU+ aczKCEDUkyCoruzwYCfdPNtiP12eaEHPKtH0p1h5RfwKK58BLzmGODN79Ie2yosjMdTG/qYf 9AedTdkcBXHZVtIJ0sTD5U92uyvgxETcRUB8A7T+fVxvjaVkFYZPLvFabI5fvSjQ8lPk1nej WXB52njWTkRNcCFyCrD+XWp7gPKtXKrAdJDROHinhJsqHyc2VwdJgA6bn74ueeG0m66aehvb GVBr0LCqoB3riRHVOLVWhSkoXefswAVQdN4HOgz6QXLwa3Riy6JC25BQjNfZdgOsM4tWSdsx lKPh8nuBzFkrPuSU3313rKdsTK7Im4LJHULTTENUQYepdf5yKkxhB/SStdoEIauk8b4Xzr3x liisywWl7gVy8kR2M2T/03Dgj+qjojESEgy/Aq/dnm+8gpzaYqhZoqpwVvW9/BNKMCeVFbpl HQKkseR7ecKDLmWiTeABu4KGdmUC+2tNTTdhRtjGsIn/jH1oXq7J9gMund5OVtjNdsCdXnxe kjPtAhN5ZhVeny3catwZIH3AMMvpUT9KenYujnvRoImSvBMmMWvpkmCuWb4M7jRrXUR
  • Ironport-hdrordr: A9a23:NdiIvq0n8Sqm3u8Wy4+mWwqjBIkkLtp133Aq2lEZdPU1SL3hqy nKpp536faaskd0ZJhNo6HnBEDiexLhHPxOkOws1N6ZNWGMhILPFvAA0WLM+UyDJ8SUzJ876U 4PSdkGNDQyNzdHZATBjDVQ3+xP/DBPysCVuds=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

A discussion about forward extensible APIs and ABIs here.

First, its important to say that this should be "domain stats" and not
just "vcpu stats".  This is easy to account for now, but hard to
retrofit later.

For the shared memory, we have a minimum granularity of PAGE_SIZE (4k
for now, but this will change in due course on non-x86 architectures),
and a resource-agnostic way of determining the resource size (as a
multiple of the page size).

But there are other things we need to consider:

1) To be sensibly extendable, there needs to be some metadata, and the
domain stats is clearly going to be a different shape to the vcpu stats.

2) We also want to give Xen some flexibility to allocate memory in a
suitable manner for the system.

3) Xen and the userspace consuming this interface will likely be built
from different versions of the header.  This needs to inter-operate with
the common subset of functionality.


So what we want, at least to describe the shape, is something like this:

struct dom_shmem_stats {
    uint32_t dom_size; // sizeof(dom)
    uint32_t vcpu_offs;
    uint32_t vcpu_size; // sizeof(vcpu)
    uint32_t vcpu_stride;
    ...
};

struct vcpu_shmem_stats {
    ...
};

where the data layout shall be that there is one dom structure starting
at 0, and an array of vcpu_stride objects starting at vcpu_offset.

Obviously, some invariants apply.  vcpu_offs >= dom_size, and
vcpu_stride >= vcpu_size.  The total size of the mapping must be larger
than vcpu_offs + vcpus * vcpu_stride  (and no, I intentionally don't
care about the rounding at the end because it doesn't change things in
practice.)

A very simple layout, packing the data as closely as reasonable, might be:

vcpu_offs = roundup(sizeof(dom), cacheline_size)
vcpu_stride = roundup(sizeof(vcpu), cacheline_size);

but Xen might have reason to use some other rounding.  As the dom or
vcpu size approaches a page, then Xen will want to change allocation
scheme to use page size for both, and not vmap the lot as one
consecutive region.



For the stats data itself, there wants to be some indication of what
data Xen is producing, so userspace can know not to bother consuming. 
So something like:

struct $foo_stats {
    ...

#define STATS_A (1u << 0)
#define STATS_B (1u << 2)
    uint32_t stats_active;
   
    struct $foo_stats_a {
        uint32_t single_field;
        ... // maybe other singleton fields
    };

    struct $foo_stats_b {
        uint32_t seq;  // "seq" is more common than "version"
        uint32_t _pad;
        uint64_t field1;
        uint64_t field2;
        ...
    };
};


Forward compatibility rules say that you can only ever append new
fields.  But as hinted at with the stats_active field, its fine to leave
reserved fields around for future use, generally with a rule that
anything reserved shall be 0.

Importantly, this means that offsetof(dom, stats_b) is fixed, and will
inter-operate just fine if e.g. userspace knows about a stats_c that Xen
doesn't know about.

But this does highlight some more invariants.  Xen must not produce any
data outside of [0, vcpu_offs) for dom data, and [base, vcpu_stride) for
vcpu data.

Furthermore, Xen should not produce any data not indicated by the
stats_active field.  That said, if Xen is compiled knowing about
dom->stats_c, and userspace is not, then userspace will observe Xen
advertising a stat it doesn't know, and producing data beyond
userspace's sizeof(dom), but within dom->vcpu_offs.  This is explicitly
fine and expected, and why Xen writes it's sizeof() information into the
dom header.  This allows both sides to agree on the layout even when
they're not compiled from identical copies of the header.



A few closing thoughts.

1) It is wise to have a magic number at the head of each dom and vcpu
struct.  This helps sanity check that both sides have correctly agreed
on the layout, but can also serve double duty as an ABI "version".  If
we screw up spectacularly, and decide that the best course of action is
to break backwards compatibility, then we can change the magic and edit
the structs in a non-forwards-compatible way.

2) We may get to a point of wanting arch specific stats.  This can be
accommodated in the model by having struct arch_{dom,vcpu}_stats at
positions described by the layout at the start of dom.  It would be wise
to leave space (reserved fields) there to use if necessary.  This is
cleaner than deciding that we need to put some new layout fields after
the latest $foo_stats_$N and before $foo_stats_$M.

3) It would be great if we could have something in tools/tests/ which
can attach to a running VM and assess the correctness of the invariants
given.  Better yet if it could compile for each change of the ABI and
assess the correctness for all.


I hope this all makes sense.  I know its not trivial, but there's also
nothing in here which is rocket science, and with a bit of good design
work up front, it will be a flexible interface that we never have to
break backwards compatibility with.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.