[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 12/25] xen: Replace sysctl/readconsole with autogenerated version



Hi Jan,

Thanks for having a look.

On Mon Nov 25, 2024 at 12:05 PM GMT, Jan Beulich wrote:
> On 15.11.2024 12:51, Alejandro Vallejo wrote:
> > Describe sysctl/readconsole as a TOML specification, remove old
> > hand-coded version and replace it with autogenerated file.
> > 
> > While at it, transform the console driver to use uint8_t rather than
> > char in order to mandate the type to be unsigned and ensure the ABI is
> > not defined with regards to C-specific types.
>
> Yet the derived C representation imo then should still be using char, not
> uint8_t.

There's 2 issued addressed by this patch.

  1. The removal of char from the external headers (and the Xen driver).
  2. The replacement of the existing struct by the autogenerated one.

(1) wants doing irrespective of (2). char has neither a fixed width nor a fixed
sign. Which is irrelevant for ABI purposes in this case because what we really
meant is "give me a pointer" in this hypercall, but it may be important in
other cases.

IOW, char should've never made it to the definition of the public ABI, and I'm
merely taking the chance to take it out. Happy to extract this patch and send
it separately.

> In particular it would be a good sign if the Xen sources wouldn't
> need to change, unlike happens here (altering types of a few internals of
> the console machinery).

And that would be the case if Xen had uniform naming conventions and its ABI
was fully unambiguous. The process of uniformizing the naming convention and
disambiguating the ABI is bound to cause (non-functional) changes, mostly in
the naming conventions side of things.

Naming conventions can be _MOSTLY_ sorted by creating compat #defines and
typedefs that match the old types. I can do that, but note that even then some
code would have to change in order to i.e: s/struct OLD_NAME/NEW_NAME_T/

If this is deemed important for backporting changes, I can do it for invasive
replacements, like the createdomain flags.

On the topic of changing types, The present case is an ABI inconsistency case.
My intention is to keep the ABI fixed as a matter of principle (if anything,
because the domU ABI cannot be changed). However, changing the way C represents
said ABI is a requirement if the current definition is ambiguous. In those
cases we ought to change C to ensure there's one and only one way of
interpreting it.

>
> > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@xxxxxxxxx>
> > ---
> >  stubdom/Makefile                              |  2 +-
> >  tools/rust/Makefile                           | 19 ++++++++
> >  .../xenbindgen/extra/sysctl/readconsole.toml  | 43 +++++++++++++++++++
> >  xen/drivers/char/console.c                    | 12 +++---
> >  xen/include/public/autogen/sysctl.h           | 35 +++++++++++++++
>
> In the build tree, having an autogen subdir under public/ _may_ be okay
> (personally I dislike even that). I didn't manage to spot adjustments to
> how files are eventually installed, yet at that point there clearly
> shouldn't be any autogen subdir(s) anymore. How the individual files come
> into existence is, imo, nothing consumers of the interface ought to (need
> to) care about.

Anthony already mentioned an error while building QEMU, which I'm guessing
comes from the same problem. The stitching is definitely up for discussion. I
got far enough to allow the compilation of `dist` to go through, but didn't
think incredibly hard about the finer details (like the install targets).

In principle, renaming `autogen` to `abi` and adding its contents to the list of
installed headers ought to sort that particular concern? 

>
> > --- /dev/null
> > +++ b/tools/rust/xenbindgen/extra/sysctl/readconsole.toml
> > @@ -0,0 +1,43 @@
> > +[[structs]]
> > +name = "xen_sysctl_readconsole"
> > +description = "Read console content from Xen buffer ring."
> > +
> > +[[structs.fields]]
> > +name = "clear"
> > +description = "IN: Non-zero -> clear after reading."
> > +typ = { tag = "u8" }
> > +
> > +[[structs.fields]]
> > +name = "incremental"
> > +description = "IN: Non-zero -> start index specified by `index` field."
> > +typ = { tag = "u8" }
> > +
> > +[[structs.fields]]
> > +name = "_pad"
> > +description = "Unused."
> > +typ = { tag = "u16" }
> > +
> > +[[structs.fields]]
> > +name = "index"
> > +description = """
> > +IN:  Start index for consuming from ring buffer (if @incremental);
> > +OUT: End index after consuming from ring buffer."""
> > +typ = { tag = "u32" }
> > +
> > +[[structs.fields]]
> > +name = "buffer"
> > +description = """
> > +IN: Virtual address to write console data.
> > +
> > +NOTE: The pointer itself is IN, but the contents of the buffer are OUT."""
> > +typ = { tag = "ptr", args = { tag = "u8" } }
> > +
> > +[[structs.fields]]
> > +name = "count"
> > +description = "IN: Size of buffer; OUT: Bytes written to buffer."
> > +typ = { tag = "u32" }
> > +
> > +[[structs.fields]]
> > +name = "rsvd0_a"
> > +description = "Tail padding reserved to zero."
> > +typ = { tag = "u32" }
>
> Up to here I wasn't able to spot any documentation on what it to be written
> in which way.

You're right that the specification is not itself specified. I neglected to do
so to avoid having to rewrite it should we settle on a different markup
language.

Much of your confusion seems to stem from simultanuously looking at a new
markup language and a new schema for it. Let me try to unpick some of that...

> I already struggle with the double square brackets. The TOML
> doc I found when searching the web doesn't have such. Taking just the example
> above also doesn't allow me to conclude how e.g. nested structures would be
> specified.

The schema is represented by the contents of `spec.rs`. All structs with a
Deserialize attribute (i.e: #[derive(Foo, Bar, Deserialize)]) map to some
"table" in TOML.

When I say "struct" now I mean a struct inside the generator that represents
the input file (_NOT_ a struct representing a hypercall).

The rules are as follows. The whole file is deserialized in a single struct
(InFileDef). When there's a single square bracket (which I don't think I've
required yet), that means that what follows is a "table" with the name between
the brackets. There's several ways to represent table

    Regular tables:           [foo]
                              bar = "some_bar"
                              baz = "some_baz"

                              [foo2]
                              bar = "blergh"

    Inline tables:            foo = { bar = "some_bar", baz = "some_baz" }
                              foo2 = { bar = "blergh" }

Both of those deserialize to the same thing (it's C for ease of explaining it
here, but it's actually Rust in the generator).

                struct infiledef {
                    struct {
                        char *bar; // points to "some_bar"
                        char *baz; // points to "some_baz"
                    } foo;
                    struct {
                        char *bar; // points to "blergh"
                    } foo2;
                };

The double brackets are adding one more element to a "list"

That is. This TOML...

                             [[foos]]
                             bar = "some_bar"
                             baz = "some_baz"

                             [[foos]]
                             bar = "some_bar"
                             baz = "some_baz"

... deserializes to...

                struct foodef {
                    char *bar;
                    char *baz;
                }

                struct infiledef {
                    struct foodef *foos;
                };

The last bit of relevant information is that you can identify which table you
want to add to with dots. So [[structs.fields]] is saying "Add this field to
the list of fields of the current hypercall struct".

The "typ" field is a bit quirky (I have a solution to simplify it), but that
uses inline tables.

> Really, when talk was of some form of IDL, I expected to see
> something IDLish (im particular closer to typical programming languages we
> use). Whereas TOML, aiui, is more an easy language for config files of all
> sorts.

I might've been unclear in the talk. One of my goals is to _not_ define a new
language. Or I'll just exchange one problem for two. Maybe I should've called
it an Interface Definition Schema, rather than Language.

The key benefit here is that, while the generators can be tricky, the parser is
all done and strictly specified. We can experiment with YAML (Anthony already
asked about it). But it really is a matter of getting used to. TOML is
fantastic for saving horizontal space. And multi-line comments are neatly
organized.

>
> What I have in mind wouldn't allow for descriptions, yet I'm not sure that's
> relevant. The description ought to, first of all, live in the primary source
> (i.e. the IDL itself) anyway. Commentary there might be possible to extract
> into proper (machine generated/derived) documentation.

Not sure I follow, these TOML files _are_ the IDL itself.

The descriptions here are propagated to the generated code, so they are
infinitely helpful when reaching the type via e.g: cscope, LSPs, etc.

>
> > --- a/xen/drivers/char/console.c
> > +++ b/xen/drivers/char/console.c
> > @@ -42,6 +42,8 @@
> >  #include <asm/vpl011.h>
> >  #endif
> >  
> > +#include <public/xen.h>
>
> Why would this be needed all of the sudden?
>

Because of the new XEN_GUEST_HANDLE_64(uint8) type. The macro is quite ugly and
requires being declared ahead.

> > --- /dev/null
> > +++ b/xen/include/public/autogen/sysctl.h
> > @@ -0,0 +1,35 @@
> > +/*
> > + * sysctl
> > + *
> > + * AUTOGENERATED. DO NOT MODIFY
> > + */
> > +#ifndef __XEN_AUTOGEN_SYSCTL_H
> > +#define __XEN_AUTOGEN_SYSCTL_H
> > +
> > +/* Read console content from Xen buffer ring. */
> > +struct xen_sysctl_readconsole {
> > +    /* IN: Non-zero -> clear after reading. */
> > +    uint8_t clear;
> > +    /* IN: Non-zero -> start index specified by `index` field. */
> > +    uint8_t incremental;
> > +    /* Unused. */
> > +    uint16_t _pad;
> > +    /*
> > +     * IN:  Start index for consuming from ring buffer (if @incremental);
> > +     * OUT: End index after consuming from ring buffer.
> > +     */
> > +    uint32_t index;
> > +    /*
> > +     * IN: Virtual address to write console data.
> > +     *
> > +     * NOTE: The pointer itself is IN, but the contents of the buffer are 
> > OUT.
> > +     */
> > +    XEN_GUEST_HANDLE_64(uint8) buffer;
> > +    /* IN: Size of buffer; OUT: Bytes written to buffer. */
> > +    uint32_t count;
> > +    /* Tail padding reserved to zero. */
> > +    uint32_t rsvd0_a;
> > +};
> > +
> > +#endif /* __XEN_AUTOGEN_SYSCTL_H */
> > +
>
> If this file is auto-generated, why would it need committing? And yes, there
> is the connected question: Will everyone then need to have a Rust compiler
> available?

Committing it is required precisely so that no one needs to have a Rust
compiler available. The last patch in the series checks the generated code
matches the specs byte by byte. It has the nice benefit that you can git-grep
for it and tags work even without compiling first. You also get all
architectures upfront and it's a lot easier to review changes to the generator
because CI will scream to you if the outputs diverge.

>
> Nit: For anything that is committed, it would be nice if those files were as
> tidy as possible style-wise. Most of the above looks entirely okay, just
> that there is an unnecessary trailing blank line.

I did go out of my way to prettify the output.

The trailing newline was intentional to make it C++03-compatible. I can get rid
of it, as it doesn't matter a whole lot.

>
> Jan

Cheers,
Alejandro



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.