
Re: [RFC PATCH 12/25] xen: Replace sysctl/readconsole with autogenerated version



On Tue Nov 26, 2024 at 9:40 AM GMT, Jan Beulich wrote:
> On 25.11.2024 19:51, Alejandro Vallejo wrote:
> > On Mon Nov 25, 2024 at 12:05 PM GMT, Jan Beulich wrote:
> >> On 15.11.2024 12:51, Alejandro Vallejo wrote:
> >>> Describe sysctl/readconsole as a TOML specification, remove old
> >>> hand-coded version and replace it with autogenerated file.
> >>>
> >>> While at it, transform the console driver to use uint8_t rather than
> >>> char in order to mandate the type to be unsigned and ensure the ABI is
> >>> not defined with regards to C-specific types.
> >>
> >> Yet the derived C representation imo then should still be using char, not
> >> uint8_t.
> > 
> > > There are 2 issues addressed by this patch.
> > 
> >   1. The removal of char from the external headers (and the Xen driver).
> >   2. The replacement of the existing struct by the autogenerated one.
> > 
> > > (1) wants doing irrespective of (2). char has neither a fixed width nor a
> > > fixed sign. Which is irrelevant for ABI purposes in this case because what
> > > we really meant is "give me a pointer" in this hypercall, but it may be
> > > important in other cases.
> > > 
> > > IOW, char should've never made it to the definition of the public ABI, and
> > > I'm merely taking the chance to take it out. Happy to extract this patch and
> > > send it separately.
>
> Well, work towards fully getting char out of the public headers may indeed be
> worthwhile. Otoh with char being the basic addressing granularity, I think
> the ABI is pretty much tied to sizeof(char) == 1, imo limiting the
> worthwhile-ness quite a bit.

Let me put it another way. If I were to create a separate patch stripping char
and using uint8_t instead, what are my chances of getting an Acked-by? Or not a
NAK, at least. (There are other maintainers I'd need that from too, but one
step at a time.)

>
> Signed-ness of plain char doesn't really matter as long as it's used only for
> what really are characters (or strings thereof). And that looks to be pretty
> much the case throughout the public headers.

Maybe. Still, as a general principle caller and callee ought to agree on size,
alignment and sign for every type. I'd rather not make exceptions for that
invariant unless truly well motivated. And in this case all it takes is
trivial non-functional changes.
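
To make the sign point concrete, here is a minimal sketch (the helper names
are made up for illustration and are not part of the series). It assumes a
C11 compiler. Plain char commonly ends up signed on x86 builds and unsigned
on Arm builds, so widening the same byte can give different values, whereas
uint8_t widens the same way everywhere:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Both types are one 8-bit byte wide; only the sign of plain char varies. */
static_assert(CHAR_BIT == 8, "the ABI assumes 8-bit bytes");
static_assert(sizeof(uint8_t) == sizeof(char), "same width either way");

/* Hypothetical helper: 1 if this compiler/ABI makes plain char signed. */
static inline int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Widen the byte 0xff through plain char: the result depends on the ABI. */
static inline int widen_plain_char(void)
{
    char c = (char)0xff;

    return c; /* -1 where char is signed, 255 where it is unsigned */
}

/* Widen the byte 0xff through uint8_t: the result is the same everywhere. */
static inline int widen_u8(void)
{
    uint8_t c = 0xff;

    return c; /* always 255 */
}
```

None of this matters while the bytes are only ever treated as text, which is
the case being argued above; it starts to matter as soon as either side does
arithmetic or comparisons on them.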

>
> >> In particular it would be a good sign if the Xen sources wouldn't
> >> need to change, unlike happens here (altering types of a few internals of
> >> the console machinery).
> > 
> > And that would be the case if Xen had uniform naming conventions and its ABI
> > was fully unambiguous. The process of uniformizing the naming convention and
> > disambiguating the ABI is bound to cause (non-functional) changes, mostly in
> > the naming conventions side of things.
> > 
> > > Naming conventions can be _MOSTLY_ sorted by creating compat #defines and
> > > typedefs that match the old types. I can do that, but note that even then
> > > some code would still have to change, e.g: s/struct OLD_NAME/NEW_NAME_T/
>
> Here we already disagree: I think we'd better avoid any typedef-s in the
> public interface when they're not strictly needed (e.g. in order to declare
> associated handles). Imo we simply shouldn't introduce more into the name
> space than actually required.

I really don't want to pollute the spec files with tidbits of how C happened to
case its identifiers originally.

> > If this is deemed important for backporting changes, I can do it for
> > invasive replacements, like the createdomain flags.
> > 
> > On the topic of changing types, the present case is an ABI inconsistency
> > case. My intention is to keep the ABI fixed as a matter of principle (if
> > anything, because the domU ABI cannot be changed). However, changing the way
> > C represents said ABI is a requirement if the current definition is
> > ambiguous. In those cases we ought to change C to ensure there's one and
> > only one way of interpreting it.
>
> I wonder what concrete cases of ambiguity you're thinking of here.
>

Anything where the size, alignment or sign of a type might be up for
interpretation. This "char" case is one. So are things like xen_sysctl_meminfo
due to using raw uint64_t rather than uint64_aligned_t.
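
As an illustration of the alignment half of that, a minimal sketch assuming
GCC/Clang attribute syntax and C11. The type and struct names are made up
for the example, and demo_u64_aligned_t is only a stand-in for what
uint64_aligned_t is meant to guarantee. A raw uint64_t member is 4-byte
aligned under the i386 System V ABI but 8-byte aligned on x86-64, so the
same declaration yields two layouts unless the alignment is forced:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an explicitly 8-byte aligned 64-bit integer. */
typedef uint64_t demo_u64_aligned_t __attribute__((aligned(8)));

/* Layout depends on the ABI: b lands at offset 4 on i386, 8 on x86-64. */
struct demo_raw {
    uint32_t a;
    uint64_t b;
};

/* Layout is fixed: b lands at offset 8 regardless of the ABI. */
struct demo_aligned {
    uint32_t a;
    demo_u64_aligned_t b;
};

static_assert(offsetof(struct demo_aligned, b) == 8, "fixed offset");
static_assert(sizeof(struct demo_aligned) == 16, "fixed size");

/* Hypothetical helper reporting where the raw field ended up on this ABI. */
static inline size_t demo_raw_offset_of_b(void)
{
    return offsetof(struct demo_raw, b);
}
```

Only the second struct can carry compile-time assertions that hold on every
target, which is the "one and only one way of interpreting it" property.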

> >>>  xen/include/public/autogen/sysctl.h           | 35 +++++++++++++++
> >>
> >> In the build tree, having an autogen subdir under public/ _may_ be okay
> >> (personally I dislike even that). I didn't manage to spot adjustments to
> >> how files are eventually installed, yet at that point there clearly
> >> shouldn't be any autogen subdir(s) anymore. How the individual files come
> >> into existence is, imo, nothing consumers of the interface ought to (need
> >> to) care about.
> > 
> > > Anthony already mentioned an error while building QEMU, which I'm guessing
> > > comes from the same problem. The stitching is definitely up for discussion.
> > > I got far enough to allow the compilation of `dist` to go through, but
> > > didn't think incredibly hard about the finer details (like the install
> > > targets).
> > > 
> > > In principle, renaming `autogen` to `abi` and adding its contents to the
> > > list of installed headers ought to sort that particular concern?
>
> Not really, no. That only gives the child a different name. Imo the original
> tree structure shouldn't change. By the end of the conversion work, all-
> generated stuff would simply replace all-hand-written stuff.

I'm afraid that introduces a lot of undue complexity. Putting everything
autogenerated in the same folder where only autogenerated headers live (from
the same generator) provides for easy ways of checking whether files are stale
or not. Trying to surgically replace C would require either a C parser or magic
tokens in the header (yuck!).

I want to avoid anyone having to explicitly include the new headers, but they
do definitely want to be installed in the targets alongside the rest of the
public headers. Because they are public headers.

>
> >>> --- /dev/null
> >>> +++ b/tools/rust/xenbindgen/extra/sysctl/readconsole.toml
> >>> @@ -0,0 +1,43 @@
> >>> +[[structs]]
> >>> +name = "xen_sysctl_readconsole"
> >>> +description = "Read console content from Xen buffer ring."
> >>> +
> >>> +[[structs.fields]]
> >>> +name = "clear"
> >>> +description = "IN: Non-zero -> clear after reading."
> >>> +typ = { tag = "u8" }
> >>> +
> >>> +[[structs.fields]]
> >>> +name = "incremental"
> >>> +description = "IN: Non-zero -> start index specified by `index` field."
> >>> +typ = { tag = "u8" }
> >>> +
> >>> +[[structs.fields]]
> >>> +name = "_pad"
> >>> +description = "Unused."
> >>> +typ = { tag = "u16" }
> >>> +
> >>> +[[structs.fields]]
> >>> +name = "index"
> >>> +description = """
> >>> +IN:  Start index for consuming from ring buffer (if @incremental);
> >>> +OUT: End index after consuming from ring buffer."""
> >>> +typ = { tag = "u32" }
> >>> +
> >>> +[[structs.fields]]
> >>> +name = "buffer"
> >>> +description = """
> >>> +IN: Virtual address to write console data.
> >>> +
> > >>> +NOTE: The pointer itself is IN, but the contents of the buffer are OUT."""
> >>> +typ = { tag = "ptr", args = { tag = "u8" } }
> >>> +
> >>> +[[structs.fields]]
> >>> +name = "count"
> >>> +description = "IN: Size of buffer; OUT: Bytes written to buffer."
> >>> +typ = { tag = "u32" }
> >>> +
> >>> +[[structs.fields]]
> >>> +name = "rsvd0_a"
> >>> +description = "Tail padding reserved to zero."
> >>> +typ = { tag = "u32" }
> >>
> > >> Up to here I wasn't able to spot any documentation on what is to be written
> >> in which way.
> > 
> > > You're right that the specification is not itself specified. I neglected to
> > > do so to avoid having to rewrite it should we settle on a different markup
> > > language.
> > > 
> > > Much of your confusion seems to stem from simultaneously looking at a new
> > > markup language and a new schema for it. Let me try to unpick some of
> > > that...
> > 
> > >> I already struggle with the double square brackets. The TOML
> > >> doc I found when searching the web doesn't have such. Taking just the
> > >> example above also doesn't allow me to conclude how e.g. nested structures
> > >> would be specified.
> > 
> > The schema is represented by the contents of `spec.rs`. All structs with a
> > Deserialize attribute (i.e: #[derive(Foo, Bar, Deserialize)]) map to some
> > "table" in TOML.
> > 
> > When I say "struct" now I mean a struct inside the generator that represents
> > the input file (_NOT_ a struct representing a hypercall).
> > 
> > > The rules are as follows. The whole file is deserialized in a single struct
> > > (InFileDef). When there's a single square bracket (which I don't think I've
> > > required yet), that means that what follows is a "table" with the name
> > > between the brackets. There are several ways to represent a table:
> > 
> >     Regular tables:           [foo]
> >                               bar = "some_bar"
> >                               baz = "some_baz"
> > 
> >                               [foo2]
> >                               bar = "blergh"
> > 
> >     Inline tables:            foo = { bar = "some_bar", baz = "some_baz" }
> >                               foo2 = { bar = "blergh" }
> > 
> > > Both of those deserialize to the same thing (it's C for ease of explaining
> > > it here, but it's actually Rust in the generator).
> > 
> >                 struct infiledef {
> >                     struct {
> >                         char *bar; // points to "some_bar"
> >                         char *baz; // points to "some_baz"
> >                     } foo;
> >                     struct {
> >                         char *bar; // points to "blergh"
> >                     } foo2;
> >                 };
> > 
> > The double brackets are adding one more element to a "list"
> > 
> > That is. This TOML...
> > 
> >                              [[foos]]
> >                              bar = "some_bar"
> >                              baz = "some_baz"
> > 
> >                              [[foos]]
> >                              bar = "some_bar"
> >                              baz = "some_baz"
> > 
> > ... deserializes to...
> > 
> >                 struct foodef {
> >                     char *bar;
> >                     char *baz;
> >                 }
> > 
> >                 struct infiledef {
> >                     struct foodef *foos;
> >                 };
> > 
> > > The last bit of relevant information is that you can identify which table
> > > you want to add to with dots. So [[structs.fields]] is saying "Add this field
> > > to the list of fields of the current hypercall struct".
> > 
> > The "typ" field is a bit quirky (I have a solution to simplify it), but that
> > uses inline tables.
> > 
> >> Really, when talk was of some form of IDL, I expected to see
> > >> something IDLish (in particular closer to typical programming languages we
> >> use). Whereas TOML, aiui, is more an easy language for config files of all
> >> sorts.
> > 
> > > I might've been unclear in the talk. One of my goals is to _not_ define a
> > > new language. Or I'll just exchange one problem for two. Maybe I should've
> > > called it an Interface Definition Schema, rather than Language.
> > > 
> > > The key benefit here is that, while the generators can be tricky, the
> > > parser is all done and strictly specified. We can experiment with YAML
> > > (Anthony already asked about it). But it really is a matter of getting used
> > > to. TOML is fantastic for saving horizontal space. And multi-line comments
> > > are neatly organized.
>
> It may indeed be a matter of getting used to, yet I'm afraid your explanations
> above make the situation yet more confusing, at least for the moment. Just to
> explain my expectations some: I don't really speak e.g. Python or Perl, but
> the languages are self-explanatory enough that I can at least roughly follow
> not overly involved pieces of code written therein. An interface specification
> imo certainly falls in the group of "not overly involved". Yet the (so far
> small) pieces of TOML that are there aren't as self-explanatory as I'd have
> expected them to be.

That's fair. I'll have it translated to YAML for comparison, as I suspect
it'll be easier on the eye.

>
> > >> What I have in mind wouldn't allow for descriptions, yet I'm not sure
> > >> that's relevant. The description ought to, first of all, live in the
> > >> primary source (i.e. the IDL itself) anyway. Commentary there might be
> > >> possible to extract into proper (machine generated/derived) documentation.
> > 
> > Not sure I follow, these TOML files _are_ the IDL itself.
> > 
> > The descriptions here are propagated to the generated code, so they are
> > infinitely helpful when reaching the type via e.g: cscope, LSPs, etc.
>
> Yet they are full-fledged attributes, when normally I'd expect such to be
> comments (thus clearly separating "code" from "non-code").

A comment in TOML (or YAML) would stay at the markup because the parser would
never see it. It must be an attribute so it's brought to its intended
destination (the .h/.rs files). While it'd be desirable to have a connection
between the spec files and their generated counterparts at the LSP-level, I'm
not aware of any way of doing so. And by and large, unless you're actively
adding another hypercall, you really shouldn't care.

>
> As to propagating to generated code - hmm, yes, for use of cscope and alike
> it may indeed be helpful to propagate, if such tools can't associate back
> the generated files to their origins. My general take though is that
> generated code is something you look at as a human only when actually
> dealing with their generation. For all other purposes one would prefer the
> "original" source.

I'd say it's the other way around. I only look at the source material when
working on the generation. In general, I care about what was generated (if
anything, because that's where cscope took me while traversing the tree).

Of course, all this assumes all information in the source file has been
propagated to the generated file. And that's the case here.

>
> >>> --- a/xen/drivers/char/console.c
> >>> +++ b/xen/drivers/char/console.c
> >>> @@ -42,6 +42,8 @@
> >>>  #include <asm/vpl011.h>
> >>>  #endif
> >>>  
> >>> +#include <public/xen.h>
> >>
> >> Why would this be needed all of the sudden?
> >>
> > 
> > > Because of the new XEN_GUEST_HANDLE_64(uint8) type. The macro is quite ugly
> > > and requires being declared ahead.
>
> How's that different from the earlier XEN_GUEST_HANDLE_PARAM(char)?

I'll have a go at removing it. I wrote that a while ago and I no longer
remember the specifics of why it was required.

>
> >>> --- /dev/null
> >>> +++ b/xen/include/public/autogen/sysctl.h
> >>> @@ -0,0 +1,35 @@
> >>> +/*
> >>> + * sysctl
> >>> + *
> >>> + * AUTOGENERATED. DO NOT MODIFY
> >>> + */
> >>> +#ifndef __XEN_AUTOGEN_SYSCTL_H
> >>> +#define __XEN_AUTOGEN_SYSCTL_H
> >>> +
> >>> +/* Read console content from Xen buffer ring. */
> >>> +struct xen_sysctl_readconsole {
> >>> +    /* IN: Non-zero -> clear after reading. */
> >>> +    uint8_t clear;
> >>> +    /* IN: Non-zero -> start index specified by `index` field. */
> >>> +    uint8_t incremental;
> >>> +    /* Unused. */
> >>> +    uint16_t _pad;
> >>> +    /*
> >>> +     * IN:  Start index for consuming from ring buffer (if @incremental);
> >>> +     * OUT: End index after consuming from ring buffer.
> >>> +     */
> >>> +    uint32_t index;
> >>> +    /*
> >>> +     * IN: Virtual address to write console data.
> >>> +     *
> > >>> +     * NOTE: The pointer itself is IN, but the contents of the buffer are OUT.
> >>> +     */
> >>> +    XEN_GUEST_HANDLE_64(uint8) buffer;
> >>> +    /* IN: Size of buffer; OUT: Bytes written to buffer. */
> >>> +    uint32_t count;
> >>> +    /* Tail padding reserved to zero. */
> >>> +    uint32_t rsvd0_a;
> >>> +};
> >>> +
> >>> +#endif /* __XEN_AUTOGEN_SYSCTL_H */
> >>> +
> >>
> > >> If this file is auto-generated, why would it need committing? And yes,
> > >> there is the connected question: Will everyone then need to have a Rust
> > >> compiler available?
> > 
> > Committing it is required precisely so that no one needs to have a Rust
> > compiler available.
>
> How would that work? If I make a change to what currently is a public header,
> I'd still need to have one available, wouldn't I?

If you want to make a change to the public ABI, then yes. But that's reasonably
rare. What I want to avoid is _everyone_ needing a Rust compiler just to build,
or indeed pushing the Rust dependency as a build-dependency on every
downstream. That might come to pass, as the efforts to Rust-ify the toolstack
come to fruition, but we're not there yet.

As for one-off changes, it's actually fairly easy to determine what a new field
in the spec would look like in the final file. You could even write it by hand.
CI will validate the C headers match the spec outputs.

When they don't match, CI tells you the diff.
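
For a feel of what "match byte by byte" boils down to, a minimal sketch of
the comparison step. The function name and its return convention are
assumptions made for this example; this is not the actual check from the
last patch in the series:

```c
#include <stddef.h>

/*
 * Hypothetical helper: compare a committed header against freshly generated
 * output. Returns the offset of the first differing byte, or (size_t)-1 if
 * both buffers are identical. A length mismatch counts as a difference at
 * the end of the shorter buffer.
 */
static size_t first_mismatch(const unsigned char *a, size_t alen,
                             const unsigned char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;

    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return i;

    return alen == blen ? (size_t)-1 : n;
}
```

Because the comparison is exact, even whitespace-only differences (a stray
trailing newline, say) would show up as a mismatch.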

>
> > > The last patch in the series checks the generated code
> > > matches the specs byte by byte. It has the nice benefit that you can
> > > git-grep for it and tags work even without compiling first. You also get
> > > all architectures upfront and it's a lot easier to review changes to the
> > > generator because CI will scream to you if the outputs diverge.
> > 
> >>
> > >> Nit: For anything that is committed, it would be nice if those files were
> > >> as tidy as possible style-wise. Most of the above looks entirely okay, just
> > >> that there is an unnecessary trailing blank line.
> > 
> > I did go out of my way to prettify the output.
> > 
> > > The trailing newline was intentional to make it C++03-compatible. I can get
> > > rid of it, as it doesn't matter a whole lot.
>
> I haven't followed the development of C++ very closely; my experience with it
> is mostly from far more than 20 years ago. What's that C++03 compatibility
> requirement?
>
> Jan

It's really unimportant. -Wnewline-eof, I think it was. There's a pedantic
paragraph in the C++ spec from 2003 that states that if a file is not
newline-terminated it's UB to parse it. I tend to newline-terminate my files
so they're easy to `cat` together on the rare occasions where that's useful for
something.

Regardless, it's truly unimportant, I'll just get rid of that final newline.

Cheers,
Alejandro



 

