
Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.



On Thu, 15 Jun 2023, Roberto Bagnara wrote:
> This document specifies the C language dialect used by Xen and
> the assumptions Xen makes on the translation toolchain.
> 
> Signed-off-by: Roberto Bagnara <roberto.bagnara@xxxxxxxxxxx>

Thanks Roberto for the amazing work of research and archaeology.

I have a few comments below, mostly to clarify the description of some
of the less documented GCC extensions, so that all community members
can understand what they can and cannot use.


> ---
>  docs/misra/C-language-toolchain.rst | 465 ++++++++++++++++++++++++++++
>  1 file changed, 465 insertions(+)
>  create mode 100644 docs/misra/C-language-toolchain.rst
> 
> diff --git a/docs/misra/C-language-toolchain.rst b/docs/misra/C-language-toolchain.rst
> new file mode 100644
> index 0000000000..013cef071c
> --- /dev/null
> +++ b/docs/misra/C-language-toolchain.rst
> @@ -0,0 +1,465 @@
> +=============================================
> +C Dialect and Translation Assumptions for Xen
> +=============================================
> +
> +This document specifies the C language dialect used by Xen and
> +the assumptions Xen makes on the translation toolchain.
> +It covers, in particular:
> +
> +1. the used language extensions;
> +2. the translation limits that the translation toolchains must be able
> +   to accommodate;
> +3. the implementation-defined behaviors upon which Xen may depend.
> +
> +All points are of course relevant for portability.  In addition,
> +programming in C is impossible without a detailed knowledge of the
> +implementation-defined behaviors.  For this reason, it is recommended
> +that Xen developers have familiarity with this document and the
> +documentation referenced therein.
> +
> +This document needs maintenance and adaptation in the following
> +circumstances:
> +
> +- whenever the compiler is changed or updated;
> +- whenever the use of a certain language extension is added or removed;
> +- whenever code modifications cause exceeding the stated translation limits.
> +
> +
> +Applicable C Language Standard
> +______________________________
> +
> +Xen is written in C99 with extensions.  The relevant ISO standard is
> +
> +    *ISO/IEC 9899:1999/Cor 3:2007*: Programming Languages - C,
> +    Technical Corrigendum 3.
> +    ISO/IEC, Geneva, Switzerland, 2007.
> +
> +
> +Reference Documentation
> +_______________________
> +
> +The following documents are referred to in the sequel:
> +
> +GCC_MANUAL:
> +  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc.pdf
> +CPP_MANUAL:
> +  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/cpp.pdf
> +ARM64_ABI_MANUAL:
> +  https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
> +X86_64_ABI_MANUAL:
> +  https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
> +ARM64_LIBC_MANUAL:
> +  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
> +X86_64_LIBC_MANUAL:
> +  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
> +
> +
> +C Language Extensions
> +_____________________
> +
> +
> +The following table lists the extensions currently used in Xen.
> +The table columns are as follows:
> +
> +   Extension
> +      a terse description of the extension;
> +   Architectures
> +      a set of Xen architectures making use of the extension;
> +   References
> +      when available, references to the documentation explaining
> +      the syntax and semantics of (each instance of) the extension.
> +
> +
> +.. list-table::
> +   :widths: 30 15 55
> +   :header-rows: 1
> +
> +   * - Extension
> +     - Architectures
> +     - References
> +
> +   * - Non-standard tokens
> +     - ARM64, X86_64
> +     - _Static_assert:
> +          see Section "2.1 C Language" of GCC_MANUAL.
> +       asm, __asm__:
> +          see Sections "6.48 Alternate Keywords" and "6.47 How to Use Inline Assembly Language in C Code" of GCC_MANUAL.
> +       __volatile__:
> +          see Sections "6.48 Alternate Keywords" and "6.47.2.1 Volatile" of GCC_MANUAL.
> +       __const__, __inline__, __inline:
> +          see Section "6.48 Alternate Keywords" of GCC_MANUAL.
> +       typeof, __typeof__:
> +          see Section "6.7 Referring to a Type with typeof" of GCC_MANUAL.
> +       __alignof__, __alignof:
> +          see Sections "6.48 Alternate Keywords" and "6.44 Determining the Alignment of Functions, Types or Variables" of GCC_MANUAL.
> +       __attribute__:
> +          see Section "6.39 Attribute Syntax" of GCC_MANUAL.
> +       __builtin_types_compatible_p:
> +          see Section "6.59 Other Built-in Functions Provided by GCC" of GCC_MANUAL.
> +       __builtin_va_arg:
> +          non-documented GCC extension.
> +       __builtin_offsetof:
> +          see Section "6.53 Support for offsetof" of GCC_MANUAL.
> +       __signed__:
> +          non-documented GCC extension.
> +
> +   * - Empty initialization list
> +     - ARM64, X86_64
> +     - Non-documented GCC extension.
> +
> +   * - Arithmetic operator on void type
> +     - ARM64, X86_64
> +     - See Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.
> +
> +   * - GNU statement expression

"GNU statement expression" is not very clear, at least for me. I would
call it "Statements and Declarations in Expressions".
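
For readers unfamiliar with the construct, here is a minimal sketch of
what it looks like. The names MAX_OF and max_int are made up for
illustration and are not taken from the Xen tree:

```c
/*
 * Hypothetical macro (not Xen code): a brace-enclosed block used as an
 * expression. The value of the last statement is the value of the whole
 * construct, and each argument is evaluated exactly once.
 */
#define MAX_OF(a, b) ({      \
    int a_ = (a);            \
    int b_ = (b);            \
    a_ > b_ ? a_ : b_;       \
})

static int max_int(int x, int y)
{
    return MAX_OF(x, y);
}
```

The single evaluation of each argument is the usual reason for preferring
this form over a plain conditional-expression macro.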


> +     - ARM64, X86_64
> +     - See Section "6.1 Statements and Declarations in Expressions" of GCC_MANUAL.
> +
> +   * - Structure or union definition with no members
> +     - ARM64, X86_64
> +     - See Section "6.19 Structures with No Members" of GCC_MANUAL.
> +
> +   * - Zero size array type
> +     - ARM64, X86_64
> +     - See Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
> +
> +   * - Binary conditional expression
> +     - ARM64, X86_64
> +     - See Section "6.8 Conditionals with Omitted Operands" of GCC_MANUAL.
> +
> +   * - 'Case' label with upper/lower values
> +     - ARM64, X86_64
> +     - See Section "6.30 Case Ranges" of GCC_MANUAL.
> +
> +   * - Unnamed field that is not a bit-field
> +     - ARM64, X86_64
> +     - See Section "6.63 Unnamed Structure and Union Fields" of GCC_MANUAL.
> +
> +   * - Empty declaration
> +     - ARM64, X86_64
> +     - Non-documented GCC extension.

For the non-documented GCC extensions, would it be possible to add a
very brief example or a couple of words in the "References" column?
Otherwise I think people might not understand what we are talking about.

For instance in this case I would say:

An empty declaration is a semicolon with nothing before it.
Non-documented GCC extension.
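
A minimal sketch of the kind of code this row would cover (the names
point and point_sum are hypothetical, not Xen code). As far as I know
GCC accepts both stray semicolons by default and only warns about them
with -pedantic:

```c
struct point {
    int x;
    int y;
};
;   /* empty declaration at file scope: a semicolon with nothing before it */

static int point_sum(struct point p)
{
    return p.x + p.y;
};  /* the semicolon after the function body is another empty declaration */
```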


> +   * - Incomplete enum declaration
> +     - ARM64
> +     - Non-documented GCC extension.

Is this 6.49 of the GCC manual perhaps?
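
If so, the construct would look roughly like the sketch below. The names
are hypothetical, and I am assuming the behaviour described in that
section of the manual: the enum tag is declared before its enumerators
are known and is only used through pointers until it is completed.

```c
enum colour;                                /* incomplete: no enumerators yet */

static int is_green(const enum colour *c);  /* pointers to it are allowed */

enum colour { COLOUR_RED, COLOUR_GREEN };   /* the type is completed here */

static int is_green(const enum colour *c)
{
    return *c == COLOUR_GREEN;
}
```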


> +   * - Implicit conversion from a pointer to an incompatible pointer
> +     - ARM64, X86_64
> +     - Non-documented GCC extension.

Is this related to -Wincompatible-pointer-types?


> +   * - Pointer to a function is converted to a pointer to an object or a pointer to an object is converted to a pointer to a function
> +     - X86_64
> +     - Non-documented GCC extension.

Is this J.5.7 of n1570?
https://www.iso-9899.info/n1570.html

Or maybe we should link https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584


> +   * - Ill-formed source detected by the parser

As we are documenting compiler extensions that we are using, I am a bit
confused by the name of this category of compiler extensions, and the
reason why they are bundled together. Aren't they all separate compiler
extensions? Should each of them have its own row?


> +     - ARM64, X86_64
> +     - token pasting of ',' and __VA_ARGS__ is a GNU extension:
> +          see Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.
> +       must specify at least one argument for '...' parameter of variadic macro:
> +          see Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.
> +       void function should not return void expression:

I understand that GCC does a poor job of documenting several of these
extensions. In fact a few of them are not documented at all.
However, if they are extensions, they should be described by what they
do, not by the rule they violate. What do you think?

For example, in this case maybe we should say "void function can return
a void expression" ?
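
To make the suggested wording concrete, a minimal sketch (hypothetical
names, not Xen code) of what the extension allows. ISO C forbids a
return statement with an expression in a function returning void, even
when that expression itself has type void; GCC accepts it:

```c
static int counter;

static void bump(void)
{
    counter++;
}

/* A void function returning a void expression: accepted by GCC. */
static void bump_wrapper(void)
{
    return bump();
}
```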


> +          see the documentation for -Wreturn-type in Section "3.8 Options to Request or Suppress Warnings" of GCC_MANUAL.
> +       use of GNU statement expression extension from macro expansion:
> +          see Section "6.1 Statements and Declarations in Expressions" of GCC_MANUAL.
> +       invalid application of sizeof to a void type:
> +          see Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.
> +       redeclaration of already-defined enum is a GNU extension:
> +          see Section "6.49 Incomplete enum Types" of GCC_MANUAL.
> +       static function is used in an inline function with external linkage:
> +          non-documented GCC extension.

I am not sure I follow this one. Did you mean "static is used
in an inline function with external linkage"?


> +       struct may not be nested in a struct due to flexible array member:
> +          see Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
> +       struct may not be used as an array element due to flexible array member:
> +          see Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
> +       ISO C restricts enumerator values to the range of int:
> +          non-documented GCC extension.

Should we call it instead "enumerator values can be larger than int" ?
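
For instance, something along these lines (a sketch with hypothetical
names, assuming a target where int is 32 bits wide, as on ARM64 and
X86_64):

```c
enum big_flag {
    FLAG_LOW  = 1,
    FLAG_HIGH = 0x100000000ULL   /* 2^32: does not fit in a 32-bit int */
};

/* Returns the enumerator's value widened to unsigned long long. */
static unsigned long long flag_high_value(void)
{
    return FLAG_HIGH;
}
```

Before C23, ISO C required every enumerator value to be representable
as an int, so this relies on the compiler choosing a wider underlying
type for the enum.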


> +
> +   * - Unspecified escape sequence is encountered in a character constant or a string literal token
> +     - X86_64
> +     - \\m:
> +          non-documented GCC extension.

Are you saying that we are using \m and \m is not allowed by the C
standard?


> +   * - Non-standard type

Should we call it "128-bit Integers" ?
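
A minimal sketch of typical usage, assuming a 64-bit target on which
GCC provides __uint128_t (the function name mul_high64 is hypothetical,
not Xen code):

```c
#include <stdint.h>

/* Returns the upper 64 bits of the full 128-bit product a * b. */
static uint64_t mul_high64(uint64_t a, uint64_t b)
{
    __uint128_t product = (__uint128_t)a * b;

    return (uint64_t)(product >> 64);
}
```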


> +     - X86_64
> +     - See Section "6.9 128-bit Integers" of GCC_MANUAL.




> +Translation Limits
> +__________________
> +
> +The following table lists the translation limits that a toolchain has
> +to satisfy in order to translate Xen.  The numbers given are a
> +compromise: on the one hand, many modern compilers have very generous
> +limits (in several cases, the only limitation is the amount of
> +available memory); on the other hand we prefer setting limits that are
> +not too high, because compilers do not have any obligation of
> +diagnosing when a limit has been exceeded, and not too low, so as to
> +avoid frequently updating this document.  In the table, only the
> +limits that go beyond the minima specified by the relevant C Standard
> +are listed.
> +
> +The table columns are as follows:
> +
> +   Limit
> +      a terse description of the translation limit;
> +   Architectures
> +      a set of relevant Xen architectures;
> +   Threshold
> +      a value that the Xen project does not wish to exceed for that limit
> +      (this is typically below, often much below what the translation
> +      toolchain supports);
> +   References
> +      when available, references to the documentation providing evidence
> +      that the translation toolchain honors the threshold (and more).
> +
> +.. list-table::
> +   :widths: 30 15 10 45
> +   :header-rows: 1
> +
> +   * - Limit
> +     - Architectures
> +     - Threshold
> +     - References
> +
> +   * - Size of an object
> +     - ARM64, X86_64
> +     - 8388608
> +     - The maximum size of an object is defined in the MAX_SIZE macro, and for a 32 bit architecture is 8MB.
> +       The maximum size for an array is defined in the PTRDIFF_MAX and in a 32 bit architecture is 2^30-1.
> +       See occurrences of these macros in GCC_MANUAL.
> +
> +   * - Characters in one logical source line
> +     - ARM64
> +     - 5000
> +     - See Section "11.2 Implementation limits" of CPP_MANUAL.
> +
> +   * - Characters in one logical source line
> +     - X86_64
> +     - 12000
> +     - See Section "11.2 Implementation limits" of CPP_MANUAL.
> +
> +   * - Nesting levels for #include files
> +     - ARM64
> +     - 24
> +     - See Section "11.2 Implementation limits" of CPP_MANUAL.
> +
> +   * - Nesting levels for #include files
> +     - X86_64
> +     - 32
> +     - See Section "11.2 Implementation limits" of CPP_MANUAL.
> +
> +   * - case labels for a switch statement (excluding those for any nested switch statements)
> +     - X86_64
> +     - 1500
> +     - See Section "4.12 Statements" of GCC_MANUAL.
> +
> +   * - Number of significant initial characters in an external identifier
> +     - ARM64, X86_64
> +     - 63
> +     - See Section "4.3 Identifiers" of GCC_MANUAL.
> +
> +
> +Implementation-Defined Behaviors
> +________________________________
> +
> +The following table lists the C language implementation-defined behaviors
> +relevant for MISRA C:2012 Dir 1.1 upon which Xen may possibly depend.
> +
> +The table columns are as follows:
> +
> +   I.-D.B.
> +      a terse description of the implementation-defined behavior;
> +   Architectures
> +      a set of relevant Xen architectures;
> +   Value(s)
> +      for i.-d.b.'s with values, the values allowed;
> +   References
> +      when available, references to the documentation providing details
> +      about how the i.-d.b. is resolved by the translation toolchain.
> +
> +.. list-table::
> +   :widths: 30 15 10 45
> +   :header-rows: 1
> +
> +   * - I.-D.B.
> +     - Architectures
> +     - Value(s)
> +     - References
> +
> +   * - Allowable bit-field types other than _Bool, signed int, and unsigned int
> +     - ARM64, X86_64
> +     - All explicitly signed integer types, all unsigned integer types,
> +       and enumerations.
> +     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields".
> +
> +   * - #pragma preprocessing directive that is documented as causing translation failure or some other form of undefined behavior is encountered
> +     - ARM64, X86_64
> +     - pack, GCC visibility
> +     - #pragma pack:
> +          see Section "6.62.11 Structure-Layout Pragmas" of GCC_MANUAL.
> +       #pragma GCC visibility:
> +          see Section "6.62.14 Visibility Pragmas" of GCC_MANUAL.
> +
> +   * - The number of bits in a byte
> +     - ARM64
> +     - 8
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.
> +
> +   * - The number of bits in a byte
> +     - X86_64
> +     - 8
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - Whether signed integer types are represented using sign and magnitude, two's complement, or one's complement, and whether the extraordinary value is a trap representation or an ordinary value
> +     - ARM64, X86_64
> +     - Two's complement
> +     - See Section "4.5 Integers" of GCC_MANUAL.
> +
> +   * - Any extended integer types that exist in the implementation
> +     - X86_64
> +     - __uint128_t
> +     - See Section "6.9 128-bit Integers" of GCC_MANUAL.
> +
> +   * - The number, order, and encoding of bytes in any object
> +     - ARM64
> +     -
> +     - See Section "4.15 Architecture" of GCC_MANUAL and Chapter 5 "Data types and alignment" of ARM64_ABI_MANUAL.
> +
> +   * - The number, order, and encoding of bytes in any object
> +     - X86_64
> +     -
> +     - See Section "4.15 Architecture" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - Whether a bit-field can straddle a storage-unit boundary
> +     - ARM64
> +     -
> +     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "8.1.8 Bit-fields" of ARM64_ABI_MANUAL.
> +
> +   * - Whether a bit-field can straddle a storage-unit boundary
> +     - X86_64
> +     -
> +     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - The order of allocation of bit-fields within a unit
> +     - ARM64
> +     -
> +     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "8.1.8 Bit-fields" of ARM64_ABI_MANUAL.
> +
> +   * - The order of allocation of bit-fields within a unit
> +     - X86_64
> +     -
> +     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - What constitutes an access to an object that has volatile-qualified type
> +     - ARM64, X86_64
> +     -
> +     - See Section "4.10 Qualifiers" of GCC_MANUAL.
> +
> +   * - The values or expressions assigned to the macros specified in the headers <float.h>, <limits.h>, and <stdint.h>
> +     - ARM64
> +     -
> +     - See Section "4.15 Architecture" of GCC_MANUAL and Chapter 5 "Data types and alignment" of ARM64_ABI_MANUAL.
> +
> +   * - The values or expressions assigned to the macros specified in the headers <float.h>, <limits.h>, and <stdint.h>
> +     - X86_64
> +     -
> +     - See Section "4.15 Architecture" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - Character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token
> +     - ARM64
> +     - UTF-8
> +     - See Section "1.1 Character sets" of CPP_MANUAL.
> +       We assume the locale is not restricting any UTF-8 characters being part of the source character set.
> +
> +   * - The value of a char object into which has been stored any character other than a member of the basic execution character set
> +     - ARM64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.
> +
> +   * - The value of a char object into which has been stored any character other than a member of the basic execution character set
> +     - X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character
> +     - ARM64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.
> +
> +   * - The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character
> +     - X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.
> +
> +   * - The mapping of members of the source character set
> +     - ARM64, X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and the documentation for -finput-charset=charset in the same manual.
> +
> +   * - The members of the source and execution character sets, except as explicitly specified in the Standard
> +     - ARM64, X86_64
> +     - UTF-8
> +     - See Section "4.4 Characters" of GCC_MANUAL.
> +
> +   * - The values of the members of the execution character set
> +     - ARM64, X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and the documentation for -fexec-charset=charset in the same manual.
> +
> +   * - How a diagnostic is identified
> +     - ARM64, X86_64
> +     -
> +     - See Section "4.1 Translation" of GCC_MANUAL.
> +
> +   * - The termination status returned to the host environment by the abort, exit, or _Exit function
> +     - ARM64
> +     -
> +     - See Section "25.7 Program Termination" of ARM64_LIBC_MANUAL.
> +
> +   * - The termination status returned to the host environment by the abort, exit, or _Exit function
> +     - X86_64
> +     -
> +     - See Section "25.7 Program Termination" of X86_64_LIBC_MANUAL.
> +
> +   * - The places that are searched for an included < > delimited header, and how the places are specified or the header is identified
> +     - ARM64, X86_64
> +     -
> +     - See Chapter "2 Header Files" of CPP_MANUAL.
> +
> +   * - How the named source file is searched for in an included " " delimited header
> +     - ARM64, X86_64
> +     -
> +     - See Chapter "2 Header Files" of CPP_MANUAL.
> +
> +   * - How sequences in both forms of header names are mapped to headers or external source file names
> +     - ARM64, X86_64
> +     -
> +     - See Chapter "2 Header Files" of CPP_MANUAL.
> +
> +   * - Whether the # operator inserts a \ character before the \ character that begins a universal character name in a character constant or string literal
> +     - ARM64, X86_64
> +     -
> +     - See Section "3.4 Stringizing" of CPP_MANUAL.
> +
> +   * - The current locale used to convert a wide string literal into corresponding wide character codes
> +     - ARM64, X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.
> +
> +   * - The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set
> +     - X86_64
> +     -
> +     - See Section "4.4 Characters" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.
> +
> +   * - The behavior on each recognized #pragma directive
> +     - ARM64, X86_64
> +     - pack, GCC visibility
> +     - See Section "4.13 Preprocessing Directives" of GCC_MANUAL and Section "7 Pragmas" of CPP_MANUAL.
> +
> +   * - The method by which preprocessing tokens (possibly resulting from macro expansion) in a #include directive are combined into a header name
> +     - X86_64
> +     -
> +     - See Section "4.13 Preprocessing Directives" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.
> +
> +
> +END OF DOCUMENT.

"END OF DOCUMENT." is unnecessary.

> -- 
> 2.34.1
> 



 

