[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL



On Mon, 21 Jan 2019, Jan Beulich wrote:
> >>> On 19.01.19 at 00:05, <sstabellini@xxxxxxxxxx> wrote:
> > On Fri, 18 Jan 2019, Jan Beulich wrote:
> >> >>> On 18.01.19 at 02:24, <sstabellini@xxxxxxxxxx> wrote:
> >> > On Thu, 17 Jan 2019, Jan Beulich wrote:
> >> >> >>> On 17.01.19 at 01:37, <sstabellini@xxxxxxxxxx> wrote:
> >> >> > On Wed, 16 Jan 2019, Jan Beulich wrote:
> >> >> >> In any event - since intermediate variables merely hide the
> >> >> >> casting from the compiler, but they don't remove the casts, the
> >> >> >> solution involving casts is better imo, for incurring less overhead.
> >> >> > 
> >> >> > This is where I completely disagree. The intermediate variables are 
> >> >> > not
> >> >> > hiding casts from the compiler. There were never any pointers in this
> >> >> > case.  The linker creates "symbols", not pointers, completely 
> >> >> > invisible
> >> >> > from C land. Assembly uses these symbols to initialize variables. We
> >> >> > expose these assembly variables as integer to C lands. LD scripts and
> >> >> > assembly have their own terminology and rules: neither "_start" nor
> >> >> > "start" are pointers at any point in time. The operations done in 
> >> >> > var.S
> >> >> > is not a cast. The C spec is happy, the compiler is happy, MISRA-C is
> >> >> > happy. And we get to avoid the ugly SYMBOL macro that Linux uses. It 
> >> >> > is
> >> >> > really a win-win.
> >> >> 
> >> >> Well, that's a position one can take. But we have to settle on another
> >> >> aspect then first: Does what is not done in C underly C's rules? I
> >> >> thought you were of the opinion that what comes from linker scripts
> >> >> does. In which case what comes from assembly files ought to, too.
> >> >> (FAOD my implication is: If the answer is yes, both approaches
> >> >> violate C's rules. If the answer is no, no change is needed at all.)
> >> > 
> >> > Great question, that is the core of the issue. Also, let me premise that
> >> > I agree on the comments you made on the patches (I dislike "start_"
> >> > too), and I can address them if we agree to continue down this path.
> >> > 
> >> > But no, I do not think that what is done outside of C-land should follow
> >> > C rules. But I do not agree with your conclusion that in that case there
> >> > is no difference between the approaches. Let's get more into the
> >> > details.
> >> > 
> >> > 
> >> > 1) SYMBOL_HIDE returning pointer type
> >> > 
> >> > Let's take _start and _end as an example. _start is born as a linker
> >> > symbol, and it becomes a C pointer when we do:
> >> > 
> >> >   extern char _start[], _end[]
> >> > 
> >> > Now it is a pointer (actually I should say an array, but let's pretend
> >> > they are the same thing for this discussion).
> >> > 
> >> > When we do:
> >> > 
> >> >   SYMBOL_HIDE(_end) - SYMBOL_HIDE(_start)
> >> > 
> >> > We are still subtracting pointers: the pointers returned by SYMBOL_HIDE.
> >> > We cannot prove that they are pointers to the same object or subsequence
> >> > objects in memory, so it is undefined behavior, which is not allowed.
> >> 
> >> Stop. No. We very much can prove they are - _end points at
> >> one past the last element of _start[]. It is the compiler which
> >> can't prove the opposite, and hence it can't leverage
> >> undefined behavior for optimization purposes.
> > 
> > This is an interesting comment. However, even for normal pointers it is
> > unreliable to count on one pointing one past the last element of the
> > other. This was well explained in the GCC thread linked earlier in this
> > thread. The vision of at least one of the GCC maintainers is that the
> > compiler is free to place things in memory where it wishes, so as a
> > programmer you cannot count on pointers pointing one past the last
> > element of the other. Ever. In this case, where _start and _end are
> > defined outside of C-land, I think it is even more true, and it remains
> > undefined.
> 
> You mix up two things: One is the chance of two objects being
> adjacent to one another. We don't care about this. The other is
> a pointer truly pointing one past the last element of an array (as
> will naturally result with e.g.
> 
>     for ( ptr = arr; ptr < arr + ARRAY_SIZE(arr); ++ptr )
> 
> It is this second case which all the cases we care about here fall
> into. As per my other mail, just like the same object can have multiple
> names, symbols may also refer to places other than the start of
> an object; the fact that plain C can't produce such symbols is not
> relevant as long as there's no requirement that C code may
> interface only with other C code.

The chance of two objects being adjacent to one another is relevant
because the compiler could rightfully decide that the programmer can
never rely on pointers pointing one past the last element of the other,
even if they truly point one past the last element of an array.

Otherwise, Linux would have never needed to introduce RELOC_HIDE in the
first place.


> > Moreover, I went back to MISRAC (finally I have a copy) and rule 18.2
> > says: "subtraction between pointers shall only be applied to pointers
> > that address elements of the same array". So, all the evidence we have
> > seems to say that we cannot rely on _end pointing one past the last
> > element of _start in this matter.
> 
> With the C standard's wording in mind, this surely is to include
> the "one past the last element" case, in which case all is fine. _end
> does not point at or into a different object, it points at the end of
> _start[].
>
> >> > 3) var.S + start_ as unsigned long
> >> > 
> >> > With this approach, _start is born as a linker symbol. It is never
> >> > exported to C, so from C point of view, it doesn't exist. There is
> >> > another variable named "start_" defined in assembly and initialized to
> >> > _start. Now we go into C land with:
> >> > 
> >> >   extern uintptr_t start_, end_
> >> > 
> >> > start_ and end_ are uintptr_t from the beginning from C point of view.
> >> > They have never been pointers or in any way connected to _start. They
> >> > are "clean".
> >> > 
> >> > When we do:
> >> > 
> >> >   _end - _start
> >> > 
> >> > it is a subtraction between uintptr_t, which is allowed. When we do:
> >> > 
> >> >     for ( call = (const initcall_t *)initcall_start_;
> >> >           (uintptr_t)call < presmp_initcall_end_;
> >> > 
> >> > The comparison is still between uintptr_t types, and the value of "call"
> >> > still comes from an unsigned long initially. There is never a comparison
> >> > between dubious pointers. (Interger to pointer conversions and pointer
> >> > to integer conversions are allowed by MISRA with some limitations, but I
> >> > am double-checking.) Even:
> >> > 
> >> >    (uintptr_t)random_pointer < presmp_initcall_end_
> >> > 
> >> > would be acceptable because presmp_initcall_end_ is an integer and has
> >> > always been an integer from C point of view.
> >> 
> >> Well, as said - this is one of the possible positions to take. Personally
> >> I see no difference between the helper symbols defined in
> >> assembly sources, or in C sources the object files for which are never
> >> made part of potential whole program optimization. 
> > 
> > I don't think this is the case for MISRAC. C rules apply to C. Other
> > rules apply to assembly and linker scripts. This is something that
> > should be easy to check, and I hope that Stewart should be able to
> > confirm.
> 
> As per above - the interesting aspect is what rules apply to the
> case of C interfacing with another language.
> 
> >> Using C files for this is still in conflict with the supposed
> >> undefined behavior, but I think you agree that C and assembly files
> >> could be set up such that the resulting binary data is identical. In
> >> which case it is bogus to call one satisfactory, but not the other.
> > 
> > I see what you are saying, but it doesn't work that way from a spec
> > compliance point of view.
> > 
> > 
> >> > However, there are still a couple of issued not correctly solved by v8
> >> > of the series. For starters: 
> >> > 
> >> >         apply_alternatives((struct alt_instr *)alt_instructions_,
> >> >                            (struct alt_instr *)alt_instructions_end_);
> >> > 
> >> > I can see how the pointers comparisons in apply_alternatives could be
> >> > considered wrong given the way the pointers are initialized:
> >> > 
> >> >     for ( a = base = start; a < end; a++ )
> >> >     {
> >> > 
> >> > start and end come from alt_instructions_ and alt_instructions_end_. It
> >> > doesn't matter that alt_instructions_ and alt_instructions_end_ are
> >> > "special", they could be perfectly normal integers and we would still
> >> > have the same problem: we cannot prove that "start" and "end" point to
> >> > the same object or subsequent objects in memory.
> >> > 
> >> > The way to fix it is by changing the parameters of apply_alternatives to
> >> > interger types, making comparison between integers, and only using
> >> > pointers to access the data.
> >> 
> >> You know my position on casts from integer to pointer types, especially
> >> ones taking a type out of thin air. This applies to your addition to the
> >> apply_alternatives() construct as well as the alternative of adding such
> >> in order to access memory. The quote from the standard that I gave
> >> makes such casts not provably (by the compiler) defined behavior as
> >> well, so it all boils down to the same distinction as pointed out above in
> >> the first part of my reply here: _We_ can prove it, but the compiler
> >> can't. Hence we're still depending on whose proof is necessary to
> >> eliminate MISRA's undefined behavior concerns.
> > 
> > Comparisons between pointers to different objects is undefined by the C
> > spec, and not allowed by MISRAC.
> > 
> > Casting pointers to integers and casting integers to pointers is
> > implementation-defined, which is not the same thing as undefined.
> 
> Of course it is not, but the result possibly not even being a valid
> pointer can't make it much better than "undefined".
>
> > I don't make up the rules, I am only trying to follow them :-)
> 
> Sure. But we shouldn't uglify our code just to follow insane
> (exaggeration intended) rules.

We haven't managed to reach consensus on this topic. Your view might be
correct, but it is not necessarily supported by compilers' behavior,
which depends on the opinion of compilers engineers on the topic, and
MISRAC compliance, which depends on the opinion of MISRAC specialists on
the topic. If we take your suggested approach we end up with the code
most likely to break in case the compilers engineers or the MISRAC
experts disagree with you. In this case, being right doesn't necessarily
lead to the code less likely to break.

Regardless, if that is the decision of the Xen community as a whole,
I'll follow it. My preference remains with approach 3. (var.S), followed
by approach 2. (SYMBOL_HIDE returns uintptr_t), but I am willing to
refresh my series to do approach 1. (SYMBOL_HIDE returns pointer type)
if that is the only way forward.

Let us come to a conclusion so that we can move on.

Jan, Julien, Juergen, anybody else interested, let me know what you want
me to do.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.