[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL
On 22/01/2019 00:41, Stefano Stabellini wrote: > On Mon, 21 Jan 2019, Jan Beulich wrote: >>>>> On 19.01.19 at 00:05, <sstabellini@xxxxxxxxxx> wrote: >>> On Fri, 18 Jan 2019, Jan Beulich wrote: >>>>>>> On 18.01.19 at 02:24, <sstabellini@xxxxxxxxxx> wrote: >>>>> On Thu, 17 Jan 2019, Jan Beulich wrote: >>>>>>>>> On 17.01.19 at 01:37, <sstabellini@xxxxxxxxxx> wrote: >>>>>>> On Wed, 16 Jan 2019, Jan Beulich wrote: >>>>>>>> In any event - since intermediate variables merely hide the >>>>>>>> casting from the compiler, but they don't remove the casts, the >>>>>>>> solution involving casts is better imo, for incurring less overhead. >>>>>>> >>>>>>> This is where I completely disagree. The intermediate variables are not >>>>>>> hiding casts from the compiler. There were never any pointers in this >>>>>>> case. The linker creates "symbols", not pointers, completely invisible >>>>>>> from C land. Assembly uses these symbols to initialize variables. We >>>>>>> expose these assembly variables as integer to C lands. LD scripts and >>>>>>> assembly have their own terminology and rules: neither "_start" nor >>>>>>> "start" are pointers at any point in time. The operations done in var.S >>>>>>> is not a cast. The C spec is happy, the compiler is happy, MISRA-C is >>>>>>> happy. And we get to avoid the ugly SYMBOL macro that Linux uses. It is >>>>>>> really a win-win. >>>>>> >>>>>> Well, that's a position one can take. But we have to settle on another >>>>>> aspect then first: Does what is not done in C underly C's rules? I >>>>>> thought you were of the opinion that what comes from linker scripts >>>>>> does. In which case what comes from assembly files ought to, too. >>>>>> (FAOD my implication is: If the answer is yes, both approaches >>>>>> violate C's rules. If the answer is no, no change is needed at all.) >>>>> >>>>> Great question, that is the core of the issue. Also, let me premise that >>>>> I agree on the comments you made on the patches (I dislike "start_" >>>>> too), and I can address them if we agree to continue down this path. >>>>> >>>>> But no, I do not think that what is done outside of C-land should follow >>>>> C rules. But I do not agree with your conclusion that in that case there >>>>> is no difference between the approaches. Let's get more into the >>>>> details. >>>>> >>>>> >>>>> 1) SYMBOL_HIDE returning pointer type >>>>> >>>>> Let's take _start and _end as an example. _start is born as a linker >>>>> symbol, and it becomes a C pointer when we do: >>>>> >>>>> extern char _start[], _end[] >>>>> >>>>> Now it is a pointer (actually I should say an array, but let's pretend >>>>> they are the same thing for this discussion). >>>>> >>>>> When we do: >>>>> >>>>> SYMBOL_HIDE(_end) - SYMBOL_HIDE(_start) >>>>> >>>>> We are still subtracting pointers: the pointers returned by SYMBOL_HIDE. >>>>> We cannot prove that they are pointers to the same object or subsequence >>>>> objects in memory, so it is undefined behavior, which is not allowed. >>>> >>>> Stop. No. We very much can prove they are - _end points at >>>> one past the last element of _start[]. It is the compiler which >>>> can't prove the opposite, and hence it can't leverage >>>> undefined behavior for optimization purposes. >>> >>> This is an interesting comment. However, even for normal pointers it is >>> unreliable to count on one pointing one past the last element of the >>> other. This was well explained in the GCC thread linked earlier in this >>> thread. The vision of at least one of the GCC maintainers is that the >>> compiler is free to place things in memory where it wishes, so as a >>> programmer you cannot count on pointers pointing one past the last >>> element of the other. Ever. In this case, where _start and _end are >>> defined outside of C-land, I think it is even more true, and it remains >>> undefined. >> >> You mix up two things: One is the chance of two objects being >> adjacent to one another. We don't care about this. The other is >> a pointer truly pointing one past the last element of an array (as >> will naturally result with e.g. >> >> for ( ptr = arr; ptr < arr + ARRAY_SIZE(arr); ++ptr ) >> >> It is this second case which all the cases we care about here fall >> into. As per my other mail, just like the same object can have multiple >> names, symbols may also refer to places other than the start of >> an object; the fact that plain C can't produce such symbols is not >> relevant as long as there's no requirement that C code may >> interface only with other C code. > > The chance of two objects being adjacent to one another is relevant > because the compiler could rightfully decide that the programmer can > never rely on pointers pointing one past the last element of the other, > even if they truly point one past the last element of an array. > > Otherwise, Linux would have never needed to introduce RELOC_HIDE in the > first place. > > >>> Moreover, I went back to MISRAC (finally I have a copy) and rule 18.2 >>> says: "subtraction between pointers shall only be applied to pointers >>> that address elements of the same array". So, all the evidence we have >>> seems to say that we cannot rely on _end pointing one past the last >>> element of _start in this matter. >> >> With the C standard's wording in mind, this surely is to include >> the "one past the last element" case, in which case all is fine. _end >> does not point at or into a different object, it points at the end of >> _start[]. >> >>>>> 3) var.S + start_ as unsigned long >>>>> >>>>> With this approach, _start is born as a linker symbol. It is never >>>>> exported to C, so from C point of view, it doesn't exist. There is >>>>> another variable named "start_" defined in assembly and initialized to >>>>> _start. Now we go into C land with: >>>>> >>>>> extern uintptr_t start_, end_ >>>>> >>>>> start_ and end_ are uintptr_t from the beginning from C point of view. >>>>> They have never been pointers or in any way connected to _start. They >>>>> are "clean". >>>>> >>>>> When we do: >>>>> >>>>> _end - _start >>>>> >>>>> it is a subtraction between uintptr_t, which is allowed. When we do: >>>>> >>>>> for ( call = (const initcall_t *)initcall_start_; >>>>> (uintptr_t)call < presmp_initcall_end_; >>>>> >>>>> The comparison is still between uintptr_t types, and the value of "call" >>>>> still comes from an unsigned long initially. There is never a comparison >>>>> between dubious pointers. (Interger to pointer conversions and pointer >>>>> to integer conversions are allowed by MISRA with some limitations, but I >>>>> am double-checking.) Even: >>>>> >>>>> (uintptr_t)random_pointer < presmp_initcall_end_ >>>>> >>>>> would be acceptable because presmp_initcall_end_ is an integer and has >>>>> always been an integer from C point of view. >>>> >>>> Well, as said - this is one of the possible positions to take. Personally >>>> I see no difference between the helper symbols defined in >>>> assembly sources, or in C sources the object files for which are never >>>> made part of potential whole program optimization. >>> >>> I don't think this is the case for MISRAC. C rules apply to C. Other >>> rules apply to assembly and linker scripts. This is something that >>> should be easy to check, and I hope that Stewart should be able to >>> confirm. >> >> As per above - the interesting aspect is what rules apply to the >> case of C interfacing with another language. >> >>>> Using C files for this is still in conflict with the supposed >>>> undefined behavior, but I think you agree that C and assembly files >>>> could be set up such that the resulting binary data is identical. In >>>> which case it is bogus to call one satisfactory, but not the other. >>> >>> I see what you are saying, but it doesn't work that way from a spec >>> compliance point of view. >>> >>> >>>>> However, there are still a couple of issued not correctly solved by v8 >>>>> of the series. For starters: >>>>> >>>>> apply_alternatives((struct alt_instr *)alt_instructions_, >>>>> (struct alt_instr *)alt_instructions_end_); >>>>> >>>>> I can see how the pointers comparisons in apply_alternatives could be >>>>> considered wrong given the way the pointers are initialized: >>>>> >>>>> for ( a = base = start; a < end; a++ ) >>>>> { >>>>> >>>>> start and end come from alt_instructions_ and alt_instructions_end_. It >>>>> doesn't matter that alt_instructions_ and alt_instructions_end_ are >>>>> "special", they could be perfectly normal integers and we would still >>>>> have the same problem: we cannot prove that "start" and "end" point to >>>>> the same object or subsequent objects in memory. >>>>> >>>>> The way to fix it is by changing the parameters of apply_alternatives to >>>>> interger types, making comparison between integers, and only using >>>>> pointers to access the data. >>>> >>>> You know my position on casts from integer to pointer types, especially >>>> ones taking a type out of thin air. This applies to your addition to the >>>> apply_alternatives() construct as well as the alternative of adding such >>>> in order to access memory. The quote from the standard that I gave >>>> makes such casts not provably (by the compiler) defined behavior as >>>> well, so it all boils down to the same distinction as pointed out above in >>>> the first part of my reply here: _We_ can prove it, but the compiler >>>> can't. Hence we're still depending on whose proof is necessary to >>>> eliminate MISRA's undefined behavior concerns. >>> >>> Comparisons between pointers to different objects is undefined by the C >>> spec, and not allowed by MISRAC. >>> >>> Casting pointers to integers and casting integers to pointers is >>> implementation-defined, which is not the same thing as undefined. >> >> Of course it is not, but the result possibly not even being a valid >> pointer can't make it much better than "undefined". >> >>> I don't make up the rules, I am only trying to follow them :-) >> >> Sure. But we shouldn't uglify our code just to follow insane >> (exaggeration intended) rules. > > We haven't managed to reach consensus on this topic. Your view might be > correct, but it is not necessarily supported by compilers' behavior, > which depends on the opinion of compilers engineers on the topic, and > MISRAC compliance, which depends on the opinion of MISRAC specialists on > the topic. If we take your suggested approach we end up with the code > most likely to break in case the compilers engineers or the MISRAC > experts disagree with you. In this case, being right doesn't necessarily > lead to the code less likely to break. > > Regardless, if that is the decision of the Xen community as a whole, > I'll follow it. My preference remains with approach 3. (var.S), followed > by approach 2. (SYMBOL_HIDE returns uintptr_t), but I am willing to > refresh my series to do approach 1. (SYMBOL_HIDE returns pointer type) > if that is the only way forward. > > Let us come to a conclusion so that we can move on. > > Jan, Julien, Juergen, anybody else interested, let me know what you want > me to do. > I am "only" the release manager, so I can opt for the series to go into 4.12 in case the committers are ready to give it a go. The decision for 4.12 depends on the time consensus is reached. Right now I'd give it my Rab, but in case some more weeks are needed I might not want to take the risk. Juergen _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |