[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v6 1/4] xen: introduce SYMBOL



On Friday, January 18, 2019 6:05 PM, Stefano Stabellini wrote:
> On Fri, 18 Jan 2019, Jan Beulich wrote:
> > >>> On 18.01.19 at 02:24, <sstabellini@xxxxxxxxxx> wrote:
> > > On Thu, 17 Jan 2019, Jan Beulich wrote:
> > >> >>> On 17.01.19 at 01:37, <sstabellini@xxxxxxxxxx> wrote:
> > >> > On Wed, 16 Jan 2019, Jan Beulich wrote:
> > >> >> In any event - since intermediate variables merely hide the
> > >> >> casting from the compiler, but they don't remove the casts, the
> > >> >> solution involving casts is better imo, for incurring less overhead.
> > >> >
> > >> > This is where I completely disagree. The intermediate variables are not
> > >> > hiding casts from the compiler. There were never any pointers in this
> > >> > case.  The linker creates "symbols", not pointers, completely invisible
> > >> > from C land. Assembly uses these symbols to initialize variables. We
> > >> > expose these assembly variables as integer to C lands. LD scripts and
> > >> > assembly have their own terminology and rules: neither "_start" nor
> > >> > "start" are pointers at any point in time. The operations done in var.S
> > >> > is not a cast. The C spec is happy, the compiler is happy, MISRA-C is
> > >> > happy. And we get to avoid the ugly SYMBOL macro that Linux uses. It is
> > >> > really a win-win.
> > >>
> > >> Well, that's a position one can take. But we have to settle on another
> > >> aspect then first: Does what is not done in C underly C's rules? I
> > >> thought you were of the opinion that what comes from linker scripts
> > >> does. In which case what comes from assembly files ought to, too.
> > >> (FAOD my implication is: If the answer is yes, both approaches
> > >> violate C's rules. If the answer is no, no change is needed at all.)
> > >
> > > Great question, that is the core of the issue. Also, let me premise that
> > > I agree on the comments you made on the patches (I dislike "start_"
> > > too), and I can address them if we agree to continue down this path.
> > >
> > > But no, I do not think that what is done outside of C-land should follow
> > > C rules. But I do not agree with your conclusion that in that case there
> > > is no difference between the approaches. Let's get more into the
> > > details.
> > >
> > >
> > > 1) SYMBOL_HIDE returning pointer type
> > >
> > > Let's take _start and _end as an example. _start is born as a linker
> > > symbol, and it becomes a C pointer when we do:
> > >
> > >   extern char _start[], _end[]
> > >
> > > Now it is a pointer (actually I should say an array, but let's pretend
> > > they are the same thing for this discussion).
> > >
> > > When we do:
> > >
> > >   SYMBOL_HIDE(_end) - SYMBOL_HIDE(_start)
> > >
> > > We are still subtracting pointers: the pointers returned by SYMBOL_HIDE.
> > > We cannot prove that they are pointers to the same object or subsequence
> > > objects in memory, so it is undefined behavior, which is not allowed.
> >
> > Stop. No. We very much can prove they are - _end points at
> > one past the last element of _start[]. It is the compiler which
> > can't prove the opposite, and hence it can't leverage
> > undefined behavior for optimization purposes.
> 
> This is an interesting comment. However, even for normal pointers it is
> unreliable to count on one pointing one past the last element of the
> other. This was well explained in the GCC thread linked earlier in this
> thread. The vision of at least one of the GCC maintainers is that the
> compiler is free to place things in memory where it wishes, so as a
> programmer you cannot count on pointers pointing one past the last
> element of the other. Ever. In this case, where _start and _end are
> defined outside of C-land, I think it is even more true, and it remains
> undefined.
> 
> Moreover, I went back to MISRAC (finally I have a copy) and rule 18.2
> says: "subtraction between pointers shall only be applied to pointers
> that address elements of the same array". So, all the evidence we have
> seems to say that we cannot rely on _end pointing one past the last
> element of _start in this matter.
> 
> 
> > > 3) var.S + start_ as unsigned long
> > >
> > > With this approach, _start is born as a linker symbol. It is never
> > > exported to C, so from C point of view, it doesn't exist. There is
> > > another variable named "start_" defined in assembly and initialized to
> > > _start. Now we go into C land with:
> > >
> > >   extern uintptr_t start_, end_
> > >
> > > start_ and end_ are uintptr_t from the beginning from C point of view.
> > > They have never been pointers or in any way connected to _start. They
> > > are "clean".
> > >
> > > When we do:
> > >
> > >   _end - _start
> > >
> > > it is a subtraction between uintptr_t, which is allowed. When we do:
> > >
> > >     for ( call = (const initcall_t *)initcall_start_;
> > >           (uintptr_t)call < presmp_initcall_end_;
> > >
> > > The comparison is still between uintptr_t types, and the value of "call"
> > > still comes from an unsigned long initially. There is never a comparison
> > > between dubious pointers. (Interger to pointer conversions and pointer
> > > to integer conversions are allowed by MISRA with some limitations, but I
> > > am double-checking.) Even:
> > >
> > >    (uintptr_t)random_pointer < presmp_initcall_end_
> > >
> > > would be acceptable because presmp_initcall_end_ is an integer and has
> > > always been an integer from C point of view.
> >
> > Well, as said - this is one of the possible positions to take. Personally
> > I see no difference between the helper symbols defined in
> > assembly sources, or in C sources the object files for which are never
> > made part of potential whole program optimization.
> 
> I don't think this is the case for MISRAC. C rules apply to C. Other
> rules apply to assembly and linker scripts. This is something that
> should be easy to check, and I hope that Stewart should be able to
> confirm.

Would it help to provide a guarantee that during processing of one
compilation unit, the compiler doesn't have visibility into other
compilation units or object files?

With GCC, we have the luxury of being able to specify no link time
optimization and no whole program optimization. This could also involve
one of -fno-lto, -fno-whole-program, or both.

We should also specify to invoke the compiler separately for each .c file
(i.e. don't do "gcc -c foo.c bar.c", rather they should be separate steps
"gcc -c foo.c" and "gcc -c bar.c").

Can we agree that this would give us a guarantee C land is separate from
assembly and linker lands?

I have not investigated clang, but we should make sure we can provide this
guarantee for clang as well.

With those guarantees in place, can we agree that what happens in an
assembly source file is not subject to the potential undefined pointers to
different objects behavior described in C99 section 6.5.6 and 6.5.8, and
the "if and only if" clause in 6.5.9? (I'm not talking about inline
assembly).

If we wanted to achieve a more warm and fuzzy feeling about this, we could
investigate potentially invoking "as" or "gcc -xassembly" for translating
an assembly source to object code (disclaimer: I didn't check to see if
this is already being done or not).

C99 footnote 56 gives us a hint about the intent as it relates to
execution environments, which reminded me: we might also consider
specifying certain requirements for the execution environment. For
example: "pointers shall be meaningfully able to be represented in integer
types" and/or "compare/subtract operations on pointers shall yield
meaningful results" or something like that (these probably could use some
word-smithing). I do believe we're already making certain assumptions
about memory addressing and execution environment - we should spell out
the assumptions clearly and specify that our choice of compilers, allowed
compiler options, and architectures don't violate the assumptions.


> > Using C files for this is still in conflict with the supposed
> > undefined behavior, but I think you agree that C and assembly files
> > could be set up such that the resulting binary data is identical. In
> > which case it is bogus to call one satisfactory, but not the other.
> 
> I see what you are saying, but it doesn't work that way from a spec
> compliance point of view.
> 
> 
> > > However, there are still a couple of issued not correctly solved by v8
> > > of the series. For starters:
> > >
> > >         apply_alternatives((struct alt_instr *)alt_instructions_,
> > >                            (struct alt_instr *)alt_instructions_end_);
> > >
> > > I can see how the pointers comparisons in apply_alternatives could be
> > > considered wrong given the way the pointers are initialized:
> > >
> > >     for ( a = base = start; a < end; a++ )
> > >     {
> > >
> > > start and end come from alt_instructions_ and alt_instructions_end_. It
> > > doesn't matter that alt_instructions_ and alt_instructions_end_ are
> > > "special", they could be perfectly normal integers and we would still
> > > have the same problem: we cannot prove that "start" and "end" point to
> > > the same object or subsequent objects in memory.
> > >
> > > The way to fix it is by changing the parameters of apply_alternatives to
> > > interger types, making comparison between integers, and only using
> > > pointers to access the data.
> >
> > You know my position on casts from integer to pointer types, especially
> > ones taking a type out of thin air. This applies to your addition to the
> > apply_alternatives() construct as well as the alternative of adding such
> > in order to access memory. The quote from the standard that I gave
> > makes such casts not provably (by the compiler) defined behavior as
> > well, so it all boils down to the same distinction as pointed out above in
> > the first part of my reply here: _We_ can prove it, but the compiler
> > can't. Hence we're still depending on whose proof is necessary to
> > eliminate MISRA's undefined behavior concerns.
> 
> Comparisons between pointers to different objects is undefined by the C
> spec, and not allowed by MISRAC.
> 
> Casting pointers to integers and casting integers to pointers is
> implementation-defined, which is not the same thing as undefined.
> 
> Specifically, casting integers to pointers and pointers to integers is
> allowed by MISRAC with the caveat that we should avoid misaligned
> pointers (char* are always allowed), and that a compatible pointer type
> is used when accessing the object (char* is always compatible). Stewart
> will send a longer explanation over the weekend.
> 
> I don't make up the rules, I am only trying to follow them :-)

I'll get to that in a bit, but first, it's time for another radical new
idea. Let's call it approach number 4.

The undefined behavior and "if and only if" clause (C99 6.5.6/8/9) only
pertain to the subtract/compare operators. So, if we don't use the
subtract/compare operators in C land, we won't be subject to the undefined
behavior. Let's move the pointer subtract/compare operations to assembly.
Not inline assembly, but to a separate assembly source file.

We would write subroutines in assembly (callable from C) for each
subtract/compare operation required. For example:
char * subtract_ptr_ptr(char *, char *);
char * subtract_ptr_int(char *, uintptr_t);
int test_equal(char *, char *);

That could even open up the door for common operations like "_end - _start":
size_t get_program_size(void);

If we can prove to the compiler that we're subtracting/comparing pointers
to the same object, or one element past the last, then we're still OK to
use the subtract/compare operators. Otherwise, call these functions.

This approach relies on being able to provide some or all of the
guarantees discussed above.

Do you think this will prevent GCC from doing its code-breaking
optimization in questions and help with MISRA C?


Lastly, back to the casts question (though it may be irrelevant if we
choose the approach I just outlined): the C standard guarantees that you
can reliably convert a void pointer to uintptr_t and back (C99 section
7.18.1.4). This is fully defined by the C standard: no unspecified,
implementation-defined, or undefined behavior about that. It does not make
the same guarantee for other pointer types. Rather, conversion between
pointer types (other than void*) and integers is implementation defined
(C99 section 6.3.2.3 paragraphs 5 and 6). Further, converting any pointer
type (except function pointer types) to a "void *" and back is not lossy
(C99 section 6.3.2.3 paragraph 1).

So, let's say you have a "char *" that you want to convert to uintptr_t,
you'd first have to convert to "void *".

char * im_a_char_ptr;
uintptr_t im_a_uintptr_t;
/* ... initialization ... */
im_a_uintptr_t = (uintptr_t)(void*)im_a_char_ptr;
im_a_char_ptr = (char*)(void*)im_a_uintptr_t;

It may not be pretty, and I fully sympathize with your resistance toward
unnecessary casts, but we have a fully C99 standard compliant way to
convert between uintptr_t and pointer types and back without loss, and
without relying on unspecified, implementation-defined, or undefined
behavior.

MISRA C advises that you shouldn't do such casting, but recognizes that it
is necessary in some cases, so it gives guidelines for the case when an
integer type is converted to a pointer type:
1. Take care to avoid misaligned pointers ("char *" will always be
   aligned, assuming certain properties of the execution environment)
2. Ensure that a compatible pointer type is used when accessing the object
   ("char *" is always guaranteed to be compatible)

I refer you to C99 section 6.5 paragraph 7, and MISRA C:2012 Rules 11.3,
11.4, 11.5 for further details.

Stew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.