[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Fix error: array subscript has type 'char'



On Thu, Jan 14, 2021 at 03:16:15PM +0100, Manuel Bouyer wrote:
> On Thu, Jan 14, 2021 at 02:25:05PM +0100, Jan Beulich wrote:
> > On 14.01.2021 13:29, Manuel Bouyer wrote:
> > > On Thu, Jan 14, 2021 at 11:53:20AM +0100, Jan Beulich wrote:
> > >> On 12.01.2021 19:12, Manuel Bouyer wrote:
> > >>> From: Manuel Bouyer <bouyer@xxxxxxxxxx>
> > >>>
> > >>> Use unsigned char variable, or cast to (unsigned char), for
> > >>> tolower()/islower() and friends. Fix compiler error
> > >>> array subscript has type 'char' [-Werror=char-subscripts]
> > >>
> > >> Isn't this something that wants changing in your ctype.h instead?
> > >> the functions (or macros), as per the C standard, ought to accept
> > >> plain char aiui, and if they use the input as an array subscript,
> > >> it should be their implementation suitably converting type first.
> > > 
> > > I asked for inputs from NetBSD developers familiar with this part.
> > > 
> > > Although the parameter is an int, only a subset of values is valid,
> > > as stated in ISO C 2018 (Section 7.4 paragrah 1):
> > >> In all cases the argument is an int, the value of which shall be
> > >> representable as an unsigned char or shall equal the value of the
> > >> macro EOF.  If the argument has any other value, the behavior is 
> > >> undefined.                               
> > 
> > Which means you're shifting the undefined-ness from the implementation
> > (using the value as array index) to the callers (truncating values, or
> > converting value range). In particular I do not think that the
> > intention behind the standard's wording is for every caller to need to
> > cast to unsigned char, when e.g. character literals are of type char
> > and string literals are of type const char[].
> 
> If you don't cast you fall into the undefined behavior case for non-ascii
> characters. For example, "é" in iso-latin-1 is 0xe9. In a signed char, this is
> -23 (decimal). without the cast, tolower() will see -23.
> If it is casted to (unsigned char) first, tolower() will see 233, as expected.
> The following test program illustrates this:
> armandeche:/tmp>cat toto.c
> #include <stdio.h>
> 
> int
> main(int _c, const char *_v[])
> {
>         char c = 0xe9;
>       printf("%d %d\n", (int)c, (int)(unsigned char)c);
> }
> armandeche:/tmp>./toto 
> -23 233
> 
> 
> 
> > 
> > > As stated by NetBSD's ctype(3) manual page, NetBSD and glibc took 
> > > different
> > > approach. NetBSD emits a compile-time warning if the input may lead to
> > > undefined behavior. quoting the man page:
> > >      Some implementations of libc, such as glibc as of 2018, attempt to 
> > > avoid
> > >      the worst of the undefined behavior by defining the functions to 
> > > work for
> > >      all integer inputs representable by either unsigned char or char, and
> > >      suppress the warning.  However, this is not an excuse for avoiding
> > >      conversion to unsigned char: if EOF coincides with any such value, 
> > > as it
> > >      does when it is -1 on platforms with signed char, programs that pass 
> > > char
> > >      will still necessarily confuse the classification and mapping of EOF 
> > > with
> > >      the classification and mapping of some non-EOF inputs.
> > > 
> > > 
> > > So, although no warning is emmited on linux, it looks like to me that the
> > > cast to unsigned char is needed anyway, and relying on glibc's behavior
> > > is not portable.
> > 
> > I fully agree we shouldn't rely on glibc's behavior (I'm sure
> > you've looked at xen/include/xen/ctype.h to see how we address
> > this it Xen itself, but I will admit this is to a degree comparing
> > apples and oranges, not the least because the lack of a need to
> > consider EOF in Xen). At least in xen/tools/symbols.c I don't
> > think we're at risk of running into "undefined" space. Casts are
> 
> as long as there's only ascii characters.
> 
> Anyway NetBSD won't change its ctype.h

I guess I'm going to give up on this one. We'll keep it as a local patch.

-- 
Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx>
     NetBSD: 26 ans d'experience feront toujours la difference
--



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.