[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Fix error: array subscript has type 'char'



On 14.01.2021 13:29, Manuel Bouyer wrote:
> On Thu, Jan 14, 2021 at 11:53:20AM +0100, Jan Beulich wrote:
>> On 12.01.2021 19:12, Manuel Bouyer wrote:
>>> From: Manuel Bouyer <bouyer@xxxxxxxxxx>
>>>
>>> Use unsigned char variable, or cast to (unsigned char), for
>>> tolower()/islower() and friends. Fix compiler error
>>> array subscript has type 'char' [-Werror=char-subscripts]
>>
>> Isn't this something that wants changing in your ctype.h instead?
>> the functions (or macros), as per the C standard, ought to accept
>> plain char aiui, and if they use the input as an array subscript,
>> it should be their implementation suitably converting type first.
> 
> I asked for inputs from NetBSD developers familiar with this part.
> 
> Although the parameter is an int, only a subset of values is valid,
> as stated in ISO C 2018 (Section 7.4 paragrah 1):
>> In all cases the argument is an int, the value of which shall be
>> representable as an unsigned char or shall equal the value of the
>> macro EOF.  If the argument has any other value, the behavior is 
>> undefined.                               

Which means you're shifting the undefined-ness from the implementation
(using the value as array index) to the callers (truncating values, or
converting value range). In particular I do not think that the
intention behind the standard's wording is for every caller to need to
cast to unsigned char, when e.g. character literals are of type char
and string literals are of type const char[].

> As stated by NetBSD's ctype(3) manual page, NetBSD and glibc took different
> approach. NetBSD emits a compile-time warning if the input may lead to
> undefined behavior. quoting the man page:
>      Some implementations of libc, such as glibc as of 2018, attempt to avoid
>      the worst of the undefined behavior by defining the functions to work for
>      all integer inputs representable by either unsigned char or char, and
>      suppress the warning.  However, this is not an excuse for avoiding
>      conversion to unsigned char: if EOF coincides with any such value, as it
>      does when it is -1 on platforms with signed char, programs that pass char
>      will still necessarily confuse the classification and mapping of EOF with
>      the classification and mapping of some non-EOF inputs.
> 
> 
> So, although no warning is emmited on linux, it looks like to me that the
> cast to unsigned char is needed anyway, and relying on glibc's behavior
> is not portable.

I fully agree we shouldn't rely on glibc's behavior (I'm sure
you've looked at xen/include/xen/ctype.h to see how we address
this it Xen itself, but I will admit this is to a degree comparing
apples and oranges, not the least because the lack of a need to
consider EOF in Xen). At least in xen/tools/symbols.c I don't
think we're at risk of running into "undefined" space. Casts are
generally desirable to avoid whenever possible, as they come with
their own set of risks. Granted this is "only" user space code,
but still.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.