TITLE: practical concerns of signed versus unsigned characters 

(Private correspondence, 23 Aug 99)


CLARKE: "Allan D. Clarke" <clarke@ses.com>

> TITLE: signed versus unsigned characters
> 
> (Newsgroup: comp.lang.c++.moderated, 12 Aug 99)
> 
> CLAMAGE: Steve Clamage <clamage@eng.sun.com>
> 
> If you want to store characters (as opposed to small positive
> integers), use "plain" char. That type will have the correct
> behavior for all character-oriented classes and functions,
> whereas type unsigned char (or signed char) will not always work.


TRIBBLE: david@tribble.com

As I pointed out to the ISO C9X committee, plain char and signed char
cause problems in portable code.  The culprit is sign extension: when
plain char is signed, a character with its high bit set becomes a
negative value as soon as it is promoted to int.

    char  ch = 0xA0;   // ISO-8859-1 nonbreaking space

    if (ch == 0xA0)    // Fails if plain char is signed
        ...
    if (ch == '\xA0')  // OK: '\xA0' converts the same way ch does
        ...
    if (isupper(ch))   // Undefined behavior if plain char is signed
        ...

Personally, I've encountered far fewer of these problems by using
'unsigned char', which does not suffer from sign-extension surprises.
The drawback is having to cast all my 'unsigned char*' buffers and
strings to plain 'char*' when passing them to the standard library
functions (e.g., strcmp()).

In my not-so-humble opinion, 'char' should have been an unsigned type
from the very beginning.  But I recognize the roots of C/C++ in the
PDP-11, whose byte-move instruction sign-extended into a register, which
encouraged the use of signed characters.  I also recognize that far too
much code would break if the signedness of 'char' were changed.
(See http://www.flash.net/~dtribble/text/cbug001.txt.)