TITLE: iostreams, locales, and facets

(Newsgroups: comp.lang.c++.moderated, 24 Jun 97) 


MARIAN: marian (marian@shellx.best.com)

>: It occurs to me that I really do not understand iostreams.
>: I use them.

KUEHL: kuehl@horn.informatik.uni-konstanz.de (Dietmar Kuehl)

There is actually no real need for every C++ programmer to [fully]
understand how IOStreams work. IOStreams are designed such that simple
things can be achieved easily. Easy things are writing and reading
standard types and user-defined types. More complex things include
writing to new external representations and modifying the way built-in
types are formatted (where even the latter is made relatively easy in
the standard C++ library!).
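As a minimal sketch of the "easy things": giving a user-defined type stream support usually means writing one 'operator<<' and one 'operator>>' in terms of the operators that already exist for built-in types. The 'Point' type, its "(x,y)" text form, and the 'round_trips' helper are hypothetical, chosen only for illustration.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical user-defined type, used only for illustration.
struct Point {
    int x = 0;
    int y = 0;
};

// Writing: produce "(x,y)" using the existing operators for char and int.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ',' << p.y << ')';
}

// Reading: expect the same "(x,y)" form. On a punctuation mismatch, set
// failbit and leave the target object unchanged.
std::istream& operator>>(std::istream& is, Point& p) {
    char open = 0, comma = 0, close = 0;
    Point tmp;
    if (is >> open >> tmp.x >> comma >> tmp.y >> close) {
        if (open == '(' && comma == ',' && close == ')')
            p = tmp;
        else
            is.setstate(std::ios_base::failbit);
    }
    return is;
}

// Hypothetical helper: write a Point to a string stream and read it back.
bool round_trips(const Point& p) {
    std::stringstream ss;
    ss << p;
    Point back;
    return (ss >> back) && back.x == p.x && back.y == p.y;
}
```

Nothing here touches 'streambuf's or 'facet's directly; the user-defined operators simply reuse the formatting that the library already provides for 'int' and 'char'.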

MARIAN:

>: I look at a chart and see--oh, yes, this is the hierarchy, etc.
>: I understand descriptions when I read about the classes in iostreams.

KUEHL:

Actually, there is not much to understand about IOStreams! The IOStream
functionality is separated into three distinct parts, which can to some
extent be used individually:

- 'streambuf' and the classes derived from it are used to encapsulate
  the "external representation". An external representation can be a
  file or a 'string' (these are the two representations provided by the
  standard), but there can be others, too. For example, you may use a
  GUI window, a socket, or a compression algorithm as the external
  representation (where the latter actually uses another 'streambuf' to
  write the result to or read the input from). However, the details of
  how new external representations are created are somewhat tricky.

- 'ios' and the classes derived from it handle formatted IO. 'ios' is
  the container for format flags and stores a reference to the external
  representation. 'istream' and 'ostream' are the primary interface for
  formatted IO. The classes derived from them exist only for convenient
  construction and convenient access to the corresponding 'streambuf'.
  There is not much functionality left in these classes: they mainly
  delegate all requests to the corresponding 'streambuf' and to the
  corresponding 'facet's (see below).

- 'locale' and the 'facet' objects it stores are used to format
  built-in types or to parse strings into built-in types. There are
  also 'facet's, used in several places, that convert between different
  character representations. Unless the user of the IOStream library
  wants to change the formatting of built-in types, there is hardly any
  need to know how this works; most things are supposed to happen
  behind the scenes.
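To make the division of labor concrete, here is a sketch of a new external representation that passes everything on to another 'streambuf', in the spirit of the compression example but much simpler: it upper-cases each character. The names 'upper_buf' and 'shout' are hypothetical; the overridden virtual functions ('overflow', 'sync') are the standard 'streambuf' ones. Note that the 'ostream' still does all the formatting (through the locale's facets), and the new 'streambuf' only ever sees finished characters.

```cpp
#include <cctype>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filtering streambuf: upper-cases every character and
// forwards it to another streambuf. It has no put area of its own, so
// every character written to it arrives through overflow().
class upper_buf : public std::streambuf {
public:
    explicit upper_buf(std::streambuf* dest) : dest_(dest) {}

protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);  // nothing to write: success
        char c = traits_type::to_char_type(ch);
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        // Report failure if the underlying streambuf cannot take the char.
        return traits_type::eq_int_type(dest_->sputc(c), traits_type::eof())
                   ? traits_type::eof()
                   : ch;
    }

    // Flushing this buffer flushes the one it forwards to.
    int sync() override { return dest_->pubsync(); }

private:
    std::streambuf* dest_;
};

// Hypothetical usage helper: format a number through an ostream whose
// streambuf is an upper_buf writing into a string.
std::string shout(double value) {
    std::ostringstream target;     // final destination (a string)
    upper_buf ub(target.rdbuf());  // the new "external representation"
    std::ostream os(&ub);          // the formatting front end
    os << "result: " << value << std::flush;
    return target.str();
}
```

For example, 'shout(1e-5)' upper-cases the already formatted "1e-05" along with the text. A production version would normally set up a put area with 'setp()' instead of taking one character at a time.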

Using these components, all IO is done roughly like this:

1. 'istream' or 'ostream' is the interface for the user's request. It
   delegates anything having to do with formatting to the corresponding
   'facet', i.e. 'num_put' or 'num_get', and to the 'facet's these use
   in turn (e.g. 'numpunct').

2. Formatting is done in a 'locale'-dependent fashion by the
   corresponding 'facet'(s). Actually, this is not very complex, since
   most of the individual operations are rather simple. What may get
   somewhat complex are the many interactions between the various
   'facet's and other components of the IOStream library. However, this
   is really more a part of the locales library :-) Seriously, most of
   the locale services are directly related to the needs of the
   IOStream library (message management is the only thing that comes to
   mind which is not needed by the IOStream library).

3. The result of the locale operations is written, and the input for
   these operations is read, using 'streambuf' methods, normally via
   'ostreambuf_iterator' or 'istreambuf_iterator'.
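The three steps can be sketched in code. The function below is a simplified approximation of what 'ostream::operator<<(long)' does in the standard library: it omits the 'sentry' object and the exception handling of the real operator. 'put_long' and 'format_long' are hypothetical names; 'use_facet', 'num_put::put', and 'ostreambuf_iterator' are the standard components.

```cpp
#include <ios>
#include <iterator>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Simplified sketch of ostream::operator<<(long): no sentry, no exception
// handling, but the same delegation chain.
std::ostream& put_long(std::ostream& os, long value) {
    using Iter = std::ostreambuf_iterator<char>;
    // Step 1: the ostream only fetches the formatting facet from its locale.
    const auto& np = std::use_facet<std::num_put<char, Iter>>(os.getloc());
    // Steps 2 and 3: the facet formats 'value' (consulting the stream's
    // flags, width and fill, plus the locale's numpunct) and writes the
    // characters into the stream's streambuf through the iterator.
    Iter out = np.put(Iter(os), os, os.fill(), value);
    if (out.failed())
        os.setstate(std::ios_base::badbit);  // the streambuf refused output
    return os;
}

// Hypothetical helper: format a value into a string with the given flags.
std::string format_long(long value,
                        std::ios_base::fmtflags flags = std::ios_base::dec) {
    std::ostringstream oss;
    oss.flags(flags);
    put_long(oss, value);
    return oss.str();
}
```

For instance, 'format_long(255, std::ios_base::hex)' yields "ff": the hex flag lives in 'ios', the conversion happens in 'num_put', and the characters end up in the string's 'streambuf'.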

If you separate the IOStream library like this, I think you will see
that it is quite clear what happens. However, when you look at the
details first, you will find yourself in a maze of different calls
jumping between the various classes which severely obfuscates the big
picture.

MARIAN:

>: But I don't understand iostreams in a historical development sense.

KUEHL:

I can't say much about the roots of the IOStream library. I think there
was an initial implementation which was then redesigned by Jerry
Schwarz to make it more flexible. I can't really say how the library
looked prior to this redesign. However, the main design has remained
unchanged since then, although some major details changed: for example,
the templatization and the introduction of locales are relatively new.

MARIAN:

>: They have changed over time.  How?  In what way? Why?
>: They are templatized.  Simply for efficiency? What are the
>: ramifications of this?

KUEHL:

The main goal of the IOStream library was ease of use, followed by
simple extensibility. Efficiency was "only" a secondary goal. Since I
can't say much about the original IOStream library, I will only address
locales and templatization.

C++ is intended to be usable in all cultures, with reasonable support
for cultural differences. For example, I would like my name to be
processed correctly (the 'ue' in my name is actually a "u-umlaut",
i.e. a 'u' with two funny dots above it; I'm not going to try to
describe how it is pronounced...). This is a relatively simple thing
and only involves recognizing additional characters as letters. To make
a Japanese name appear correctly, a different character set or
character encoding has to be used. This is addressed partially by the
templatization and partially by locales.

With IOStreams being templates, you can use a nearly arbitrary type to
represent individual characters (although there is not much guaranteed
support if you don't use 'char' or 'wchar_t'...). Also, you can easily
change the interpretation of characters. All but a few base classes in
the IOStream library are templates on two types: the character type and
a traits type that depends on the character type. The character type
defines the representation used for the individual characters within
the IOStream library. The traits type describes some fundamental traits
of the corresponding character type, e.g. the value used as the EOF
symbol and how characters are compared and copied. (Which characters
are interpreted as spaces or letters is decided by the 'ctype' facet of
the stream's locale.) There is not much to the templatization except
that you can now use different character representations and different
interpretations of individual characters.
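The classic illustration of changing the interpretation of characters through the traits type is a case-insensitive string. The sketch below is that well-known example, not something the standard library provides; 'ci_char_traits' and 'ci_string' are hypothetical names, and only the comparison-related members of 'std::char_traits<char>' are replaced while everything else (length, copy, EOF handling) is inherited.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits type: same character type as char_traits<char>,
// but characters that differ only in case compare equal.
struct ci_char_traits : std::char_traits<char> {
    static char up(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }

    // Used by basic_string::compare and the comparison operators.
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }

    // Used by basic_string::find(char) and friends.
    static const char* find(const char* p, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(p[i], c)) return p + i;
        return nullptr;
    }
};

// Same character representation as std::string, different interpretation.
using ci_string = std::basic_string<char, ci_char_traits>;
```

With this, 'ci_string("Hello") == ci_string("hELLo")' holds. Because the traits type is also a template argument of the stream classes, a 'ci_string' cannot be written directly to a 'std::ostream' (which is 'basic_ostream<char, char_traits<char>>') without first converting it, e.g. via 'std::string(s.data(), s.size())'.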

There are more cultural differences, though. For example, in some
businesses numbers are written with thousands separators, like
1,000,000 instead of just 1000000. To make things worse, in some
countries (e.g. in Germany) this is written differently, namely like
this: 1.000.000. All such things are lumped together in locales. Since
it is convenient to use locales for this, another thing made it into
the locales library: code conversion. Normally, operating systems are
capable of writing 'char' objects, but they are not necessarily able to
write 'wchar_t' or other character types. Thus, conversions between
different representations, e.g. between 'wchar_t' and some multibyte
representation, are necessary.
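The thousands-separator convention is controlled by the 'numpunct' facet. As a sketch, assuming the German style described above ('.' grouping digits in threes, ',' as the decimal point), one can derive from 'std::numpunct<char>', install the new facet into a copy of the classic locale, and 'imbue()' a stream with the result. 'dot_grouping' and 'format_with' are hypothetical names.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical numpunct facet for the "1.000.000" convention.
struct dot_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '.'; }
    char do_decimal_point() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }  // groups of 3
};

// Format 'value' with a stream imbued with the given locale.
std::string format_with(const std::locale& loc, long value) {
    std::ostringstream oss;
    oss.imbue(loc);
    oss << value;  // num_put consults the locale's numpunct facet
    return oss.str();
}
```

Usage: 'std::locale de(std::locale::classic(), new dot_grouping);' makes 'format_with(de, 1000000)' produce "1.000.000", while the classic locale still gives "1000000". The locale takes ownership of the facet allocated with 'new' (its reference count defaults to 0) and deletes it when the last locale using it is destroyed.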

Thus, templatization and locales were introduced mainly to cope with
cultural differences. While the performance impact of templatization
should be none or minimal, I think using locales makes things somewhat
less efficient, not more efficient! Many virtual functions are called
and facets are looked up in tables where formerly a simple function
call was made.

However, both changes made the library more flexible while keeping it
easy to use (but not necessarily easy to implement!).

I guess I have not sufficiently addressed what you are actually
interested in: it would probably take a whole book to cover everything
appropriately. However, I hope I have provided enough information to
build on: I think I can answer most specific questions about the
IOStream library. Maybe someone familiar with the history of the
IOStream library can provide some details about the original
implementation... (I would be interested, too).