TITLE: iostreams, locales, and facets (Newsgroups: comp.lang.c++.moderated, 24 Jun 97)

MARIAN: marian (marian@shellx.best.com)
>: It occurs to me that I really do not understand iostreams.
>: I use them.

KUEHL: kuehl@horn.informatik.uni-konstanz.de (Dietmar Kuehl)
There is actually no real need for every C++ programmer to [fully] understand how IOStreams work. IOStreams are designed such that simple things can be achieved easily. Easy things are writing and reading standard types and user-defined types. More complex things include writing to new external representations and modifying the way built-in types are formatted (where even the latter is made relatively easy in the standard C++ library!).

MARIAN:
>: I look at a chart and see--oh, yes, this is the hierarchy, etc.
>: I understand descriptions when I read about the classes in iostreams.

KUEHL:
Actually, there is not much to understand about IOStreams! The IOStream functionality is separated into three distinct parts which can partially be used individually:

- 'streambuf' and the classes derived from it are used to encapsulate the "external representation". An external representation can be a file or a 'string' (these are the two representations provided by the standard) but there can be others, too. For example, you may use a GUI window, a socket, or a compression algorithm as the external representation (where the latter actually uses another 'streambuf' to write the result to or read the input from). However, the details of how new external representations are created are somewhat tricky.

- 'ios' and the classes derived from it handle formatted IO. 'ios' is the container for format flags and stores a reference to the external representation. 'istream' and 'ostream' are the primary interface for formatted IO. Classes derived from these exist only for convenient construction and convenient access to the corresponding 'streambuf'.
There is not much functionality in these classes anymore: they mainly delegate all requests to the corresponding 'streambuf' and to the corresponding 'facet's (see below).

- 'locale' and the corresponding 'facet' objects it stores are used to format built-in types or to parse strings to form built-in types. There are also 'facet's used to convert between different character representations, which are used in several places. Unless the user of the IOStream library wants to change the formatting of built-in types, there is hardly any need to know how this works, and most things are supposed to happen behind the scenes.

Using these components, all IO is done something like this:

1. 'istream' or 'ostream' is the interface for the user's request. It delegates anything having to do with formatting to the corresponding 'facet', i.e. 'num_put' or 'num_get', and the 'facet's these use in turn (e.g. 'numpunct').

2. Formatting is done in a 'locale'-dependent fashion by the corresponding 'facet'(s). Actually, this is not really very complex since most of the individual operations are rather simple. What may get somewhat complex are the many interactions between the various 'facet's and other components of the IOStream library. However, this is actually more part of the locales library :-) Seriously, most of the locales services are directly related to the needs of the IOStream library (message management is the only thing coming to my mind which is not needed by the IOStream library).

3. The result of the locales operations is written, or the input for these operations is read, using 'streambuf' methods, normally through 'ostreambuf_iterator' or 'istreambuf_iterator'.

If you separate the IOStream library like this, I think you will see that it is quite clear what happens. However, when you look at the details first, you will find yourself in a maze of different calls jumping between the various classes, which severely obfuscates the big picture.
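
The split described above can be sketched in a few lines of modern C++ (C++11 or later is assumed; the post itself predates it). The class 'UpperBuf' and the two helper functions are hypothetical names invented for this illustration. 'UpperBuf' plays the role of a filtering external representation that forwards to another 'streambuf', 'format_via_stream' goes through the 'ostream' interface, and 'format_via_facet' calls the 'num_put' facet directly with an 'ostreambuf_iterator', which is roughly what 'operator<<' does behind the scenes.

```cpp
#include <cassert>
#include <iterator>
#include <locale>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filtering streambuf: upper-cases each character and forwards
// it to another streambuf, which is the real external representation.
class UpperBuf : public std::streambuf {
public:
    explicit UpperBuf(std::streambuf* target) : target_(target) {
        assert(target_ != nullptr);
    }

protected:
    // No put area is set up, so every character written ends up here.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);  // nothing to write, report success
        const char c = traits_type::to_char_type(ch);
        // Character classification and conversion are themselves facet services.
        const char up =
            std::use_facet<std::ctype<char>>(std::locale::classic()).toupper(c);
        return target_->sputc(up);
    }

    int sync() override { return target_->pubsync(); }

private:
    std::streambuf* target_;
};

// Steps 1 to 3 through the usual interface: ostream -> facets -> streambuf.
std::string format_via_stream(long value) {
    std::ostringstream sink;         // string-based external representation
    UpperBuf filter(sink.rdbuf());   // filter layered on top of it
    std::ostream out(&filter);       // formatted-output interface
    out << "value=" << value;        // the number is formatted by num_put
    out.flush();
    return sink.str();
}

// The same number formatting with the stream interface bypassed: ask the
// locale for num_put and let it write through an ostreambuf_iterator.
std::string format_via_facet(long value) {
    std::ostringstream sink;
    const std::num_put<char>& np =
        std::use_facet<std::num_put<char>>(sink.getloc());
    np.put(std::ostreambuf_iterator<char>(sink.rdbuf()), sink, sink.fill(), value);
    return sink.str();
}
```

With the default "C" locale, 'format_via_stream(42)' yields "VALUE=42" and 'format_via_facet(1234)' yields "1234". Forwarding one character at a time keeps the sketch short; a production filter would normally set up a buffer instead.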

MARIAN:
>: But I don't understand iostreams in a historical development sense.

KUEHL:
I can't say much about the roots of the IOStream library. I think there was an initial implementation which was then redesigned by Jerry Schwartz to make it more flexible. I can't really say how the library looked prior to this redesign. However, the main design has remained unchanged since then, although some major details changed: for example, the templatization and the introduction of locales are relatively new.

MARIAN:
>: They have changed over time. How? In what way? Why?
>: They are templatized. Simply for efficiency? What are the
>: ramifications of this?

KUEHL:
The main goal of the IOStream library was ease of use, followed by simple extensibility. Efficiency was "only" a secondary goal. Since I can't say much about the original IOStream library, I will only address locales and templatization.

C++ is intended to be usable in all cultures with reasonable support for cultural differences. For example, I would like to see that my name can be processed correctly (the 'ue' in my name is actually a "u-umlaut", i.e. a 'u' with two funny points above it; I'm not going to try to describe how this is pronounced...). This is a relatively simple thing and only involves recognition of additional characters as letters. To make a Japanese name appear correctly, a different character set or character encoding has to be used. This is addressed partially by the templatization and partially by locales.

With IOStreams being templates, you can use a nearly arbitrary type to represent individual characters (although there is not much guaranteed support if you don't use 'char' or 'wchar_t'...). Also, you can easily change the interpretation of characters. All but a few base classes in the IOStream library are templates with two type parameters: the character type and a traits type that depends on the character type.
The character type defines the representation used for the individual characters within the IOStream library. The traits type describes some fundamental traits of the corresponding character type, e.g. the object used as EOF symbol, which characters are interpreted as spaces or letters, etc. There is not much to the templatization except that you can now use different character representations and different interpretations of individual characters.

There are more cultural differences, though. For example, in some businesses floating point numbers are written with thousands separators, like 1,000,000 instead of just 1000000. To make things worse, in some countries (e.g. in Germany) this is written differently, namely like this: 1.000.000. All such things are lumped together in locales. Since it is convenient to use locales for this, another thing made it into the locales library: code conversion. Normally, operating systems are capable of writing 'char' objects, but they are not necessarily able to write 'wchar_t' or other objects. Thus, conversions between different representations, e.g. between 'wchar_t' and some multi-byte representation, are necessary.

Thus, templatization and locales were introduced mainly to cope with cultural differences. While the performance impact of templatization should be none or minimal, I think using locales makes things somewhat less efficient, not more efficient! There are many virtual functions called and facets looked up in tables, where formerly a simple call was made. However, both changes made the library more flexible while maintaining easy use (but not necessarily easy implementation!).

I guess I have not sufficiently addressed what you are actually interested in: it would probably take a whole book to make sure everything is covered appropriately. However, I hope that I have provided enough information to build on: I think I can answer most specific questions about the IOStream library.
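
To make the thousands-separator example concrete, here is a minimal sketch, assuming C++11 or later. 'GroupedPunct', 'format_grouped' and 'format_wide' are hypothetical names made up for this illustration and are not part of the standard library. The sketch derives from the standard 'numpunct' facet, installs it in a 'locale', and imbues a stream with that locale. The last function shows the templatization side: the same insertion works on a stream whose character type is 'wchar_t'.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: groups digits in threes, with the separator and the
// decimal point chosen by the caller.
class GroupedPunct : public std::numpunct<char> {
public:
    GroupedPunct(char thousands, char decimal)
        : thousands_(thousands), decimal_(decimal) {}

protected:
    char do_thousands_sep() const override { return thousands_; }
    char do_decimal_point() const override { return decimal_; }
    // "\3" means: groups of three digits, repeated to the left.
    std::string do_grouping() const override { return "\3"; }

private:
    char thousands_;
    char decimal_;
};

// Formats an integer with the given punctuation. The locale takes ownership
// of the facet and deletes it once no locale refers to it any more.
std::string format_grouped(long value, char thousands, char decimal) {
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(),
                          new GroupedPunct(thousands, decimal)));
    out << value;  // ostream -> num_put -> numpunct (our facet)
    return out.str();
}

// The same insertion on a wchar_t stream, using the classic "C" locale.
std::wstring format_wide(long value) {
    std::wostringstream out;
    out.imbue(std::locale::classic());
    out << value;
    return out.str();
}
```

Here 'format_grouped(1000000, ',', '.')' yields "1,000,000", while 'format_grouped(1000000, '.', ',')' yields the German-style "1.000.000". The stream code itself does not change; only the facet stored in the locale does, which is also where the extra virtual calls and facet lookups mentioned above come from.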
Maybe someone familiar with the history of the IOStream library can provide some details about the original implementation ... (I would be interested, too).