TITLE: internationalization concerns with uppercasing strings

(Newsgroups: comp.lang.c++.moderated, 6 Oct 99)

ROSSETTI: mikro@xmission.com (Mike Rossetti)

> I've been looking for a function/algorithm in the Standard Library
> which would either 1) convert a string to uppercase, or 2) do a case
> insensitive compare.

> It doesn't look convenient to have to write a locale/facet for this
> because I only need to do case conversions or case-insensitive
> compares occasionally but otherwise work with the string just like any
> other string, including doing case sensitive compares.

KANZE: James.Kanze@dresdner-bank.com

I don't see how you can expect to avoid locales, since the definition of
case-insensitive (or case, for that matter) is very locale dependent.
Also, converting the string to upper case and then doing a comparison
won't work either: in both French and German, the conversion will lose
information. At least in the languages I'm familiar with, converting to
lower case is a better choice, although there may be problems here as
well.

At any rate, you don't have to write your own locale facet, since there
is one for this already in the standard. There is also a convenience
function of locale, operator(), which can be used wherever less is
required in the standard library.

The toupper functions are in the ctype facet. For some strange reason,
the ones for strings are templated to take a charT const*, and not
simply an iterator, so they cannot be used directly with std::string
iterators. They are also specified to do the conversion in place, which
means that they aren't implementable for some significant locales (like
Germany, where toupper( "ß" ) should return "SS"). I suspect that the
lack of support for iterators is just an oversight; the fact that the
conversion is "in place" cannot really be considered anything but an
error, since it renders the functions unusable for such locales.
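As a rough illustration of the two facilities mentioned above (this sketch is not part of the original post): the pointer-based range overload of ctype<char>::toupper can be applied to a std::string by passing pointers into its buffer, and a std::locale object can be handed to std::sort as the comparison. The helper names to_upper_in_place and sort_by_locale, and the default to the classic "C" locale, are assumptions made for this example. Because the conversion is in place and maps one character to one character, it cannot turn "ß" into "SS".

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: uppercase a std::string in place using the ctype
// facet of the given locale. The range overload of toupper takes raw
// pointers, so pointers into the string's buffer are passed.
inline void to_upper_in_place(std::string& s,
                              const std::locale& loc = std::locale::classic())
{
    if (s.empty()) return;
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    ct.toupper(&s[0], &s[0] + s.size());  // one char in, one char out
}

// Hypothetical helper: sort strings with the locale itself as the
// comparison object. locale::operator() compares two strings through
// the locale's collate facet.
inline void sort_by_locale(std::vector<std::string>& v,
                           const std::locale& loc = std::locale::classic())
{
    std::sort(v.begin(), v.end(), loc);
}
```

With the classic locale, to_upper_in_place turns "Hello, world" into "HELLO, WORLD". Passing a named locale instead depends on which locales the platform provides.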
On the other hand, given the loss of information in upper case, there is
a real question as to what tolower should return: in a German locale,
tolower( "SS" ) could be either "ss" or "ß", depending on context, and
in a French locale, tolower( "E" ) could be any of "e", "é", "è", "ê" or
"ë". (Formally, this is incorrect, as upper case French *should*
maintain the accents. Because a normal typewriter lacks the upper case
accents, however, most French have gotten into the habit of not using
them with upper case.)

Globally, I'd suggest avoiding case conversions if at all possible.
(For case-insensitive comparison, generally speaking, locale::operator()
should be OK. Most of the time, anyway.)

[snip]
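For the occasional case-insensitive equality test, one possible approach (a minimal sketch, not taken from the original post) is to fold each character to lower case with the ctype facet while comparing, leaving both strings unmodified. The function name equal_ignoring_case and the default to the classic "C" locale are assumptions. The sketch inherits the limits discussed above: it compares character by character, so it will never treat "ß" and "SS" as equal.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: compare two strings for equality, ignoring case,
// by lowercasing each character through the locale's ctype facet.
// Only one-to-one character mappings are handled.
inline bool equal_ignoring_case(const std::string& a,
                                const std::string& b,
                                const std::locale& loc = std::locale::classic())
{
    if (a.size() != b.size()) return false;
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (ct.tolower(a[i]) != ct.tolower(b[i])) return false;
    }
    return true;
}
```

Since neither argument is changed, the strings can still be used for ordinary case-sensitive comparisons afterwards.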