TITLE: using standard containers (Newsgroups: comp.lang.c++.moderated, 6 Feb 99) AUTHOR: Herb Sutter > ------------------------------------------------------------------- > Guru of the Week problems and solutions are posted regularly on > news:comp.lang.c++.moderated. For past problems and solutions > see the GotW archive at www.peerdirect.com. (c) 1998 H.P.Sutter > News archives may keep copies of this article. > ------------------------------------------------------------------- > >_______________________________________________________ > >GotW #50: Using Standard Containers > >Difficulty: 6 / 10 >_______________________________________________________ > > >Congratulations to Bill Wade, the Guru of the Week! > > >>Consider the following code: >> >> void f( vector& v ) { >> char* pc = &v[0]; >> // ... later, uses pc ... >> } >> >>1. Is this code valid? > >Yes. > >Why is this code legal? The reason comes right out of >the standard's Container and Sequence requirements: If >operator[] is provided for a given kind of sequence >(which it is in the case of std::vector), it must >return a "reference". In turn, the "reference" must be >an lvalue of type T (here char), which can have its >address taken. > > >Why Take Pointers or References Into Containers? >------------------------------------------------ > >Many programmers are surprised by this kind of code the >first time they see it, but there's nothing wrong with >this technique in itself. As long as you remain aware >of when the pointers might be invalidated, which is >pretty much whenever iterators would be invalidated, >this technique is not evil, it is not immoral, and it's >not even fattening. On the contrary, it can be quite >useful. > >There are cases where it can be legal (and, more to the >point, where it makes perfect sense) to have pointers >or references into containers. A common case is when >you read a data structure into memory on program >startup and then never modify it, but you need to >access it frequently in different ways. In such cases >it can make sense to have additional data structures >that contain pointers into the main container to >optimize different access methods. I'll give an >example as part of the discussion of Question #2. > > >>2. Whether it's valid or not, how could it be improved? > >In general, it's not a bad guideline to prefer to use >iterators instead of pointers when you want to point at >an object that's inside a container. After all, >iterators are invalidated at the same times in the same >ways as pointers. > >There are two main potential drawbacks to this method. >If either applies in your case, continue to use >pointers. > >1. You can't always conveniently use an iterator where > you can use a pointer. (See example below.) > >2. Using iterators might incur extra space and > performance overhead, in cases where the iterator is > an object and not just a bald pointer. > >Example: Say that you have a map >that, given a name, makes it easy to look up a phone >number. What if you need to do the reverse lookup too? >One solution is to build a second structure, perhaps a >map that enables the >reverse lookup but avoids doubling the storage overhead >(no need to store each name and phone number twice; the >second structure simply has pointers into the first). > >Note that, in the above example, it would be difficult >to use iterators instead of pointers. Why? Because >the natural candidate, map::iterator, >points to a pair, and there's no >handy way to get an iterator to just the name or phone >number part individually. > >This brings us to the next (and in my opinion most >interesting) point of today's lesson: > > >When Is a Container Not a Container? >------------------------------------ > >>3. Consider the following code: >> >> template > > >The above line has two problems: > >a) The easy one is that it doesn't make sense to define > T in terms of itself. That red herring may have > successfully tricked you into missing the more > fundamental problem, namely: > >b) Function templates can't have default arguments. > >Now consider the rest of the function as if it were >introduced with "template" only: > > template >> void f( T& t ) { >> typename T::value_type* p1 = &t[0]; >> typename T::value_type* p2 = &*t.begin(); >> // ... later, uses p1 and p2 ... >> } >> >> Is this code valid? Discuss. > >The short answer is: Sometimes. > >One instructive way to look at this problem is to think >about what kinds of T and t make the code valid. At >compile time, what characteristics and abilities must T >have? At runtime, what characteristics must t have? >Let's do some detective work. > >At compile time: > >a) To make the expression &t[0] valid, T::operator[] > must exist and must return something that > understands operator&. > > In particular, this is true of containers that meet > the standard's Container requirements and implement > the optional operator[], because that operator must > return a reference to the contained object. By > definition, you can then take the contained object's > address. > >b) To make the expression &*t.begin() valid, T::begin() > must exist, and it must return something that > understands operator*, which in turn must return > something that understands operator&. > > In particular, this is true of containers that meet > the standard's iterator requirements, because the > iterator returned by begin() must, when dereferenced > using operator*, return a reference to the contained > object. By definition, you can then take the > contained object's address. > >Further, at runtime: > >c) To make the expression &t[0] safe, the t object > must be in a suitable state for calling t[0]. In > particular, if T is something like std::vector, > then the vector must not be empty. > >d) To make the expression &*t.begin() safe, the t > object must be in a suitable state for calling > t.begin() and dereferencing the result. In > particular, if T is something like std::vector, > then the vector must be nonempty. > > >Which Brings Us to the Embarrassing Part >---------------------------------------- > >The code in Question #3 will work for every container >in the standard library that supports operator[] -- >and, if you take away the line containing "&t[0]", it >will work for every container in the standard library, >bar none -- EXCEPT for std::vector. In fact, >the following template: > > template > void g( vector& v ) { > T* p = &*t.begin(); > // ... do something with p ... > } > >works for every type EXCEPT bool. If this seems a >little strange to you, you're not alone. > > >What About bool? >---------------- > >Unfortunately, there's something a little embarrassing >about the C++ standard library: not all of the >templates it defines that look like containers actually >are Containers. In particular, > > std::vector IS NOT a Container > >because it does not meet the standard library's >requirements for Containers. (If anyone else had >written it, it would have been called "nonconforming" >and "nonstandard." Well, it's in the standard, so that >makes it a little bit harder to call it those names at >this point, but some of us try anyway in the hopes that >it will eventually get cleaned up.) > >The reason std::vector is nonconforming is that >it pulls tricks under the covers in an attempt to >optimize for space: Instead of storing a full char for >every bool (taking up 8 times the space, on platforms >with 8-bit chars), it packs the bools and stores them >as individual bits (inside, say, chars) in its internal >representation. One consequence of this is that it >can't just return a normal bool& from its operator[] or >its dereferenced iterators[*]; instead, it has to play >games with a helper "proxy" class that is bool-like but >is definitely not a bool. Unfortunately, that also >means that access into a vector is slower, >because we have to deal with proxies instead of direct >pointers and references. > >[*] because there is no way to get a pointer or a >reference with sub-byte precision; if you're thinking >"well, just make a bool* capable of that by making it >wider," you should consider the (pretty major) >consequences, including that then all of a sudden not >all data pointers would be the same size! > > >All of the above trickery has the following >disadvantages: > >1. std::vector is not a Container. 'Nuff said. > >2. std::vector attempts to illustrate how to > write proxied containers that pull tricks under the > covers. Unfortunately, that's not a sound idea, > because by definition that violates the Container > requirements. (See #1.) (The Container > requirements were never changed to enable proxied > containers to actually be conforming, and as far as > I know that decision was deliberate.) > >3. std::vector's name is misleading because the > things inside aren't even standard bools. A > standard bool is at least as big as a char, so that > it can be used "normally." So, in fact, > std::vector does not even store bools, despite > the name. > >4. std::vector forces a specific optimization on > all users by enshrining it in the standard. That's > not a good idea; different users have different > requirements, and now all users of vector must > pay the performance penalty even if they don't want > or need the space savings. > >Bottom line: If you care more about speed than you do >about size, you shouldn't use std::vector. >Instead, you should hack around this optimization by >using a std::vector or the like instead, which is >unfortunate but still the best you can do. ----------------------------------------------------------- SUTTER: hsutter@peerdirect.com >>In general, it's not a bad guideline to prefer to use >>iterators instead of pointers when you want to point at >>an object that's inside a container. After all, >>iterators are invalidated at the same times in the same >>ways as pointers. ALDRIDGE: John Aldridge >Not always. Insertion at the beginning or end of a deque invalidates >iterators but not pointers to members of the deque (23.2.1.3(1)). __________________________________________________________ See the C++ Tip-of-the-day home page for more information. http://www.ses.com/~clarke/TableOfContents.html