TITLE: Downcasting and run-time type checking scheme RESPONSE: David Chase (Sun Microsystems) (Disclaimer that should appear on most of these articles on this subject, independent of author: My understanding of type theory is weak. However, type theory has very little to do with C++, except when explaining its shortcomings. ) QUESTION: What is downcasting? "Downcasting" is just "casting" in C, but in C++ there is a notion of "up" and "down" the inheritance tree. I think it is also restricted in use to refer to casts of pointers. Upcasting is casting from a derived type to a base type. Downcasting is casting from a base type to a derived type. In C, casting from "void *" to "notvoid *" may be regarded as a downcast, since a "void *" is a sort of "any pointer" type. QUESTION: Is it terminology peculiar to a language like C++? I've only seen the terminology used in connection with C++, though other languages have similar concepts. In other languages (Cedar Mesa, Modula-2+, Modula-3) the type-safe downcast is called NARROW; the unsafe cast is called LOOPHOLE. Type-safety is implicit in languages like Lisp and ML. QUESTION: What is type safe downcasting? "Type safe" downcasting means that if you say some_subr(void * x) { /* OR -- "some_base_type * x" */ ... (foo *) x ... /* Cast x into a foo_pointer */ } then the cast is checked (either at compile time or run time, depends on lots of things including language design and optimizer) to ensure that in fact, x is really a pointer to a foo (or a type derived from foo). If this it isn't, then the (implementation-defined, I believe the standard will say) result is an error. Some people don't call this "type-safe" because a run-time error could result. That's not what this is about, and those people don't get to redefine the term because it stems from work done long ago at Xerox PARC on Cedar Mesa. Type-safety means that your program cannot corrupt the abstract machine presented to you by the compiler, linker, libraries, etc. (Yes, this is possible. Yes, "real" systems have been built along these lines.) For C++, true type-safety is probably unobtainable, at least given the prejudices of its existing fan club. True type safety requires checked array bounds and garbage collection. QUESTION: When would you want to use it? All the time, by default. Let the optimizer remove the checks where it can, and where it cannot, tweak by hand if profiling indicates it would help. One handy method for writing generic/polymorphic code is to traffic in "generic" pointers (or base types), especially if you have to deal with methods that take a second parameter (beyond the implicit "this" parameter -- one example is "comparison" or "addition"). For example, I end up with methods on some objects of the form: virtual int generic_hash::compare_elements(void * x, void * y) = 0; and the derived classes contain something like int derived_hash::compare_elements(void * gx, void * gy) { element_type * x = (element_type *) gx; element_type * y = (element_type *) gy; ... } ( Why this, and not macros/templates? What if I choose to put objects of a given type into *different* hash tables that use different interpretations of equality and hash? (That's a real example.) In practice, even without checking this is easy to use correctly since the interface to a hash table is fairly cut-and-dried. However, mistakes do occur, and there's plenty to be said for having your mistakes spotted early instead of after you've trampled all over your memory. Also, this is truly one piece of code, and not susceptible to surprises caused by creative overloading -- there is less need to re-verify or re-tune it for each instantiation. ) As far as implementation of this, it is typically done by gluing some information to an object type's virtual function table. (Problem 1: not all objects in C++ have vtables.) If there is no multiple inheritance, then the test to determine if object A is a (derived type of) type T takes constant time (Problem 2: C++ has multiple inheritance). Here is the trick (taken from Zadeck et al, I don't have the exact reference handy, but it is easy to understand once you've seen it): with single inheritance, the types form a tree. Do a pre and post order numbering of the nodes in the tree. For example: (nodes have the form "PRE(label)POST") 1(void)11_____________________ | \ \ \ 2(int)1 3(char)2 4(base1)7 9(base2)10 / \ \ \ 5(d1)5 8(d2)6 10(e1)8 11(e2)9 / \ 6(f1)3 7(f2)4 Given any two nodes X and Y in the tree, we know that X is an ancestor of Y if and only if PRE(X) <= PRE(Y) POST(X) >= POST(Y) Four memory references worst case, or two if we had a smart (static) linker that could bind the constants at link time (when all the types are available). Adding types on the fly is not too hard if you renumber the tree carefully (observation due either to Al Demers or Roger Hoover, I cannot remember which).