TITLE: Downcasting and run-time type checking scheme

RESPONSE: David Chase (Sun Microsystems)

(Disclaimer that should appear on most of these articles on
 this subject, independent of author:

 My understanding of type theory is weak.  However, type theory
 has very little to do with C++, except when explaining its
 shortcomings. )

QUESTION: What is downcasting?

"Downcasting" is just "casting" in C, but in C++ there is a notion of
"up" and "down" the inheritance tree.  I think it is also restricted
in use to refer to casts of pointers.  Upcasting is casting from a
derived type to a base type.  Downcasting is casting from a base type
to a derived type.  In C, casting from "void *" to "notvoid *" may be
regarded as a downcast, since a "void *" is a sort of "any pointer"
type.

QUESTION: Is it terminology peculiar to a language like C++?

I've only seen the terminology used in connection with C++, though
other languages have similar concepts.  In other languages (Cedar Mesa,
Modula-2+, Modula-3) the type-safe downcast is called NARROW; the
unsafe cast is called LOOPHOLE.  Type-safety is implicit in languages
like Lisp and ML.

QUESTION: What is type safe downcasting?

"Type safe" downcasting means that if you say

   some_subr(void * x) {  /* OR -- "some_base_type * x" */

      ... (foo *) x ... /* Cast x into a foo_pointer */

   }

then the cast is checked (either at compile time or run time, depends
on lots of things including language design and optimizer) to ensure
that in fact, x is really a pointer to a foo (or a type derived from
foo).  If this it isn't, then the (implementation-defined, I believe
the standard will say) result is an error.

Some people don't call this "type-safe" because a run-time error could
result.  That's not what this is about, and those people don't get to
redefine the term because it stems from work done long ago at Xerox
PARC on Cedar Mesa.  Type-safety means that your program cannot
corrupt the abstract machine presented to you by the compiler, linker,
libraries, etc.  (Yes, this is possible.  Yes, "real" systems have
been built along these lines.)

For C++, true type-safety is probably unobtainable, at least given the
prejudices of its existing fan club.  True type safety requires
checked array bounds and garbage collection.

QUESTION: When would you want to use it?

All the time, by default.  Let the optimizer remove the checks where
it can, and where it cannot, tweak by hand if profiling indicates it
would help.

One handy method for writing generic/polymorphic code is to traffic in
"generic" pointers (or base types), especially if you have to deal
with methods that take a second parameter (beyond the implicit "this"
parameter -- one example is "comparison" or "addition").  For example,
I end up with methods on some objects of the form:

  virtual int generic_hash::compare_elements(void * x, void * y) = 0;

and the derived classes contain something like

  int derived_hash::compare_elements(void * gx, void * gy) {
     element_type * x = (element_type *) gx;
     element_type * y = (element_type *) gy;
     ...
     }

( Why this, and not macros/templates?  What if I choose to put objects
of a given type into *different* hash tables that use different
interpretations of equality and hash?  (That's a real example.)  In
practice, even without checking this is easy to use correctly since
the interface to a hash table is fairly cut-and-dried.  However,
mistakes do occur, and there's plenty to be said for having your
mistakes spotted early instead of after you've trampled all over your
memory. Also, this is truly one piece of code, and not susceptible to
surprises caused by creative overloading -- there is less need to
re-verify or re-tune it for each instantiation. )

As far as implementation of this, it is typically done by gluing some
information to an object type's virtual function table.  (Problem 1:
not all objects in C++ have vtables.)  If there is no multiple
inheritance, then the test to determine if object A is a (derived type
of) type T takes constant time (Problem 2: C++ has multiple
inheritance).

Here is the trick (taken from Zadeck et al, I don't have the exact
reference handy, but it is easy to understand once you've seen it):
with single inheritance, the types form a tree.  Do a pre and post
order numbering of the nodes in the tree.  For example:
(nodes have the form "PRE(label)POST")

 1(void)11_____________________
   |     \           \         \
 2(int)1 3(char)2  4(base1)7    9(base2)10
                      /  \         \     \
                  5(d1)5 8(d2)6  10(e1)8 11(e2)9
                  /   \
             6(f1)3 7(f2)4


Given any two nodes X and Y in the tree, we know that X is an ancestor
of Y if and only if

   PRE(X) <= PRE(Y)
   POST(X) >= POST(Y)

Four memory references worst case, or two if we had a smart (static)
linker that could bind the constants at link time (when all the types
are available).  Adding types on the fly is not too hard if you
renumber the tree carefully (observation due either to Al Demers or
Roger Hoover, I cannot remember which).