TITLE: common C++ optimizations

DM = Daniel K M (danielkm@aol.com)
JK = kanze@gabi-soft.fr (J. Kanze), 16 Dec 95

DM:
|> {The question about inline functions is covered in FAQ #16
|> (the FAQ is online at ).
|> Please restrict replies to non-FAQ topics. -mod}

|> Can anyone give a few pointers on some common C++ optimization
|> techniques?

JK: Not specifically C++, but the most effective optimization technique I know of (after that of choosing the correct algorithm) is to profile the program and attack the hot spots. C++ is particularly good for this: the improved encapsulation should mean that local changes made to improve speed do not propagate all over the place.

DM:
|> For instance, I've recently read some material that seems to imply that
|> inline functions may sometimes perform better than functions declared
|> separately from the class definition. Is this true?

JK: Inlining a small function may result in better performance in some cases. Except for the obvious case of forwarding functions, I wouldn't worry about making functions inline until I had the profiler output; inlining increases (compile-time) coupling, which can noticeably increase your compile times. Inlining large functions may even have a negative impact on performance; it results in a larger task image, which in turn may cause more paging.

DM:
|> If so, when is an
|> inline function better?

JK: When the profiler output says it's necessary, and the results of inlining are measurable.

DM:
|> And what similar optimization tips might you
|> advise? Naturally, I'm looking for tips that should apply to most C++
|> compilers, and not to any specific one.

JK: The two largest time spenders, in my experience:

1. Dynamic memory management. If you have a small class, and there will be a lot of dynamic instances of the class, consider overloading operator new and operator delete in the class. (A fixed size allocator can easily perform two orders of magnitude faster than the general allocator needed for the global new and delete.)
When the profile starts showing over 90% of the CPU time in malloc and free, this is the solution.

Watch out for ``hidden'' allocations. Many array classes, for example, allocate memory on the heap. Consider the following implementation of a Point:

    class Point
    {
    public :
        Point() : coords( 3 ) {}
        // ...
    private :
        vector< double > coords ;
    } ;

Now define a Line and a Plane using vector< Point > (with 2 and 3 elements, respectively). Now start using arrays (vectors) of Lines and Planes. Now count the allocations. (One per vector: at least three for every Line and four for every Plane.) There are two negative effects on performance here: the time necessary to allocate the memory, and the time lost because of paging due to poor locality. If profiling shows this to be a problem, replace `vector' with a custom fixed length array class. (With all due respect to my friend Dietmar Kuehl, in this particular case, and in general whenever there are small, fixed length arrays of basic types fully encapsulated in a primitive class, I'd just use C style arrays.)

2. Unnecessary copying. This usually occurs with return values. A simple example: I had an application in which various functions returned a SetOf objects that they generated. In some cases, there were 20 or 30 thousand objects in the set. There was a visible pause in the output of the program (about 5 seconds even for smaller sets of a couple of thousand elements) whenever one of these functions returned. I modified the functions to take a non-const reference parameter for the output value, instead of using the return value, and the pause completely disappeared, e.g.:

    // Originally:
    SetOf< MyClass > f( /* ... */ ) ;

    // Changed to:
    void f( SetOf< MyClass >& results , /* ... */ ) ;

When calling the function, the user must provide an empty set.

Note that using operator+= instead of operator+ (suggested by another poster, and by Scott Meyers) is actually a variant of this. Again, the idea is to avoid the copy through the temporary return value.
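
As a rough illustration of point 1 above, here is a minimal sketch of a class-specific operator new and operator delete backed by a fixed size free list. It is not code from the original post: Node, FreeBlock, blocksPerChunk and freeBlocks() are hypothetical names chosen for the example, the chunk size is arbitrary, chunks are never returned to the system, and nothing here is thread safe.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical small class with many short-lived dynamic instances.
class Node
{
public :
    explicit Node( int v = 0 ) : value( v ), next( nullptr ) {}

    // Class-specific allocation functions, used by `new Node' and `delete'.
    static void* operator new( std::size_t size ) ;
    static void  operator delete( void* p, std::size_t size ) noexcept ;

    // For illustration only: number of blocks currently available for reuse.
    static std::size_t freeBlocks() { return freeCount ; }

    int   value ;
    Node* next ;

private :
    // A block that is not in use is treated as a link in a free list.
    struct FreeBlock { FreeBlock* next ; } ;

    static constexpr std::size_t blocksPerChunk = 256 ;  // arbitrary (assumption)
    static FreeBlock*  freeList ;
    static std::size_t freeCount ;
} ;

Node::FreeBlock* Node::freeList  = nullptr ;
std::size_t      Node::freeCount = 0 ;

void* Node::operator new( std::size_t size )
{
    static_assert( sizeof( Node ) >= sizeof( FreeBlock ),
                   "a free block must fit inside a Node" ) ;

    // A derived class may have another size: fall back to the general allocator.
    if ( size != sizeof( Node ) )
        return ::operator new( size ) ;

    if ( freeList == nullptr ) {
        // Free list empty: get one large chunk and thread it onto the list.
        char* chunk = static_cast< char* >(
            ::operator new( blocksPerChunk * sizeof( Node ) ) ) ;
        for ( std::size_t i = 0 ; i < blocksPerChunk ; ++ i ) {
            void* raw = chunk + i * sizeof( Node ) ;
            freeList = ::new ( raw ) FreeBlock{ freeList } ;
            ++ freeCount ;
        }
    }

    // Pop the first free block; no search and no call to malloc.
    FreeBlock* block = freeList ;
    freeList = block->next ;
    -- freeCount ;
    return block ;
}

void Node::operator delete( void* p, std::size_t size ) noexcept
{
    if ( p == nullptr )
        return ;
    if ( size != sizeof( Node ) ) {
        ::operator delete( p ) ;    // was allocated by the fallback above
        return ;
    }
    // Push the block back onto the free list for the next allocation.
    freeList = ::new ( p ) FreeBlock{ freeList } ;
    ++ freeCount ;
}
```

With this in place, client code keeps writing `new Node( 1 )' and `delete p' as before; only the class changes, which is the kind of local optimization that encapsulation makes cheap. Whether it pays off in a given program is, as above, something only the profiler can confirm.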

Finally: one or two other posters suggested avoiding virtual functions, or multiple inheritance. In my experience, these are both tools which, correctly used, increase encapsulation. As such, they *increase* the ease of making local optimizations and so, globally, have a positive impact on performance rather than a negative one.