TITLE: performance, efficiency, and optimization

(Newsgroups: comp.lang.c++.moderated, 4 May 99)


LEICHTER: Jerry Leichter

>> The advice we keep hearing - "measure, don't guess" - is *wrong* for
>> the majority of programs, the majority of programmers, most of the time.
>> The *right* advice is:  Write for clarity/maintainability/correctness.
>> If you have to measure - it *probably* doesn't matter.


MEYERS: smeyers@aristeia.com (Scott Meyers) writes:

> I think the real advice (which I believe was implicit in Bjarne's post)
> is "Don't worry about performance until you know you have a problem.
> When you know you have a performance problem, don't guess at the cause, 
> use empirical studies to find the cause."  In other words, behavior first,
> performance second -- if at all.  I agree with you that for many
> programs and many programmers, there's no need to worry about tuning 
> performance if you've applied reasonable and straightforward coding 
> practices in the first place (e.g., avoided unnecessary copying of 
> data, etc.).


STROUSTRUP: Bjarne Stroustrup <bs@research.att.com>

I think the caveat is extremely important and I rarely - if ever - forget
to make it in my writings. "The C++ Programming Language (2nd and 3rd
editions)" is explicit about considering efficiency *during the design 
phase* of a project.

My recent paper on "Learning Standard C++ as a New Language" devotes over
a third of its space to demonstrating how a high-level approach can avoid
inefficiencies. I consider it irresponsible to expose novices to an approach
that initially relies on inefficient and/or error-prone techniques (and
only later - if at all - teach them to overcome those problems).

Reasonable efficiency is part of the behavior of a program - and is often
part of the specification of a system. For example, "response time is
less than one second" is a fairly conventional requirement. A good design
cannot ignore basic efficiency requirements.

This implies that designers and programmers must have a reasonable model
for platform performance (how much does it cost to use the disk? how much
does a remote procedure call cost? what are the basic performance
implications of the creation of objects of various forms?).
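
One way to start building such a cost model is to time a candidate
operation directly. The following is only a minimal sketch: the names
average_ns and build_vector are hypothetical, a C++11 compiler is assumed,
and std::chrono::steady_clock supplies the wall-clock time. Figures from a
loop like this are rough, since an optimizer may remove work whose result
is never used.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: average wall-clock cost of one call to `op`,
// in nanoseconds, taken over `repetitions` calls.
template <typename Op>
double average_ns(Op op, std::size_t repetitions) {
    if (repetitions == 0) {
        return 0.0;  // nothing was measured
    }
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < repetitions; ++i) {
        op();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(repetitions);
}

// Hypothetical sample operation: create and destroy a vector of n ints,
// which typically involves one heap allocation and one deallocation.
inline std::size_t build_vector(std::size_t n) {
    std::vector<int> values(n, 1);
    return values.size();
}
```

Calling average_ns([] { build_vector(1000); }, 10000) gives a rough per-call
figure for object creation. Making the same measurement for a disk read or
a remote call shows which costs are likely to dominate on a given platform.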

In addition, the designers must have some basic understanding of the use
that their program/system makes of the platform. Are the basic algorithms
of a reasonable order? (quadratic algorithms for millions of values are
ridiculous independently of the speed of low-level details such as
call-by-value vs call-by-reference for small objects - the issue that
started this thread). Do basic operations involve many inter-process or
inter-component calls? (I have seen designs where a single simple operation
involved dozens of inter-component calls and the examination of several
"environment variables" which were kept on disk - again no programming
language performance can make such a design perform fast).
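
The point about algorithmic order can be made concrete with a small,
hypothetical example that is not taken from the thread: detecting a
duplicate in a sequence of ints. Both function names are assumptions made
for illustration. For a million values the pairwise version may need on
the order of 5 * 10^11 comparisons, while the sort-based version needs
roughly 2 * 10^7, a gap that no choice of argument-passing style can close.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// O(n^2): compares every pair of elements.
bool has_duplicate_quadratic(const std::vector<int>& values) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        for (std::size_t j = i + 1; j < values.size(); ++j) {
            if (values[i] == values[j]) {
                return true;
            }
        }
    }
    return false;
}

// O(n log n): sorts a private copy, then looks for equal neighbours.
// The argument is taken by value because the copy is needed for sorting.
bool has_duplicate_sorted(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return std::adjacent_find(values.begin(), values.end()) != values.end();
}
```

Both functions return the same answer for any input. Only the growth of
their running time with the number of values differs.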


MEYERS:

> However, I'm currently consulting on a project that has very much
> embraced the "behavior first, performance second" philosophy, and 
> though the program behaves quite nicely, it's so slow as to be unusable.  
> (Trust me, "unusable" in this case is an objective statement :-}) My next 
> big undertaking is to try to find a way to improve its performance by
> between one and two orders of magnitude.  A speedup of 100 would get it 
> into the ballpark we need for a decent initial release.  In this case, 
> we *know* we have a performance problem, and my first step will be to 
> head straight for one or more profilers.  This task is too important for 
> guessing.


STROUSTRUP:

The traditional description of such a system is "you need a calendar, not a
stopwatch to measure its performance" :-)

I'm not arguing that people should focus on the cost of function calls, the
cost of various forms of argument passing, and the cost of using pointers
rather than subscripting to access a vector. Such factors are largely
reasonable in current C++ implementations (though factor-of-two
improvements are possible in some cases).

Rather, people should have a basic understanding of cost factors so that
they can focus on what matters for a given project. If you are refreshing
a megapixel display, you have one set of concerns; if you are doing
transactions across a LAN, you have another set. Only if you have some
basic understanding of the factors that affect your costs can you hope to 
spend your time and effort wisely. Some basic measurements early on in a 
project can be most valuable in avoiding debacles such as the one Scott 
described.

I suspect that the bottom line is that you can afford a pure "behavior
first, performance second if at all" only if you happen to know the basic
performance characteristics of your platform and your application. Also, 
performance measurements never fail to surprise and amaze me - guessing 
about performance is indeed hazardous.