TITLE: performance, efficiency, and optimization
(Newsgroups: comp.lang.c++.moderated, 4 May 99)

LEICHTER: Jerry Leichter

>> The advice we keep hearing - "measure, don't guess" - is *wrong* for
>> the majority of programs, the majority of programmers, most of the time.
>> The *right* advice is: Write for clarity/maintainability/correctness.
>> If you have to measure - it *probably* doesn't matter.

MEYERS: smeyers@aristeia.com (Scott Meyers) writes:

> I think the real advice (which I believe was implicit in Bjarne's post)
> is "Don't worry about performance until you know you have a problem.
> When you know you have a performance problem, don't guess at the cause,
> use empirical studies to find the cause." In other words, behavior first,
> performance second -- if at all. I agree with you that for many
> programs and many programmers, there's no need to worry about tuning
> performance if you've applied reasonable and straightforward coding
> practices in the first place (e.g., avoided unnecessary copying of
> data, etc.).

STROUSTRUP: Bjarne Stroustrup

I think the caveat is extremely important and I rarely - if ever - forget to make it in my writings. "The C++ Programming Language" (2nd and 3rd editions) is explicit about considering efficiency *during the design phase* of a project. My recent paper on "Learning Standard C++ as a New Language" devotes over a third of its space to demonstrating how a high-level approach can avoid inefficiencies. I consider it irresponsible to expose novices to an approach that initially relies on inefficient and/or error-prone techniques (and only later - if at all - teach them to overcome those problems).

Reasonable efficiency is part of the behavior of a program - and is often part of the specification of a system. For example, "response time is less than one second" is a fairly conventional requirement. A good design cannot ignore basic efficiency requirements.
This implies that designers and programmers must have a reasonable model for platform performance (how much does it cost to use the disk? how much does a remote procedure call cost? what are the basic performance implications of the creation of objects of various forms?).

In addition, the designers must have some basic understanding of the use that their program/system makes of the platform. Are the basic algorithms of a reasonable order? (Quadratic algorithms for millions of values are ridiculous independently of the speed of low-level details such as call-by-value vs call-by-reference for small objects - the issue that started this thread.) Do basic operations involve many inter-process or inter-component calls? (I have seen designs where a single simple operation involved dozens of inter-component calls and the examination of several "environment variables" which were kept on disk - again, no programming language performance can make such a design perform fast.)

MEYERS:

> However, I'm currently consulting on a project that has very much
> embraced the "behavior first, performance second" philosophy, and
> though the program behaves quite nicely, it's so slow as to be unusable.
> (Trust me, "unusable" in this case is an objective statement :-}) My next
> big undertaking is to try to find a way to improve its performance by
> between one and two orders of magnitude. A speedup of 100 would get it
> into the ballpark we need for a decent initial release. In this case,
> we *know* we have a performance problem, and my first step will be to
> head straight for one or more profilers. This task is too important for
> guessing.

STROUSTRUP:

The traditional description of such a system is "you need a calendar, not a stopwatch to measure its performance" :-)

I'm not arguing that people should focus on the cost of function calls, the cost of various forms of argument passing, and the cost of using pointers rather than subscripting to access a vector.
Such costs are largely reasonable in current C++ implementations (though factor-of-two improvements are possible in some cases). Rather, people should have a basic understanding of cost factors so that they can focus on what matters for a given project. If you are refreshing a megapixel display, you have one set of concerns; if you are doing transactions across a LAN, you have another set. Only if you have some basic understanding of the factors that affect your costs can you hope to spend your time and effort wisely.

Some basic measurements early on in a project can be most valuable in avoiding debacles such as the one Scott described.

I suspect that the bottom line is that you can afford a pure "behavior first, performance second if at all" approach only if you happen to know the basic performance characteristics of your platform and your application. Also, performance measurements never fail to surprise and amaze me - guessing about performance is indeed hazardous.
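
As a rough illustration of the kind of early measurement suggested above, here is a minimal sketch that times a quadratic and a sort-based duplicate check on the same input. It assumes C++11 or later (for std::chrono). The duplicate-detection task, the function names (has_duplicate_quadratic, has_duplicate_sorted, time_ms, distinct_values, compare_at) and the test input are all illustrative assumptions, not anything taken from the thread. A single wall-clock run like this is only a crude indication; for a real system a profiler remains the right tool.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Quadratic: compares every pair of elements.
// Tolerable for tens of values, hopeless for millions.
bool has_duplicate_quadratic(const std::vector<int>& v)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// O(n log n): sorts a copy, then looks for equal neighbours.
// The argument is deliberately passed by value because a copy is needed anyway.
bool has_duplicate_sorted(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}

// Hypothetical helper: runs f(v) once, stores its answer in result,
// and returns the elapsed wall-clock time in milliseconds.
template <class F>
double time_ms(F f, const std::vector<int>& v, bool& result)
{
    const auto start = std::chrono::steady_clock::now();
    result = f(v);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Builds n distinct values in descending order, so neither
// algorithm can exit early on a duplicate.
std::vector<int> distinct_values(std::size_t n)
{
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<int>(n - i);
    return v;
}

// Times both versions on the same input and prints the two figures.
void compare_at(std::size_t n)
{
    const std::vector<int> v = distinct_values(n);
    bool r1 = false;
    bool r2 = false;
    const double quadratic_ms = time_ms(has_duplicate_quadratic, v, r1);
    const double sorted_ms = time_ms(has_duplicate_sorted, v, r2);
    assert(r1 == r2);  // both versions must agree on the answer
    std::cout << "n=" << n
              << "  quadratic: " << quadratic_ms << " ms"
              << "  sort-based: " << sorted_ms << " ms\n";
}
```

Calling compare_at(20000) and then compare_at(40000) from main should show the quadratic version's time growing roughly fourfold as n doubles, while the sort-based version grows only slightly faster than linearly. The exact figures depend on compiler, optimization level, and machine. Whether the vector is passed by value or by reference is a minor effect by comparison, which is the point made in the thread about algorithmic order versus low-level details.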