TITLE: reference counting (part I) (Newsgroups: comp.lang.c++.moderated, 8 Sep 98) AUTHOR: herbs@cntc.com (Herb Sutter) .--------------------------------------------------------------------. | Guru of the Week problems and solutions are posted regularly on | | news:comp.lang.c++.moderated. For past problems and solutions | | see the GotW archive at http://www.cntc.com. | | Is there a topic you'd like to see covered? mailto:herbs@cntc.com | `--------------------------------------------------------------------' _______________________________________________________ GotW #43: Reference Counting - Part I Difficulty: 4 / 10 _______________________________________________________ JG Question ----------- >Consider the following simplified String class: > > namespace Original { > class String { > public: > String(); // start off empty > ~String(); // free the buffer > String( const String& ); // take a full copy > void Append( char ); // append one character > private: > char* buf_; // allocated buffer > size_t len_; // length of buffer > size_t used_; // # chars actually used > }; > } > >This is a simple String that does not contain any fancy >optimizations. When you copy an Original::String, the >new object immediately allocates its own buffer and you >immediately have two completely distinct objects. > >Your assignment: Implement Original::String. Here is a straightforward implementation. namespace Original { String::String() : buf_(0), len_(0), used_(0) { } String::~String() { delete[] buf_; } String::String( const String& other ) : buf_(new char[other.len_]), len_(other.len_), used_(other.used_) { copy( other.buf_, other.buf_ + used_, buf_ ); } I've chosen to implement an additional Reserve helper function for code clarity, since it will be needed by other mutators besides Append. Reserve ensures that our internal buffer is at least n bytes long, and allocates additional space using an exponential-growth strategy: void String::Reserve( size_t n ) { if( len_ < n ) { size_t newlen = max( len_ * 1.5, n ); char* newbuf = new char[ newlen ]; copy( buf_, buf_+used_, newbuf ); delete[] buf_; // now all the real work is buf_ = newbuf; // done, so take ownership len_ = newlen; } } void String::Append( char c ) { Reserve( used_+1 ); buf_[used_++] = c; } } Aside: What's the Best Buffer Growth Strategy? ---------------------------------------------- When a String object runs out of room in its currently- allocated buffer, it needs to allocate a larger one. The key question is: "How big should the new buffer be?" [Note: The analysis that follows holds for other structures and containers that use allocated buffers, such as a standard vector<>.] There are several popular strategies. I'll note each strategy's complexity in terms of the number of allocations required, and the average number of times a given character must be copied, for a string of eventual length N. a) Exact growth. In this strategy, the new buffer is made exactly as large as required by the current operation. For example, appending one character and then appending another will force two allocations; first a new buffer is allocated with space for the existing string and the first new character; next, another new buffer is allocated with space for that and the second new character. - Advantage: No wasted space. - Disadvantage: Poor performance, worse than O(N) allocations, and O(N*N) copies if characters are appended one at a time. b) Fixed-increment growth. The new buffer should be a fixed amount larger than the current buffer. For example, a 64-byte increment size would mean that all string buffers would be an integer multiple of 64 bytes long. - Advantage: Little wasted space. The amount of unused space in the buffer is bounded by the increment size, and does not vary with the length of the string. - Disadvantage: Moderate performance. This strategy requires both O(N) allocations and an average of O(N) char copy operations. That is, both the number of allocations and the average number of times a given char is copied vary linearly with the length of the string. c) Exponential growth. The new buffer is a factor of F larger than the current buffer. For example, with F=.5, appending one character to full string which is already 100 bytes long will allocate a new buffer of length 150 bytes. - Advantage: Good performance. This strategy requires O(logN) allocations and an average of O(k) char copy operations. That is, the number of allocations varies linearly with the length of the string, but the average number of times a given char is copied is constant(!). - Disadvantage: Some wasted space. The amount of unused space in the buffer will always be strictly less than N*F bytes, but that's still O(N) average space wastage. The following chart summarizes the tradeoffs: Growth Strategy Allocations Char Copies Wasted Space --------------- ----------- ----------- ------------ Exact > O(N) up to O(N*N) none Fixed-increment O(N) O(N) O(k) Exponential O(logN) O(k) O(N) In general, the best-performing strategy is exponential growth. Consider a string that starts empty and grows to 1200 characters long. Fixed-increment growth requires O(N) allocations and on average requires copying each character O(N) times (in this case, using 32-byte increments, that's 38 allocations and on average 19 copies of each character). Exponential growth requires O(logN) allocations and on average requires making only O(k) -- one or two -- copies of each character (yes, really; see the reference below); in this case, using a factor of 1.5, that's 10 allocations and on average 2 copies of each character. 1,200-char string 12,000-char string ====================== ======================= Fixed-incr Exponential Fixed-incr Exponential growth growth growth growth (32 bytes) (1.5x) (32 bytes) (1.5x) ---------- ----------- ---------- ----------- # of memory 38 10 380 16 allocations avg # of 19 2 190 2 copies made of each char This result can be surprising. For more information, see Andrew Koenig's column in the September 1998 issue of JOOP (Journal of Object-Oriented Programming). Koenig also shows why, again in general, the best growth factor is not 2 but about 1.5. He also shows why the average number of times a given char is copied is constant -- i.e., doesn't depend on the length of the string. Guru Question ------------- >With a smile on her face and determination in her eyes, >the implementor designs an Optimized::String that uses >a reference-counted implementation (also called "lazy >copy" or "copy on write"): > > namespace Optimized { > struct StringBuf { > StringBuf(); // start off empty > ~StringBuf(); // delete the buffer > void Reserve( size_t n );// ensure len >= n > > char* buf; // allocated buffer > size_t len; // length of buffer > size_t used; // # chars actually used > unsigned refs; // reference count > }; > > class String { > public: > String(); // start off empty > ~String(); // decrement reference count > // (delete buffer if refs==0) > String( const String& ); // point at same buffer and > // increment reference count > void Append( char ); // append one character > private: > StringBuf* data_; > }; > } > >Your assignment: Implement Optimized::StringBuf and >Optimized::String. You may want to add a private >String::AboutToModify() helper function to simplify the >logic. First, consider StringBuf. Note that the default memberwise copying and assignment don't make sense for StringBufs, so both operations should also be suppressed (declared as private and not defined). namespace Optimized { StringBuf::StringBuf() : buf(0), len(0), used(0), refs(1) { } StringBuf::~StringBuf() { delete[] buf; } void StringBuf::Reserve( size_t n ) { if( len < n ) { size_t newlen = max( len * 1.5, n ); char* newbuf = new char[ newlen ]; copy( buf, buf+used, newbuf ); delete[] buf; // now all the real work is buf = newbuf; // done, so take ownership len = newlen; } } Next, consider String itself. The default constructor and the destructor are easy to implement. String::String() : data_(new StringBuf) { } String::~String() { if( --data_->refs < 1 ) { delete data_; } } Next, we write the copy constructor to implement the "lazy copy" semantics by simply updating a reference count. We will only actually split off the shared representation if we discover that we need to modify one of the strings that share this buffer. String::String( const String& other ) : data_(other.data_) { ++data_->refs; } I've chosen to implement an additional AboutToModify helper function for code clarity, since it will be needed by other mutators besides Append. AboutToModify ensures that we have an unshared copy of the internal buffer; that is, it performs the "lazy copy" if it has not already been performed. For convenience, AboutToModify takes a minimum buffer size, so that we won't needlessly take our own copy of a full string only to turn around and immediately perform a second allocation to get more space. void String::AboutToModify( size_t n ) { if( data_->refs > 1 ) { StringBuf* newdata = new StringBuf; newdata->Reserve( max( data_->len, n ) ); copy( data_->buf, data_->buf+data_->used, newdata->buf ); newdata->used = data_->used; --data_->refs; // now all the real work is data_ = newdata; // done, so take ownership } else { data_->Reserve( n ); } } void String::Append( char c ) { AboutToModify( data_->used+1 ); data_->buf[data_->used++] = c; } }