TITLE: Operators, Operators Everywhere (Source: comp.lang.c++.moderated, 19 Feb 2001) AUTHOR: "Herb Sutter" > ------------------------------------------------------------------- > Guru of the Week problems and solutions are posted regularly on > news:comp.lang.c++.moderated. For past problems and solutions > see the GotW archive at www.gotw.ca. (c) 2001 H.P.Sutter > News archives may keep copies of this article. > ------------------------------------------------------------------- > > _______________________________________________________ > > GotW #78: Operators, Operators Everywhere > > Difficulty: 2 / 10 > _______________________________________________________ > > > > Who Is Max Munch, and What's He Doing In My C++ Compiler? > --------------------------------------------------------- > > Max Munch is the only C++ standards committee member to have a perfect > attendance record at all meetings from the very beginning to date. > (If you don't believe me, check the attendance list in each meeting's > minutes.) > > More seriously, the "max munch" rule says that, when interpreting the > characters of source code into tokens, the compiler does it greedily > -- it makes the longest tokens possible. Therefore >> is always > interpreted as a single token, the stream extraction (right-shift) > operator, and never as two individual > tokens even when the > characters appear in a situation like this: > > template> ... > > That's why such code has to be written with an extra space, as: > > template > ... > > Similarly, >>> is always interpreted as >> >, never as > >>, and so > on. > > > Some Fun With Operators > ----------------------- > > >1. What is the greatest number of plus signs (+) that can appear > > consecutively, without whitespace, in a valid C++ program? Of > > course, plus signs in comments don't count. > > (Also of course, plus signs in preprocessor directives including > macros, or in literals, also don't count.) > > It is possible to create a source file containing arbitrarily long > sequences of consecutive + characters, up to a compiler's limits (such > as the maximum source file line length the compiler can handle). > > If the sequence has an even number of + characters, it will be > interpreted as ++ ++ ++ ++ ... ++, a sequence of binary ++ tokens. > To make this work, and have well-defined semantics because of sequence > points, all we need is a class with a user-defined prefix ++ operator > that allows chaining. For example: > > // Example 1(a) > // > class A > { > public: > A& operator++() { return *this; } > }; > > Now we can write code like: > > A a; > ++++++a; // meaning: ++ ++ ++ a; > > which works out to: > > a.operator++().operator++().operator++(); > > What if the sequence has an odd number of + characters? Then it will > be interpreted as ++ ++ ++ ++ ... ++ +, a series of binary ++ tokens > ending with a final unary +. To make this work, we just need to > additionally provide a unary + operator. For example: > > // Example 1(b) > // > class A > { > public: > A& operator+ () { return *this; } > A& operator++() { return *this; } > }; > > Now we can write code like: > > A a; > +++++++a; // meaning: ++ ++ ++ + a; > > which works out to: > > a.operator+().operator++().operator++().operator++(); > > This trick is fairly simple. Creating longer-than-usual sequences of > other characters turns out to be a little more challenging, but still > possible. > > > Abuses of Operators > ------------------- > > The code in Examples 1(a) and 1(b) don't especially abuse the ++ and + > operators' usual semantics. What we're going to do next, however, > really goes beyond anything you'd ever want to see in production code; > this is for fun only. > > >2. Similarly, what is the greatest number of each the following > > characters that can appear consecutively in a valid C++ program? > > For this question, we'll create and use the following helper class: > > // Example 2 > // > class A > { > public: > void operator&&( int ) { } > void operator<<( int ) { } > void operator||( int ) { } > }; > > typedef void (A::*F)(int); > > > > a) & > > Answer: Five. > > Well, && is easy and &&& not too much harder, so let's go right to the > next level: Can we create a series of four &'s, namely &&&&? Well, > if we did, they would be interpreted as "&& &&", but expressions like > "a && && b" are syntactically illegal; we can't have two binary && > operators immediately after each other. The trick is to see that we > can use the second && as an operator, and make the first && come out > as the end of something that's not an operator. With that in mind, it > doesn't take too long to see that the first && could be the end of a > name, specifically the name of a function, and so all we need is an > operator&&() that can accept a pointer to some other operator&&() as > its first parameter: > > void operator&&( F, A ) { } > > This lets us write: > > &A::operator&&&&a; // && && > > which means: > > operator&&(&A::operator&&, a); > > That's the longest even-numbered run of &'s we can make, because > &&&&&& has to be illegal. Why? Because it would mean && && &&, and > even after making the first && part of a name again, we can't make the > final && the beginning of another name, so we're left with two binary > && operators in a row which is illegal. > > But can we squeeze in one more & by going to an odd number of &'s? > Yes, indeed. Specifically, &&&&& means && && &, we already have a > solution for the first part, and with not much thought it's easy to > tack on a unary &: > > &A::operator&&&&&a; // && && & > > which uses the builtin operator&&() that can take pointers: > > operator&&(&A::operator&&, &a); > > > > b) < > > > > c) | > > Answer for both: Four. > > Having seen the solution to 2(a), this one should be easy. To make a > series of four, we just use the same trick as before, defining a: > > void operator<<( F, A ) { } > void operator||( F, A ) { } > > which lets us write: > > &A::operator<<< &A::operator||||a; // || || > > which means: > > operator<<(&A::operator<<, a); > operator||(&A::operator||, a); > > That's the longest even-numbered runs we can make, because <<<<<< and > |||||| have to be illegal, just as we've already noted that &&&&&& has > to be illegal. But this time we can't manage an extra < or | to make > a series of five, because there is no unary < or | operator. > > > Bonus Challenge Question > ------------------------ > > Here's a bonus question: How many question mark (?) characters can > appear in a row in a valid C++ program? > > Think about the question before reading on. It's a lot harder than it > looks. > > * * * * * > > Do you have an answer? > > You might think that the answer must be one, because the only > legitimate token in the C++ language that includes a ? is the ternary > ?: operator. It's true that that's the only legitimate language > feature that includes a ?, but "there are more things in the > translator and preprocessor, Horatio, than are dreamt of in your > language syntax rules..." In particular, there's more to C++ than > just the language, or even just the preprocessor. > > For ?, the correct answer is three. For example: > > 1???-0:0; > > This question is harder than the others in part because it's the only > one that doesn't follow the maximal munch rule. The three question > marks, ???, are not interpreted as the tokens ?? and ?. Why not? > Because ??- happens to be a trigraph, and trigraphs are replaced very > early during source code processing -- before tokenization begins, > even before preprocessor instructions are handled. If you haven't > heard about trigraphs before, don't worry; that just means that you > don't use an exotic foreign-language keyboard. A trigraph is an > alternate way to spell certain unusual source code characters > (specifically #, \, ^, [, ], |, {, }, and ~), provided for the benefit > of programmers whose keyboards don't happen to have a key for that > character. > > In this case, long before any tokenization can occur the trigraph ??- > is replaced with ~, the one's-complement operator. Therefore the > statement above becomes: > > 1?~0:0; > > which is tokenized as: > > 1 ? ~ 0 : 0 ; > > and means: > > 1 ? (~0) : 0 ; > > Trigraphs really are rare in practice. Just to give you an idea of > how rare, note that of the four compilers I tried, only two can handle > the code above: Microsoft Visual Studio.NET (MSVC7) Beta 1 and Comeau > 4.2.44 got it right, whereas Borland 5.5.1 and gcc 2.95.2 didn't > accept it (those compilers may have options to enable trigraphs, but > if so I didn't notice them). _______________________________________________ cpptips mailing list http://cpptips.hyperformix.com