TITLE: preprocessing phases (Newsgroups: comp.std.c++, comp.std.c)

HOLLINBECK: Rick Hollinbeck

>What is the current draft spec re: backslash splicing and comment conversion?
>Which comes first?

CLAMAGE: stephen.clamage@eng.sun.com (Steve Clamage)

The "phases of translation" were carefully worked out by the C Committee to give the expected results in all the usual circumstances. The C++ Committee did not modify them, except for adding a few fairly minor features unique to C++.

People try to make the issue of comments too difficult. It's really not. The first three phases go like this for both C and C++, leaving out a few C++ nits:

1. Map physical characters in the external file to internal representation and convert trigraphs to their single-character internal equivalents. (In particular, ??/ gets converted to \ in this phase, meaning you can still escape a newline if you don't have \ in your character set.)

2. Splice physical lines separated by escaped newlines into single logical lines. (This means you can split tokens, including string literals, across physical lines.)

3. Decompose the file into a sequence of preprocessing tokens and whitespace, converting each comment into one blank character.

In C, a comment starts with /* and ends with */. In C++, a comment also can start with // and end at the next newline. Since escaped newlines have already been processed, there is no ambiguity. A comment is a comment, and is removed in phase 3, after line splicing.

We don't have //-comments currently in C, so the C++ rules don't cause any incompatibility. The C++ rules mean that you probably don't want to use //-comments on a physical line that ends with an escaped newline. There are no other questionable interactions.

Escaped newlines, it seems to me, are common only in complicated macro definitions, or in machine-generated code. (Human-written code tends to break lines between, not within, tokens, and has little need for escaped newlines.)
One exception would be string literals continued onto several lines, but they can't contain internal comments anyway.

Two simple guidelines should eliminate any problems:

1. Use /*...*/ comments instead of //-comments in macro definitions. (Complicated macros are not usually needed in C++ anyway.)

2. If machine-generated code might contain escaped newlines other than in string literals, have the machine generate /*...*/ comments instead of //-comments.

The guidelines are stronger than necessary, but are easy to remember and apply, and should not cause any difficulty.
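
As an illustration of the phase ordering described above, here is a minimal C++ sketch. It is not from the original post: the names after_spliced_comment, SUM3, spliced_literal and run_checks are hypothetical, chosen only for this example. It shows a //-comment that ends in an escaped newline swallowing the next physical line, a multi-line macro that stays intact because it uses /*...*/ comments, and a string literal split across physical lines. Many compilers warn about the first case (a "multi-line comment" diagnostic), which is expected here.

```cpp
#include <cassert>
#include <cstring>

// All names in this sketch are hypothetical; none come from the post.

// Phase 2 splices lines before phase 3 removes comments, so the
// backslash ending the //-comment below pulls the assignment into it.
int after_spliced_comment() {
    int x = 1;
    // the assignment on the next physical line is swallowed \
    x = 2;
    return x;  // the assignment never happened, so this returns 1
}

// Guideline 1: a /*...*/ comment ends at its own delimiter, so it
// coexists with the escaped newlines of a multi-line macro.
#define SUM3(a, b, c)        \
    ((a) /* first term */    \
     + (b) /* second term */ \
     + (c))

// Splicing also happens inside tokens: this is the one literal "abcdef".
// The continuation starts in column one because any indentation
// there would become part of the string.
const char* spliced_literal() {
    return "abc\
def";
}

// Collects the three observations in one place.
void run_checks() {
    assert(after_spliced_comment() == 1);
    assert(SUM3(1, 2, 3) == 6);
    assert(std::strcmp(spliced_literal(), "abcdef") == 0);
}
```

Calling run_checks() from a main function should pass all three assertions on a conforming compiler. Replacing either /*...*/ comment in SUM3 with a //-comment would make the comment absorb the rest of the macro body, which is the situation the first guideline avoids.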