<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
<head>
<title>The GNU C Preprocessor Internals: Token Spacing</title>

<meta name="description" content="The GNU C Preprocessor Internals: Token Spacing">
<meta name="keywords" content="The GNU C Preprocessor Internals: Token Spacing">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="index.html#Top" rel="up" title="Top">
<link href="Line-Numbering.html#Line-Numbering" rel="next" title="Line Numbering">
<link href="Macro-Expansion.html#Macro-Expansion" rel="prev" title="Macro Expansion">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Token-Spacing"></a>
<div class="header">
<p>
Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Token-Spacing-1"></a>
<h2 class="unnumbered">Token Spacing</h2>
<a name="index-paste-avoidance"></a>
<a name="index-spacing"></a>
<a name="index-token-spacing"></a>

<p>First, consider an issue that only concerns the stand-alone
preprocessor: there needs to be a guarantee that re-reading its preprocessed
output results in an identical token stream. Without taking special
measures, this might not be the case because of macro substitution.
For example:
</p>
<div class="smallexample">
<pre class="smallexample">#define PLUS +
#define EMPTY
#define f(x) =x=
+PLUS -EMPTY- PLUS+ f(=)
→ + + - - + + = = =
<em>not</em>
→ ++ -- ++ ===
</pre></div>
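<p>To see why the two spellings are not interchangeable, the following C
sketch compiles the first two cases directly. It is an illustration added
here, and the function names <code>plus_plus</code> and
<code>minus_minus</code> are hypothetical. The compiler proper sees
separate tokens, so each function applies two unary operators and returns
its argument unchanged. If the stand-alone preprocessor wrote the text
<code>++a</code> or <code>--a</code> instead and that text were read back,
the functions would return <code>a + 1</code> and <code>a - 1</code>.
</p>

```c
#define PLUS +
#define EMPTY

/* Expands to "+ + a": two unary plus operators, so a is returned as is.
   Spelled "++a" it would be a pre-increment instead.  */
int plus_plus(int a)
{
    return +PLUS a;
}

/* Expands to "- - a": negated twice, so a is returned as is.
   Spelled "--a" it would be a pre-decrement instead.  */
int minus_minus(int a)
{
    return -EMPTY- a;
}
```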

<p>One solution would be to simply insert a space between all adjacent
tokens. However, we would like to keep space insertion to a minimum,
both for aesthetic reasons and because it causes problems for people who
still try to abuse the preprocessor for things like Fortran source and
Makefiles.
</p>
<p>For now, just notice that when tokens are added to (or removed from, as
shown by the <code>EMPTY</code> example) the original lexed token stream, we
need to check for accidental token pasting. We call this <em>paste
avoidance</em>. Token addition and removal can only occur because of macro
expansion, but accidental pasting can occur in many places: both before
and after each macro replacement, each argument replacement, and
additionally each token created by the ‘<samp>#</samp>’ and ‘<samp>##</samp>’ operators.
</p>
<p>Look at how the preprocessor normally gets whitespace right in its
output. The <code>cpp_token</code> structure contains a flags byte, and one
of those flags is <code>PREV_WHITE</code>. This is flagged by the lexer, and
indicates that the token was preceded by whitespace of some form other
than a newline. The stand-alone preprocessor can use this flag to
decide whether to insert a space between tokens in the output.
</p>
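<p>As a rough illustration of that decision, here is a minimal sketch in C.
It does not use cpplib's real data structures: <code>struct toy_token</code>,
the value given to <code>PREV_WHITE</code>, and the function
<code>spell_tokens</code> are all assumptions made for the example.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for illustration only; cpplib's real cpp_token
   has more fields, and the flag value here is an assumption.  */
#define PREV_WHITE 0x01

struct toy_token {
    const char *spelling;   /* text of the token */
    unsigned char flags;    /* PREV_WHITE if whitespace preceded it */
};

/* Write the tokens to OUT (capacity CAP), putting a single space before
   each token whose PREV_WHITE flag is set.  Stops early rather than
   overflow the buffer.  Returns OUT.  */
char *spell_tokens(const struct toy_token *toks, size_t n,
                   char *out, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return out;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t space = (toks[i].flags & PREV_WHITE) ? 1 : 0;
        size_t need = strlen(toks[i].spelling);

        if (len + space + need >= cap)
            break;
        if (space)
            out[len++] = ' ';
        memcpy(out + len, toks[i].spelling, need + 1);
        len += need;
    }
    return out;
}
```

<p>Because the flag records only that some whitespace was present, any run
of blanks in the input comes out as a single space.
</p>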
<p>Now consider the result of the following macro expansion:
</p>
<div class="smallexample">
<pre class="smallexample">#define add(x, y, z) x + y +z;
sum = add (1,2, 3);
→ sum = 1 + 2 +3;
</pre></div>

<p>The interesting thing here is that the tokens ‘<samp>1</samp>’ and ‘<samp>2</samp>’ are
output with a preceding space, and ‘<samp>3</samp>’ is output without a
preceding space, but when lexed none of these tokens had that property.
Careful consideration reveals that ‘<samp>1</samp>’ gets its preceding
whitespace from the space preceding ‘<samp>add</samp>’ in the macro invocation,
<em>not</em> from the replacement list. ‘<samp>2</samp>’ gets its whitespace from the
space preceding the parameter ‘<samp>y</samp>’ in the macro replacement list,
and ‘<samp>3</samp>’ has no preceding space because parameter ‘<samp>z</samp>’ has none
in the replacement list.
</p>
<p>Once lexed, tokens are effectively fixed and cannot be altered, since
pointers to them might be held in many places, in particular by
in-progress macro expansions. So instead of modifying the tokens
above, the preprocessor inserts a special token, which I call a
<em>padding token</em>, into the token stream to indicate that the spacing of
the subsequent token is special. The preprocessor inserts padding
tokens in front of every macro expansion and expanded macro argument.
These point to a <em>source token</em> from which the subsequent real token
should inherit its spacing. In the above example, the source tokens are
‘<samp>add</samp>’ in the macro invocation, and ‘<samp>y</samp>’ and ‘<samp>z</samp>’ in the
macro replacement list, respectively.
</p>
<p>It is quite easy to get multiple padding tokens in a row, for example if
a macro’s first replacement token expands straight into another macro.
</p>
<div class="smallexample">
<pre class="smallexample">#define foo bar
#define bar baz
[foo]
→ [baz]
</pre></div>

<p>Here, two padding tokens are generated. Their sources are the ‘<samp>foo</samp>’ token
between the brackets and the ‘<samp>bar</samp>’ token from foo’s replacement
list, respectively. Clearly the first padding token is the one to
use, so the output code should contain a rule that the first
padding token in a sequence is the one that matters.
</p>
<p>But what happens when we leave a macro expansion? Adjusting the above
example slightly:
</p>
<div class="smallexample">
<pre class="smallexample">#define foo bar
#define bar EMPTY baz
#define EMPTY
[foo] EMPTY;
→ [ baz] ;
</pre></div>

<p>As shown, now there should be a space before ‘<samp>baz</samp>’ and the
semicolon in the output.
</p>
<p>The rules we decided on above fail for ‘<samp>baz</samp>’: we generate three
padding tokens, one per macro invocation, before the token ‘<samp>baz</samp>’.
We would then have it take its spacing from the first of these, which
carries source token ‘<samp>foo</samp>’ with no leading space.
</p>
<p>It is vital that cpplib get spacing correct in these examples since any
of these macro expansions could be stringified, where spacing matters.
</p>
<p>So, this demonstrates that not just entering macro and argument
expansions, but leaving them requires special handling too. I made
cpplib insert a padding token with a <code>NULL</code> source token when
leaving macro expansions, as well as after each replaced argument in a
macro’s replacement list. It also inserts appropriate padding tokens on
either side of tokens created by the ‘<samp>#</samp>’ and ‘<samp>##</samp>’ operators.
I expanded the rule so that, if we see a padding token with a
<code>NULL</code> source token, <em>and</em> the source token remembered from
the earlier padding tokens has no leading space, then we behave as if we
have seen no padding tokens at all. A
quick check shows this rule will then get the above example correct as
well.
</p>
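<p>The combined rule can be sketched in C as follows. This is a toy model
under stated assumptions, not cpplib's code: <code>struct pad_token</code>,
<code>print_stream</code> and <code>demo_bracket_example</code> are
hypothetical names, the padding tokens in the demonstration stream are
placed by hand according to the description above, and paste avoidance is
left out entirely.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins; names and the flag value are assumptions.  */
#define PREV_WHITE 0x01

enum toy_kind { TOY_REAL, TOY_PADDING };

struct pad_token {
    enum toy_kind kind;
    const char *spelling;            /* real tokens: text to print */
    unsigned char flags;             /* real tokens: PREV_WHITE or 0 */
    const struct pad_token *source;  /* padding tokens: may be NULL */
};

/* Append S to OUT if it still fits within CAP bytes.  */
static void append(char *out, size_t cap, size_t *len, const char *s)
{
    size_t n = strlen(s);

    if (*len + n < cap) {
        memcpy(out + *len, s, n + 1);
        *len += n;
    }
}

/* Spell the stream into OUT, resolving padding tokens.  Returns OUT.  */
char *print_stream(const struct pad_token *toks, size_t n,
                   char *out, size_t cap)
{
    const struct pad_token *source = NULL;  /* remembered source token */
    size_t len = 0;

    if (cap == 0)
        return out;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const struct pad_token *t = &toks[i];

        if (t->kind == TOY_PADDING) {
            /* The first padding token of a run wins, unless a
               NULL-source padding token arrives while the remembered
               source has no leading space: then forget the run.  */
            if (source == NULL
                || (!(source->flags & PREV_WHITE) && t->source == NULL))
                source = t->source;
            continue;
        }
        /* A real token inherits spacing from the remembered source,
           or uses its own flag when nothing is remembered.  */
        const struct pad_token *from = source ? source : t;
        if (from->flags & PREV_WHITE)
            append(out, cap, &len, " ");
        append(out, cap, &len, t->spelling);
        source = NULL;
    }
    return out;
}

/* Hand-built stream for "[foo] EMPTY;" from the example above.  */
char *demo_bracket_example(char *out, size_t cap)
{
    static const struct pad_token foo = { TOY_REAL, "foo", 0, NULL };
    static const struct pad_token bar = { TOY_REAL, "bar", 0, NULL };
    static const struct pad_token empty1 = { TOY_REAL, "EMPTY", 0, NULL };
    static const struct pad_token empty2 =
        { TOY_REAL, "EMPTY", PREV_WHITE, NULL };
    const struct pad_token stream[] = {
        { TOY_REAL,    "[",   0,          NULL    },
        { TOY_PADDING, NULL,  0,          &foo    },  /* entering foo */
        { TOY_PADDING, NULL,  0,          &bar    },  /* entering bar */
        { TOY_PADDING, NULL,  0,          &empty1 },  /* entering EMPTY */
        { TOY_PADDING, NULL,  0,          NULL    },  /* leaving EMPTY */
        { TOY_REAL,    "baz", PREV_WHITE, NULL    },
        { TOY_PADDING, NULL,  0,          NULL    },  /* leaving bar */
        { TOY_PADDING, NULL,  0,          NULL    },  /* leaving foo */
        { TOY_REAL,    "]",   0,          NULL    },
        { TOY_PADDING, NULL,  0,          &empty2 },  /* entering EMPTY */
        { TOY_PADDING, NULL,  0,          NULL    },  /* leaving EMPTY */
        { TOY_REAL,    ";",   0,          NULL    },
    };

    return print_stream(stream, sizeof stream / sizeof stream[0], out, cap);
}
```

<p>In this model the <code>NULL</code> padding token emitted on leaving
<code>EMPTY</code> cancels the remembered ‘<samp>foo</samp>’ source, so
‘<samp>baz</samp>’ falls back to its own leading space, while the semicolon
inherits the space in front of the second <code>EMPTY</code>.
</p>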
<p>Now a relationship with paste avoidance is apparent: we have to be
careful about paste avoidance in exactly the same locations we have
padding tokens in order to get whitespace correct. This makes
implementation of paste avoidance easy: wherever the stand-alone
preprocessor is fixing up spacing because of padding tokens, and it
turns out that no space is needed, it has to take the extra step to
check that a space is not needed after all to avoid an accidental paste.
The function <code>cpp_avoid_paste</code> advises whether a space is required
between two consecutive tokens. To avoid excessive spacing, it tries
hard to only require a space if one is likely to be necessary, but for
reasons of efficiency it is slightly conservative and might recommend a
space where one is not strictly needed.
</p>
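<p>To make the idea of such a check concrete, here is a small sketch in C.
It is not <code>cpp_avoid_paste</code> and does not reproduce its rules:
<code>toy_avoid_paste</code> and <code>join_tokens</code> are hypothetical
helpers that look only at token spellings, cover just a few punctuators
plus identifiers and numbers, and deliberately err on the side of asking
for a space.
</p>

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not cpplib's cpp_avoid_paste.  Decide from the
   spellings alone whether LEFT immediately followed by RIGHT might be
   lexed differently.  The table is not exhaustive (no digraphs, no
   '.' before numbers, and so on).  */
bool toy_avoid_paste(const char *left, const char *right)
{
    if (left[0] == '\0' || right[0] == '\0')
        return false;

    unsigned char a = (unsigned char)left[strlen(left) - 1];
    unsigned char c = (unsigned char)right[0];

    /* Identifiers and numbers run together: "foo" "bar" -> "foobar".  */
    if ((isalnum(a) || a == '_') && (isalnum(c) || c == '_'))
        return true;

    switch (a) {
    case '+': return c == '+' || c == '=';              /* ++ += */
    case '-': return c == '-' || c == '=' || c == '>';  /* -- -= -> */
    case '=': return c == '=';                          /* == */
    case '<': return c == '<' || c == '=';              /* << <= */
    case '>': return c == '>' || c == '=';              /* >> >= */
    case '&': return c == '&' || c == '=';              /* && &= */
    case '|': return c == '|' || c == '=';              /* || |= */
    case '/': return c == '/' || c == '*' || c == '=';  /* comments, /= */
    case '*': case '%': case '!': case '^':
        return c == '=';                                /* *= %= != ^= */
    case '#': return c == '#';                          /* ## */
    default:  return false;
    }
}

/* Spell N tokens into OUT (capacity CAP), inserting a space only where
   toy_avoid_paste asks for one.  Returns OUT.  */
char *join_tokens(const char *const *spellings, size_t n,
                  char *out, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return out;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t need = strlen(spellings[i]);
        bool space = i > 0 && toy_avoid_paste(spellings[i - 1], spellings[i]);

        if (len + need + (space ? 1 : 0) >= cap)
            break;
        if (space)
            out[len++] = ' ';
        memcpy(out + len, spellings[i], need + 1);
        len += need;
    }
    return out;
}
```

<p>Applied to the nine tokens of the first example, this sketch yields
‘<samp>+ +- -+ + = = =</samp>’, which lexes back to the same token stream
with fewer spaces than separating every pair. The real preprocessor also
keeps whitespace recorded by <code>PREV_WHITE</code>, which this sketch
ignores.
</p>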
<hr>
<div class="header">
<p>
Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>



</body>
</html>