<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Copyright (C) 1987-2016 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation. A copy of
the license is included in the
section entitled "GNU Free Documentation License".

This manual contains no Invariant Sections. The Front-Cover Texts are
(a) (see below), and the Back-Cover Texts are (b) (see below).

(a) The FSF's Front-Cover Text is:

A GNU Manual

(b) The FSF's Back-Cover Text is:

You have freedom to copy and modify this GNU Manual, like GNU
software. Copies published by the Free Software Foundation raise
funds for GNU development. -->
<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
<head>
<title>The C Preprocessor: Character sets</title>

<meta name="description" content="The C Preprocessor: Character sets">
<meta name="keywords" content="The C Preprocessor: Character sets">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Overview.html#Overview" rel="up" title="Overview">
<link href="Initial-processing.html#Initial-processing" rel="next" title="Initial processing">
<link href="Overview.html#Overview" rel="prev" title="Overview">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>

</head>
<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Character-sets"></a>
<div class="header">
<p>
Next: <a href="Initial-processing.html#Initial-processing" accesskey="n" rel="next">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Character-sets-1"></a>
<h3 class="section">1.1 Character sets</h3>

<p>Source code character set processing in C and related languages is
rather complicated. The C standard discusses two character sets, but
there are really at least four.
</p>
<p>The files input to CPP might be in any character set at all. CPP’s
very first action, before it even looks for line boundaries, is to
convert the file into the character set it uses for internal
processing. That set is what the C standard calls the <em>source</em>
character set. It must be isomorphic with ISO 10646, also known as
Unicode. CPP uses the UTF-8 encoding of Unicode.
</p>
<p>The character set of the input files is specified with the
<samp>-finput-charset=</samp> option.
</p>
<p>All preprocessing work (the subject of the rest of this manual) is
carried out in the source character set. If you request textual
output from the preprocessor with the <samp>-E</samp> option, that
output is in UTF-8.
</p>
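<p>As a minimal sketch of the input conversion, consider the hypothetical
file <samp>accent.c</samp> below. The file name, its contents, and the
assumption that it is saved in ISO 8859-1 are illustrative only. A
command along the lines of
<samp>gcc -E -finput-charset=ISO-8859-1 accent.c</samp> should then emit
the string literal in UTF-8, provided the character set conversion
support available to GCC recognizes that character set name.
</p>
<div class="example">
<pre class="example">/* accent.c: hypothetical test file, assumed to be saved in ISO 8859-1. */
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

int main(void)
{
    /* One non-ASCII character, U+00E9, typed directly into the literal.
       In an ISO 8859-1 file it is stored as the single byte 0xE9. */
    const char *s = "é";

    /* CPP converts the file to UTF-8 on input.  If nothing else is
       changed, the literal is still UTF-8 in the compiled program,
       where U+00E9 occupies two bytes, so this should print 2. */
    printf("%zu\n", strlen(s));
    return 0;
}
</pre></div>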
<p>After preprocessing is complete, string and character constants are
converted again, into the <em>execution</em> character set. This
character set is under control of the user; the default is UTF-8,
matching the source character set. Wide string and character
constants have their own character set, which is not called out
specifically in the standard. Again, it is under control of the user.
The default is UTF-16 or UTF-32, whichever fits in the target’s
<code>wchar_t</code> type, in the target machine’s byte
order.<a name="DOCF1" href="#FOOT1"><sup>1</sup></a> Octal and hexadecimal escape sequences do not undergo
conversion; <tt>'\x12'</tt> has the value 0x12 regardless of the currently
selected execution character set. All other escapes are replaced by
the character in the source character set that they represent, then
converted to the execution character set, just like unescaped
characters.
</p>
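<p>The difference between numeric escapes and converted characters can
be sketched with a small test program. It is an illustration rather
than part of GCC, and it relies on the GCC options
<samp>-fexec-charset=</samp> and <samp>-fwide-exec-charset=</samp>,
which this section does not name, as the way to select the two
execution character sets.
</p>
<div class="example">
<pre class="example">#include &lt;stdio.h&gt;

int main(void)
{
    /* Hexadecimal and octal escapes are taken as numeric values and are
       never converted, so both lines print 0x12 under any execution
       character set. */
    printf("hex escape:   0x%x\n", (unsigned) '\x12');
    printf("octal escape: 0x%x\n", (unsigned) '\022');

    /* A universal character name is converted to the execution
       character set.  With the default (UTF-8) this prints "c3 a9";
       with -fexec-charset=ISO-8859-1 it should print "e9" instead. */
    const unsigned char *p = (const unsigned char *) "\u00e9";
    for (; *p != '\0'; p++)
        printf("%02x ", (unsigned) *p);
    printf("\n");

    /* Wide constants use the wide execution character set.  U+00E9 has
       the value 0xe9 in both UTF-16 and UTF-32. */
    printf("wide constant: 0x%lx\n", (unsigned long) L'\u00e9');
    return 0;
}
</pre></div>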
<p>In identifiers, characters outside the ASCII range can only be
specified with the ‘<samp>\u</samp>’ and ‘<samp>\U</samp>’ escapes, not used
directly. If strict ISO C90 conformance is specified with an option
such as <samp>-std=c90</samp>, or <samp>-fno-extended-identifiers</samp> is
used, then those escapes are not permitted in identifiers.
</p>
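<p>For illustration, the following sketch spells an identifier with a
‘<samp>\u</samp>’ escape. The variable name is hypothetical, and the
example assumes a GCC release and language standard in which extended
identifiers are enabled. Compiling it with <samp>-std=c90</samp> or
<samp>-fno-extended-identifiers</samp> should be rejected, as described
above.
</p>
<div class="example">
<pre class="example">#include &lt;stdio.h&gt;

int main(void)
{
    /* The identifier is "caf" followed by U+00E9, written as an escape
       rather than typed directly. */
    int caf\u00e9 = 3;

    /* Every use must name the same character; here the same spelling
       is simply repeated. */
    printf("%d\n", caf\u00e9);
    return 0;
}
</pre></div>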
<div class="footnote">
<hr>
<h4 class="footnotes-heading">Footnotes</h4>

<h3><a name="FOOT1" href="#DOCF1">(1)</a></h3>
<p>UTF-16 does not meet the requirements of the C
standard for a wide character set, but the choice of 16-bit
<code>wchar_t</code> is enshrined in some system ABIs so we cannot fix
this.</p>
</div>
</body>
</html>