If the program you are debugging uses a different character set to represent characters and strings than the one gdb uses itself, gdb can automatically translate between the character sets for you. The character set gdb uses we call the host character set; the one the inferior program uses we call the target character set.
For example, if you are running gdb on a GNU/Linux system, which uses the ISO Latin 1 character set, but you are using gdb's remote protocol (see Remote Debugging) to debug a program running on an IBM mainframe, which uses the EBCDIC character set, then the host character set is Latin-1, and the target character set is EBCDIC. If you give gdb the command ‘set target-charset EBCDIC-US’, then gdb translates between EBCDIC and Latin-1 as you print character or string values, or use character and string literals in expressions.
gdb has no way to automatically recognize which character set the inferior program uses; you must tell it, using the ‘set target-charset’ command, described below.
Here are the commands for controlling gdb's character set support:
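set target-charset charset
    Set the current target character set to charset. To display the list of supported target character sets, type ‘set target-charset <TAB><TAB>’.

set host-charset charset
    Set the current host character set to charset.

    By default, gdb uses a host character set appropriate to the system it is running on; you can override that default using the ‘set host-charset’ command. On some systems, gdb cannot automatically determine the appropriate host character set. In this case, gdb uses ‘UTF-8’.

    gdb can only use certain character sets as its host character set. If you type ‘set host-charset <TAB><TAB>’, gdb will list the host character sets it supports.

set charset charset
    Set the current host and target character sets to charset.

show charset
    Show the names of the current host and target character sets.

show host-charset
    Show the name of the current host character set.

show target-charset
    Show the name of the current target character set.

set target-wide-charset charset
    Set the current target's wide character set to charset. This is the character set used by the target's wchar_t type. To display the list of supported wide character sets, type ‘set target-wide-charset <TAB><TAB>’.

show target-wide-charset
    Show the name of the current target's wide character set.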
Here is an example of gdb's character set support in action. Assume that the following source code has been placed in the file charset-test.c:
    #include <stdio.h>

    /* "Hello, world!\n" encoded in ASCII.  */
    char ascii_hello[]
      = {72, 101, 108, 108, 111, 44, 32, 119,
         111, 114, 108, 100, 33, 10, 0};

    /* The same string encoded in IBM1047.  */
    char ibm1047_hello[]
      = {200, 133, 147, 147, 150, 107, 64, 166,
         150, 153, 147, 132, 90, 37, 0};

    int
    main (void)
    {
      printf ("Hello, world!\n");
      return 0;
    }
In this program, ascii_hello and ibm1047_hello are arrays containing the string ‘Hello, world!’ followed by a newline, encoded in the ASCII and IBM1047 character sets, respectively.
We compile the program, and invoke the debugger on it:
    $ gcc -g charset-test.c -o charset-test
    $ gdb -nw charset-test
    GNU gdb 2001-12-19-cvs
    Copyright 2001 Free Software Foundation, Inc.
    ...
    (gdb)
We can use the ‘show charset’ command to see what character sets gdb is currently using to interpret and display characters and strings:
    (gdb) show charset
    The current host and target character set is `ISO-8859-1'.
    (gdb)
For the sake of printing this manual, let's use ASCII as our initial character set:
    (gdb) set charset ASCII
    (gdb) show charset
    The current host and target character set is `ASCII'.
    (gdb)
Let's assume that ASCII is indeed the correct character set for our host system; in other words, let's assume that if gdb prints characters using the ASCII character set, our terminal will display them properly. Since our current target character set is also ASCII, the contents of ascii_hello print legibly:
    (gdb) print ascii_hello
    $1 = 0x401698 "Hello, world!\n"
    (gdb) print ascii_hello[0]
    $2 = 72 'H'
    (gdb)
gdb uses the target character set for character and string literals you use in expressions:
    (gdb) print '+'
    $3 = 43 '+'
    (gdb)
The ASCII character set uses the number 43 to encode the ‘+’ character.
gdb relies on the user to tell it which character set the target program uses. If we print ibm1047_hello while our target character set is still ASCII, we get gibberish:
    (gdb) print ibm1047_hello
    $4 = 0x4016a8 "\310\205\223\223\226k@\246\226\231\223\204Z%"
    (gdb) print ibm1047_hello[0]
    $5 = 200 '\310'
    (gdb)
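The shape of that output can be reproduced outside the debugger. The following is a minimal sketch, not gdb's actual implementation: render_as_ascii is a hypothetical helper that copies printable ASCII bytes unchanged and writes every other byte as a three-digit octal escape. It deliberately ignores the special cases a real printer would handle, such as quoting embedded backslashes or double quotes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of gdb: render the NUL-terminated byte
   string SRC into DST as if every byte were ASCII.  Bytes in the
   printable ASCII range (0x20..0x7e) are copied as-is; all others are
   written as a backslash followed by three octal digits.
   Returns 0 on success, -1 if DST is too small.  */
int
render_as_ascii (const unsigned char *src, char *dst, size_t dst_size)
{
  size_t used = 0;

  if (dst_size == 0)
    return -1;

  for (; *src != 0; src++)
    {
      int printable = *src >= 0x20 && *src <= 0x7e;
      size_t need = printable ? 1 : 4;

      /* Leave room for the terminating NUL.  */
      if (used + need + 1 > dst_size)
        {
          dst[used] = '\0';
          return -1;
        }

      if (printable)
        dst[used++] = (char) *src;
      else
        used += (size_t) sprintf (dst + used, "\\%03o", (unsigned int) *src);
    }

  dst[used] = '\0';
  return 0;
}
```

Applied to the bytes of ibm1047_hello, this sketch yields the same escaped text that gdb printed above: the bytes 107, 64, 90 and 37 happen to be the printable ASCII characters ‘k’, ‘@’, ‘Z’ and ‘%’, and everything else falls outside the printable range.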
If we type ‘set target-charset’ followed by <TAB><TAB>, gdb tells us the character sets it supports:
    (gdb) set target-charset
    ASCII       EBCDIC-US   IBM1047     ISO-8859-1
    (gdb) set target-charset
We can select IBM1047 as our target character set, and examine the program's strings again. Now the ASCII string is wrong, but gdb translates the contents of ibm1047_hello from the target character set, IBM1047, to the host character set, ASCII, and they display correctly:
    (gdb) set target-charset IBM1047
    (gdb) show charset
    The current host character set is `ASCII'.
    The current target character set is `IBM1047'.
    (gdb) print ascii_hello
    $6 = 0x401698 "\110\145%%?\054\040\167?\162%\144\041\012"
    (gdb) print ascii_hello[0]
    $7 = 72 '\110'
    (gdb) print ibm1047_hello
    $8 = 0x4016a8 "Hello, world!\n"
    (gdb) print ibm1047_hello[0]
    $9 = 200 'H'
    (gdb)
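To make the target-to-host translation concrete, here is a small sketch of the same idea in C. It is an illustration only and does not reflect how gdb performs the conversion internally. The table ibm1047_subset and the functions ibm1047_to_ascii and decode_ibm1047 are hypothetical names, and the table covers only the eleven distinct byte values that occur in ibm1047_hello, not the full IBM1047 character set.

```c
#include <stddef.h>

/* One entry of a partial IBM1047-to-ASCII map.  */
struct byte_pair
{
  unsigned char ibm1047;
  char ascii;
};

/* Hypothetical table: only the bytes used by ibm1047_hello.  */
static const struct byte_pair ibm1047_subset[] = {
  {200, 'H'}, {133, 'e'}, {147, 'l'}, {150, 'o'}, {107, ','},
  {64, ' '}, {166, 'w'}, {153, 'r'}, {132, 'd'}, {90, '!'},
  {37, '\n'}
};

/* Return the ASCII code for the IBM1047 byte BYTE, or -1 if BYTE is
   not covered by the partial table above.  */
int
ibm1047_to_ascii (unsigned char byte)
{
  size_t i;

  for (i = 0; i < sizeof ibm1047_subset / sizeof ibm1047_subset[0]; i++)
    if (ibm1047_subset[i].ibm1047 == byte)
      return (unsigned char) ibm1047_subset[i].ascii;
  return -1;
}

/* Translate the NUL-terminated IBM1047 string SRC into DST.
   Returns 0 on success, -1 if a byte is not in the table or DST is
   too small; DST is NUL-terminated whenever DST_SIZE is nonzero.  */
int
decode_ibm1047 (const unsigned char *src, char *dst, size_t dst_size)
{
  size_t used = 0;

  if (dst_size == 0)
    return -1;

  for (; *src != 0; src++)
    {
      int ascii = ibm1047_to_ascii (*src);

      if (ascii < 0 || used + 1 >= dst_size)
        {
          dst[used] = '\0';
          return -1;
        }
      dst[used++] = (char) ascii;
    }

  dst[used] = '\0';
  return 0;
}
```

Passing the contents of ibm1047_hello to decode_ibm1047 produces the text ‘Hello, world!’ followed by a newline, matching what gdb displays once the target character set is IBM1047. Unlike this sketch, which simply fails on an unmapped byte, gdb falls back to octal escapes for bytes it cannot translate, as the output for ascii_hello shows.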
As above, gdb uses the target character set for character and string literals you use in expressions:
    (gdb) print '+'
    $10 = 78 '+'
    (gdb)
The IBM1047 character set uses the number 78 to encode the ‘+’ character.
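The effect of the target character set on literals can be sketched in a few lines of C. This is an assumption-laden illustration, not gdb code: the enum target_charset and the function literal_value are hypothetical, the IBM1047 branch knows only the two characters whose encodings appear in this section, and the ASCII branch assumes the program itself is compiled with an ASCII execution character set.

```c
/* Hypothetical selector for the two character sets used above.  */
enum target_charset
{
  TARGET_ASCII,
  TARGET_IBM1047
};

/* Return the numeric value the character literal C would have in the
   TARGET character set, or -1 if this sketch does not cover it.  */
int
literal_value (char c, enum target_charset target)
{
  /* Assumes the compiler's execution character set is ASCII, so the
     host value of C is already its ASCII code.  */
  if (target == TARGET_ASCII)
    return (unsigned char) c;

  /* IBM1047: only the characters whose codes appear in this section.  */
  switch (c)
    {
    case '+':
      return 78;
    case 'H':
      return 200;
    default:
      return -1;
    }
}
```

With this sketch, literal_value ('+', TARGET_ASCII) gives 43 and literal_value ('+', TARGET_IBM1047) gives 78, mirroring the two ‘print '+'’ results shown in the sessions above.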