<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This file documents the gprof profiler of the GNU system.

Copyright (C) 1988-2016 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, with no Front-Cover Texts, and with no
Back-Cover Texts.  A copy of the license is included in the
section entitled "GNU Free Documentation License".
-->
<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
<head>
<title>GNU gprof: Implementation</title>

<meta name="description" content="GNU gprof: Implementation">
<meta name="keywords" content="GNU gprof: Implementation">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Details.html#Details" rel="up" title="Details">
<link href="File-Format.html#File-Format" rel="next" title="File Format">
<link href="Details.html#Details" rel="prev" title="Details">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Implementation"></a>
<div class="header">
<p>
Next: <a href="File-Format.html#File-Format" accesskey="n" rel="next">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
</div>
<hr>
<a name="Implementation-of-Profiling"></a>
<h3 class="section">9.1 Implementation of Profiling</h3>

<p>Profiling works by changing how every function in your program is compiled
so that when it is called, it records some information about where
it was called from.  From this, the profiler can determine which function
called it, and can count how many times it was called.  This change is made
by the compiler when your program is compiled with the ‘<samp>-pg</samp>’ option,
which causes every function to call <code>mcount</code>
(or <code>_mcount</code>, or <code>__mcount</code>, depending on the OS and compiler)
as one of its first operations.
</p>
<p>The <code>mcount</code> routine, included in the profiling library,
is responsible for recording in an in-memory call graph table
both its own caller (the child in the call arc) and that caller’s caller
(the parent).  This is typically done by examining the stack frame to find both
the address of the child, and the return address in the original parent.
Since this is a very machine-dependent operation, <code>mcount</code>
itself is typically a short assembly-language stub routine
that extracts the required
information, and then calls <code>__mcount_internal</code>
(a normal C function) with two arguments: <code>frompc</code> and <code>selfpc</code>.
<code>__mcount_internal</code> is responsible for maintaining
the in-memory call graph, which records <code>frompc</code>, <code>selfpc</code>,
and the number of times each of these call arcs was traversed.
</p>
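<p>As a rough illustration of the bookkeeping described above, the following
sketch keeps a small table of call arcs keyed by <code>frompc</code> and
<code>selfpc</code>.  The names <code>struct arc</code>, <code>MAX_ARCS</code>
and <code>record_arc</code> are hypothetical and are not taken from the
profiling library; the linear search is chosen only for brevity, and a real
implementation would probably use a faster lookup.
</p>
<div class="example">
<pre class="example">#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical, simplified arc table; all names are illustrative only. */
#define MAX_ARCS 1024

struct arc {
    uintptr_t frompc;     /* return address in the caller (the parent) */
    uintptr_t selfpc;     /* address of the called function (the child) */
    unsigned long count;  /* number of times this arc was traversed */
};

static struct arc arcs[MAX_ARCS];
static size_t n_arcs;

/* Record one traversal of the arc from frompc to selfpc.
   Returns the updated count, or 0 if the table is full. */
unsigned long record_arc(uintptr_t frompc, uintptr_t selfpc)
{
    for (size_t i = 0; i &lt; n_arcs; i++) {
        if (arcs[i].frompc == frompc &amp;&amp; arcs[i].selfpc == selfpc)
            return ++arcs[i].count;
    }
    if (n_arcs == MAX_ARCS)
        return 0;
    arcs[n_arcs].frompc = frompc;
    arcs[n_arcs].selfpc = selfpc;
    arcs[n_arcs].count = 1;
    n_arcs++;
    return 1;
}
</pre></div>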
<p>GCC Version 2 provides a magical function (<code>__builtin_return_address</code>),
which allows a generic <code>mcount</code> function to extract the
required information from the stack frame.  However, on some
architectures, most notably the SPARC, using this builtin can be
very computationally expensive, and an assembly language version
of <code>mcount</code> is used for performance reasons.
</p>
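<p>A minimal sketch of the builtin in use, assuming GCC or a compatible
compiler: the function below returns the address it will return to, which
lies inside its caller.  The function name is hypothetical, and this is not
how any particular <code>mcount</code> is written; it only shows the kind of
information the builtin exposes.
</p>
<div class="example">
<pre class="example">#include &lt;stddef.h&gt;

/* Returns the address this function will return to, that is, a location
   inside its caller.  The noinline attribute keeps the call a real call,
   so that there is a return address to report. */
__attribute__((noinline))
void *return_address_of_call(void)
{
    return __builtin_return_address(0);
}
</pre></div>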
<p>Number-of-calls information for library routines is collected by using a
special version of the C library.  The routines in it are the same as in
the usual C library, but they were compiled with ‘<samp>-pg</samp>’.  If you
link your program with ‘<samp>gcc … -pg</samp>’, it automatically uses the
profiling version of the library.
</p>
<p>Profiling also involves watching your program as it runs, and keeping a
histogram of where the program counter happens to be every now and then.
Typically the program counter is looked at around 100 times per second of
run time, but the exact frequency may vary from system to system.
</p>
<p>This is done in one of two ways.  Most UNIX-like operating systems
provide a <code>profil()</code> system call, which registers a memory
array with the kernel, along with a scale
factor that determines how the program’s address space maps
into the array.
Typical scaling values cause every 2 to 8 bytes of address space
to map into a single array slot.
On every tick of the system clock
(assuming the profiled program is running), the value of the
program counter is examined and the corresponding slot in
the memory array is incremented.  Since this is done in the kernel,
which had to interrupt the process anyway to handle the clock
interrupt, very little additional system overhead is required.
</p>
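<p>The mapping from a program counter value to an array slot can be sketched
as follows.  This is a simplification under stated assumptions: the scale is
expressed directly as a number of bytes per slot rather than in whatever
encoding a given <code>profil()</code> implementation expects, and the
function and parameter names are hypothetical.
</p>
<div class="example">
<pre class="example">#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Count one sample taken at pc.  lowpc is the lowest profiled address
   and bytes_per_slot is the granularity (for example 2 to 8 bytes).
   Returns 1 if a slot was incremented, 0 if pc is outside the range. */
int histogram_sample(unsigned short *hist, size_t n_slots,
                     uintptr_t lowpc, size_t bytes_per_slot,
                     uintptr_t pc)
{
    if (bytes_per_slot == 0 || pc &lt; lowpc)
        return 0;
    size_t slot = (size_t)(pc - lowpc) / bytes_per_slot;
    if (slot &gt;= n_slots)
        return 0;
    hist[slot]++;
    return 1;
}
</pre></div>
<p>With 4 bytes per slot and a <code>lowpc</code> of 0x1000, a sample at
0x1005 lands in slot 1, while a sample at 0x1010 falls outside a four-slot
array and is ignored.
</p>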
<p>However, some operating systems, most notably Linux 2.0 (and earlier),
do not provide a <code>profil()</code> system call.  On such a system,
arrangements are made for the kernel to periodically deliver
a signal to the process (typically via <code>setitimer()</code>),
which then performs the same operation of examining the
program counter and incrementing a slot in the memory array.
Since this method requires a signal to be delivered to
user space every time a sample is taken, it incurs considerably
more overhead than kernel-based profiling.  Also, due to the
added delay required to deliver the signal, this method is
less accurate as well.
</p>
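<p>A hedged sketch of the signal-based approach on a POSIX system follows.
It arms <code>ITIMER_PROF</code> so that <code>SIGPROF</code> arrives roughly
100 times per second of CPU time.  The handler only counts ticks: reading the
interrupted program counter is machine-dependent and is deliberately left
out.  The names <code>sample_ticks</code>, <code>on_sigprof</code> and
<code>start_sampling</code> are hypothetical.
</p>
<div class="example">
<pre class="example">#define _XOPEN_SOURCE 700   /* expose sigaction() and setitimer() */
#include &lt;signal.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/time.h&gt;

static volatile sig_atomic_t sample_ticks;

/* Signal handler.  A real profiler would find the interrupted program
   counter here and increment the matching histogram slot; this sketch
   only counts how many samples were delivered. */
static void on_sigprof(int signo)
{
    (void)signo;
    sample_ticks++;
}

/* Request SIGPROF every 10 ms of CPU time.  Returns 0 on success. */
int start_sampling(void)
{
    struct sigaction sa;
    memset(&amp;sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&amp;sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &amp;sa, NULL) != 0)
        return -1;

    struct itimerval it;
    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = 10000;
    it.it_value = it.it_interval;
    return setitimer(ITIMER_PROF, &amp;it, NULL);
}
</pre></div>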
<p>A special startup routine allocates memory for the histogram and
either calls <code>profil()</code> or sets up
a clock signal handler.
This routine (<code>monstartup</code>) can be invoked in several ways.
On Linux systems, a special profiling startup file <code>gcrt0.o</code>,
which invokes <code>monstartup</code> before <code>main</code>,
is used instead of the default <code>crt0.o</code>.
Use of this special startup file is one of the effects
of using ‘<samp>gcc … -pg</samp>’ to link.
On SPARC systems, no special startup files are used.
Rather, the <code>mcount</code> routine, when it is invoked for
the first time (typically when <code>main</code> is called),
calls <code>monstartup</code>.
</p>
<p>If the compiler’s ‘<samp>-a</samp>’ option was used, basic-block counting
is also enabled.  Each object file is then compiled with a static array
of counts, initially zero.
In the executable code, every time a new basic block begins
(e.g., where an <code>if</code> statement appears), an extra instruction
is inserted to increment the corresponding count in the array.
At compile time, a paired array is constructed that records
the starting address of each basic block.  Taken together,
the two arrays record the starting address of every basic block,
along with the number of times it was executed.
</p>
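<p>The effect of this instrumentation can be imitated by hand.  In the sketch
below, the increments stand in for the extra instructions the compiler would
insert; the array name, the block numbering and the function are
hypothetical, and the paired array of starting addresses is omitted because
it cannot be expressed in portable C.
</p>
<div class="example">
<pre class="example">/* One counter per basic block, initially zero. */
static unsigned long bb_counts[3];

int clamp_to_zero(int x)
{
    bb_counts[0]++;          /* block 0: function entry */
    if (x &lt; 0) {
        bb_counts[1]++;      /* block 1: body of the if statement */
        x = 0;
    }
    bb_counts[2]++;          /* block 2: code following the if */
    return x;
}
</pre></div>
<p>After one call with a negative argument and one with a positive argument,
blocks 0 and 2 have each been counted twice and block 1 once.
</p>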
<p>The profiling library also includes a function (<code>mcleanup</code>) which is
typically registered using <code>atexit()</code> to be called as the
program exits, and is responsible for writing the file <samp>gmon.out</samp>.
When it runs, profiling is turned off, various headers are output, and the histogram
is written, followed by the call-graph arcs and the basic-block counts.
</p>
<p>The output from <code>gprof</code> gives no indication of parts of your program that
are limited by I/O or swapping bandwidth.  This is because samples of the
program counter are taken at fixed intervals of the program’s run time.
Therefore, the
time measurements in <code>gprof</code> output say nothing about time that your
program was not running.  For example, a part of the program that creates
so much data that it cannot all fit in physical memory at once may run very
slowly due to thrashing, but <code>gprof</code> will say it uses little time.  On
the other hand, sampling by run time has the advantage that the amount of
load due to other users won’t directly affect the output you get.
</p>
<hr>
<div class="header">
<p>
Next: <a href="File-Format.html#File-Format" accesskey="n" rel="next">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
</div>



</body>
</html>