-
Notifications
You must be signed in to change notification settings - Fork 5
/
performance.empy
172 lines (156 loc) · 5.13 KB
/
performance.empy
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
@# -*- html -*-
@{page_id = "performance"}
@{page_title = "Performance"}
@empy.include("header.empy")
<p>The performance of ccache depends on a lot of factors, which makes it quite
hard to predict the improvement for a given use case. This page contains some
different performance measurements that try to give an idea about the potential
speedup.</p>
<p>
It should also be noted that if the expected hit rate is low, there may be a
net performance loss when using ccache because of the overhead of cache misses
(typically 5%-20%, but just 1%-3% with depend mode enabled). Also, if the
build machine is short on memory compared to the amount of memory used by the
build tools (compiler, linker, etc), usage of ccache could decrease
performance due the fact that ccache's cached files may flush other files from
the OS's disk cache. See
<a href="http://www.mail-archive.com/ccache@@lists.samba.org/msg00576.html">this
mailing list post</a> by Christopher Tate for a good write-up on this issue.
So to sum it up: it is probably wise to perform some measurements with and
without ccache for your typical use case before enabling it!
</p>
<p>
The following measurements were made on a fairly standard Linux-based desktop
system with an Intel Core i5-4690K and a standard SSD disk.
</p>
<p>“ccache 3.7.1 direct” in the tables below means running ccache
with direct mode enabled (which is the default) and “ccache 3.7.1
prepr.” means running ccache with a disabled depend mode, thus falling
back to the preprocessor mode. “ccache 3.7.1 depend” means running
ccache with the depend mode enabled.</p>
<p>
All results were gathered by timing 100 individual compilations and picking
the median result.
</p>
<h2>ccache.c</h2>
<p>
Here are the results of building ccache's own
<a href="https://github.com/ccache/ccache/blob/v3.7.1/src/ccache.c">ccache.c</a>
with <code>-g -O2 -MD</code> and needed <code>-I</code> flags:
</p>
<table class="perf">
<tr>
<th class="empty"></th>
<th>Elapsed time</th>
<th>Percent</th>
<th>Factor</th>
</tr>
<tr>
<th>Without ccache</th>
<td class="num">0.6988 s</td>
<td class="num">100.00 %</td>
<td class="num">1.00 x</td>
</tr>
<tr>
<th>ccache 3.7.1 prepr., first time</th>
<td class="num">0.7251 s</td>
<td class="num">103.77 %</td>
<td class="num">0.96 x</td>
</tr>
<tr>
<th>ccache 3.7.1 prepr., second time</th>
<td class="num">0.0247 s</td>
<td class="num">3.53 %</td>
<td class="num">28.33 x</td>
</tr>
<tr>
<th>ccache 3.7.1 direct, first time</th>
<td class="num">0.7268 s</td>
<td class="num">104.01 %</td>
<td class="num">0.96 x</td>
</tr>
<tr>
<th>ccache 3.7.1 direct, second time</th>
<td class="num">0.0048 s</td>
<td class="num">0.69 %</td>
<td class="num">145.39 x</td>
</tr>
<tr>
<th>ccache 3.7.1 depend, first time</th>
<td class="num">0.7102 s</td>
<td class="num">101.64 %</td>
<td class="num">0.98 x</td>
</tr>
<tr>
<th>ccache 3.7.1 depend, second time</th>
<td class="num">0.0051 s</td>
<td class="num">0.73 %</td>
<td class="num">137.81 x</td>
</tr>
</table>
<p>As can be seen above, cache hits in the direct mode are about 5 times faster
than in the preprocessor mode. The speedup compared to compiling without ccache
is very large since the compilation costs a relatively large amount of CPU
time. The overhead of cache misses can also be seen, but it's smaller for the
depend mode.</p>
<h2>c++_includes.cc</h2>
<p>This is a test that aims to measure preprocessor-intensive compilation.
Here, <a href="c++_includes.cc">c++_includes.cc</a> (a file including nine
common include files from the C++ standard library) was compiled without any
special flags other than <code>-MD</code> to enable usage of the depend
mode:</p>
<table class="perf">
<tr>
<th class="empty"></th>
<th>Elapsed time</th>
<th>Percent</th>
<th>Factor</th>
</tr>
<tr>
<th>Without ccache</th>
<td class="num">0.2421 s</td>
<td class="num">100.00 %</td>
<td class="num">1.00 x</td>
</tr>
<tr>
<th>ccache 3.7.1 prepr., first time</th>
<td class="num">0.2892 s</td>
<td class="num">119.46 %</td>
<td class="num">0.84 x</td>
</tr>
<tr>
<th>ccache 3.7.1 prepr., second time</th>
<td class="num">0.0496 s</td>
<td class="num">19.65 %</td>
<td class="num">5.09 x</td>
</tr>
<tr>
<th>ccache 3.7.1 direct, first time</th>
<td class="num">0.2919 s</td>
<td class="num">120.58 %</td>
<td class="num">0.83 x</td>
</tr>
<tr>
<th>ccache 3.7.1 direct, second time</th>
<td class="num">0.0083 s</td>
<td class="num">3.35 %</td>
<td class="num">29.10 x</td>
</tr>
<tr>
<th>ccache 3.7.1 depend, first time</th>
<td class="num">0.2495 s</td>
<td class="num">103.08 %</td>
<td class="num">0.97 x</td>
</tr>
<tr>
<th>ccache 3.7.1 depend, second time</th>
<td class="num">0.0083 s</td>
<td class="num">3.41 %</td>
<td class="num">29.32 x</td>
</tr>
</table>
<p>The difference between direct and preprocessor mode hits is about a factor 6
— slightly higher than for the ccache.c test because the preprocessor
overhead is higher. The depend mode really shines here since it avoids making
costly preprocessor calls.</p>
@empy.include("footer.empy")