CPAchecker provides a command-line flag `-benchmark`
that should be used for all kinds of benchmarks.
It improves performance by disabling assertions and optional features
such as extended statistics and output files.
Note that you can re-enable specific output files individually if desired.
In addition, take care to specify meaningful resource limits and other options that depend on the specifics of the verification task, as described in the following.
For benchmarking, it is recommended to use BenchExec.
An extended version of BenchExec is bundled with CPAchecker
and can be used by calling `scripts/benchmark.py`.
This script should run with any Python version >= 3.5.
This script takes as input one or more XML files that specify the tool,
the resource limits, the tool configuration, the input programs,
and the columns that should appear in the output table.
The script puts all results into the `test/results/` directory.
Commented examples for these XML files are given in the BenchExec
documentation.
The file `./examples/benchmark-cpachecker.xml`
can be used as a base for your own benchmarks.
Several useful benchmark configurations are in `test/test-sets/*.xml`.
For further help on the benchmark script, use `scripts/benchmark.py -h`.
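As an illustration, a benchmark definition might look roughly like the following sketch. The run-definition name, the analysis option, and the file paths here are illustrative placeholders, not taken from the repository; consult the bundled example files and the BenchExec documentation for the exact schema.

```xml
<?xml version="1.0"?>
<benchmark tool="cpachecker" timelimit="900s" hardtimelimit="1000s" memlimit="8000 MB">
  <!-- Java heap, lower than the external memory limit (see below) -->
  <option name="-heap">7000M</option>

  <!-- one run definition = one tool configuration to benchmark -->
  <rundefinition name="predicateAnalysis">
    <option name="-predicateAnalysis"/>
  </rundefinition>

  <!-- the input programs -->
  <tasks name="examples">
    <include>test/programs/simple/*.c</include>
  </tasks>
</benchmark>
```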
If the benchmark script should evaluate whether the returned results
are correct or wrong and calculate scores as in SV-COMP,
a specification needs to be defined in the benchmark definition
that tells the script what should be checked (error-label reachability,
memory safety, termination, etc.).
The specification is given in a property file (`.prp`), just like in SV-COMP.
There are examples for such files in `test/programs/benchmarks/property*.prp`.
To specify them in the benchmark, use the `<propertyfiles>` tag.
It can occur either in global scope or inside a `<tasks>` tag.
We provide common program properties in `config/properties`.
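For example, a `<tasks>` block could reference a property file like this sketch (the task name, include pattern, and property-file name are illustrative):

```xml
<tasks name="ReachSafety">
  <include>test/programs/benchmarks/some-dir/*.c</include>
  <!-- tells the script which property to check for these tasks -->
  <propertyfiles>config/properties/unreach-call.prp</propertyfiles>
</tasks>
```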
You do not need to specify a separate command-line option
with the specification for the verifier,
as the script will automatically add it to the command line.
It is important to specify the correct values for the following options according to the characteristics of the input files to avoid wrong answers:
- Machine model (32 vs. 64 bit): The 32-bit model is assumed by default. For 64-bit programs, specify `-64` on the command line.
- Whether `malloc` may return null or not: If the program assumes `malloc` never returns null and you are using a predicate-based analysis, set `cpa.predicate.memoryAllocationsAlwaysSucceed=true`. Note that this assumption is true for all SV-COMP files, thus this option is already set in all `-svcomp*` configurations.
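In a benchmark definition, such options can be passed like any other command-line arguments. As a sketch (assuming CPAchecker's generic `-setprop` flag for setting individual configuration options):

```xml
<!-- use the 64-bit machine model -->
<option name="-64"/>
<!-- assume malloc never returns null (predicate-based analyses) -->
<option name="-setprop">cpa.predicate.memoryAllocationsAlwaysSucceed=true</option>
```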
When benchmarking CPAchecker, it is important to correctly specify the time and memory limits.
- CPU time: First, in order to get statistics even in case of a timeout, it is important to specify different "soft" and "hard" CPU-time limits, as in this example:
  `<benchmark timelimit="900s" hardtimelimit="1000s" ...`
  The soft time limit is automatically passed as a parameter to CPAchecker, so there is no need to specify the `-timelimit` option manually.
- Memory: The memory limit is specified in SI units (i.e., 1 MB = 1,000,000 bytes) with the attribute `memlimit` in the `<benchmark>` tag of the benchmark definition XML file. Example: `<benchmark ... memlimit="8000 MB">`
This limit will be enforced by the OS and CPAchecker will be killed if it needs more memory.
Additionally, it is important to specify the amount of memory
that Java uses for its own heap with the `-heap` command-line parameter.
This value needs to be lower than the external limit.
Setting it too low hurts performance due to increased garbage collection
and may provoke an `OutOfMemoryError`;
setting it too high limits the memory that is available to native libraries
such as MathSAT.
For analyses without MathSAT,
start experimenting with 1000 MB less than the external limit.
IMPORTANT: Java does not use SI units here, but IEC units (factor 1024):
`7000M` means 7000 MiB = 7,340,032,000 bytes,
which is almost 5% more than the 7000 MB of the memory limit above!
Example:
<option name="-heap">7000M</option>
Summary: For correct and useful benchmarks, choose a memory limit (e.g., 8000 MB), a Java heap size (e.g., 7000 MiB), and a time limit (e.g., 900 s). Then specify them as follows:
<benchmark ... timelimit="900s" hardtimelimit="1000s" memlimit="8000 MB">
<option name="-heap">7000M</option>
...
</benchmark>
In order to combine the results from several benchmark runs into a single table, the script `scripts/table-generator.py` can be used. It also takes an XML file as input, which gives the table configuration and is described in the table-generator documentation.
The output will be tables in HTML and CSV format; the HTML tables have additional features such as showing the log files and plotting the results.
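Assuming the bundled script behaves like upstream BenchExec's table-generator, such a table definition is an XML file that lists the result files to combine, roughly like this sketch (the file names are illustrative):

```xml
<table>
  <result filename="test/results/benchmark.2024-01-01_0000.results.predicateAnalysis.xml"/>
  <result filename="test/results/benchmark.2024-01-02_0000.results.valueAnalysis.xml"/>
</table>
```

It could then be passed to the script with `scripts/table-generator.py -x table.xml` (the `-x` option and file name are assumptions based on upstream BenchExec; see `scripts/table-generator.py -h` for the exact usage).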