Status: WIP
Pre-requisites: git
and bazel
.
# Clone bazel-bench.
$ git clone https://github.com/bazelbuild/bazel-bench.git
$ cd bazel-bench
To do a test run, run the following command (if you're on Windows, populate
--data_directory
with an appropriate Windows-style path):
$ bazel run :benchmark \
-- \
--bazel_commits=b8468a6b68a405e1a5767894426d3ea9a1a2f22f,ad503849e78b98d762f03168de5a336904280150 \
--project_source=https://github.com/bazelbuild/rules_cc.git \
--data_directory=/tmp/bazel-bench-data \
--verbose \
-- build //:all
The Bazel commits might be too old and no longer buildable by your local Bazel. Replace them with the more recent commits from bazelbuild/bazel. The above command would print a result table on the terminal and outputs a csv
file to the specified --data_directory
.
Bazel-bench has the following syntax:
$ bazel run :benchmark -- <bazel-bench-flags> -- <args to pass to bazel binary>
For example, to benchmark the performance of 2 bazel commits A and B on the same
command bazel build --nobuild //:all
of rules_cc
project, you'd do:
$ bazel run :benchmark \
-- \
--bazel_commits=A,B \
--project_source=https://github.com/bazelbuild/rules_cc.git \
-- build --nobuild //:all
Note the double-dash --
before the command arguments. You can pass any
arguments that you would normally run on Bazel to the script. The performance of
commands other than build
can also be benchmarked e.g. query
, ...
The flag-based approach does not support cases where the benchmarked Bazel commands differ. The most common use case for this: As a rule developer, I want to verify the effect of my flag on Bazel performance. For that, we'd need the config-file interface. The example config file would look like this:
# config.yaml
global_options:
project_commit: 595a730
runs: 5
collect_profile: false
project_source: /path/to/project/repo
units:
- bazel_binary: /usr/bin/bazel
command: --startup_option1 build --nomy_flag //:all
- bazel_binary: /usr/bin/bazel
command: --startup_option2 build --my_flag //:all
To launch the benchmark:
$ bazel run :benchmark -- --benchmark_config=/absolute/path/to/config.yaml
The above config file would benchmark 2 "units". A unit is defined as a set of
conditions that describes a scenario to be benchmarked. This setup allows
maximum flexibility, as the conditions are independent between units. It's even
possible to benchmark a bazel_commit
against a pre-built bazel_binary
.
global_options
is the list of options applied to every units. These global options are overridden by local options.
For the list of currently supported flags/attributes and their default values, refer to utils/benchmark_config.py.
project_source
should be a global option, as we don't support benchmarking multiple projects in 1 benchmark. Though,project_commit
can differ between units.- Incremental benchmarks isn't available.
- Commands have to be in canonical form (next section).
Bazel arguments are parsed manually. It is important that the supplied arguments in the command line strictly follows the canonical form:
<command> <canonical options> <expressions>
Example of non-canonical command line arguments that could result in wrong interpretation:
GOOD: (correct order, options in canonical form)
build --nobuild --compilation_mode=opt //:all
BAD: (non-canonical options)
build --nobuild -c opt //:all
BAD: (wrong order)
build --nobuild //:all --compilation_mode=opt
To show all the available flags:
$ bazel run :benchmark -- --helpshort
Some useful flags are:
--bazel_binaries: The pre-built bazel binaries to benchmark.
(a comma separated list)
--bazel_commits: The commits at which bazel is built.
(default: 'latest')
(a comma separated list)
--bazel_source: Either a path to the local Bazel repo or a https url to a GitHub repository.
(default: 'https://github.com/bazelbuild/bazel.git')
--bazelrc: The path to a .bazelrc file.
--csv_file_name: The name of the output csv, without the .csv extension
--data_directory: The directory in which the csv files should be stored.
--[no]prefetch_ext_deps: Whether to do an initial run to pre-fetch external dependencies.
(default: 'true')
--project_commits: The commits from the git project to be benchmarked.
(default: 'latest')
(a comma separated list)
--project_source: Either a path to the local git project to be built or a https url to a GitHub repository.
--runs: The number of benchmark runs.
(default: '5')
(an integer)
--[no]verbose: Whether to include git/Bazel stdout logs.
(default: 'false')
--[no]collect_profile: Whether to collect JSON profile for each run.
Requires --data_directory to be set.
(default: 'false')
Bazel's JSON Profile
is a useful tool to investigate the performance of Bazel. You can configure
bazel-bench
to export these JSON profiles on runs using the
--collect_profile
flag.
For each pair of project_commit
and bazel_commit
, we produce a couple JSON
profiles, based on the number of runs. To have a better overview of the
performance of each phase and events, we can aggregate these profiles and
produce the median duration of each event across them.
To run the tool:
bazel run utils:json_profiles_merger \
-- \
--bazel_source=<some commit or path> \
--project_source=<some url or path> \
--project_commit=<some_commit> \
--output_path=/tmp/outfile.csv \
-- /tmp/my_json_profiles_*.profile
You can pass the pattern that selects the input profiles into the positional
argument of the script, like in the above example
(/tmp/my_json_profiles_*.profile
).
By default, bazel-bench will store the measurement results and other required
files (project clones, built binaries, ...) under the ~/.bazel-bench
directory.
The layout is:
~/.bazel-bench/ <= The root of bazel-bench's output dir.
bazel/ <= Where bazel's repository is cloned.
bazel-bin/ <= Where the built bazel binaries are stored.
fba9a2c87ee9589d72889caf082f1029/ <= The bazel commit hash.
bazel <= The actual bazel binary.
project-clones/ <= Where the projects' repositories are cloned.
7ffd56a6e4cb724ea575aba15733d113/ <= Each project is stored under a project hash,
computed from its source.
out/ <= This is the default output root. But
the output root can also be set via --data_directory.
To clear the caches, simply rm -rf
where necessary.
As an important part of our bazel-bench daily pipeline, we upload the csv output files to BigQuery and Storage, using separate targets.
To upload the output to BigQuery & Storage you'll need the GCP credentials and the table details. Please contact [email protected].
BigQuery:
bazel run utils:bigquery_upload \
-- \
--upload_to_bigquery=<project_id>:<dataset_id>:<table_id>:<location> \
-- \
<file1> <file2> ...
Storage:
bazel run utils:storage_upload \
-- \
--upload_to_storage=<project_id>:<bucket_id>:<subdirectory> \
-- \
<file1> <file2> ...
We generate a performance report with BazelCI. The generator script can be found
under the /report
directory.
Example Usage: $ python3 report/generate_report.py --date=2019-01-01 --project=dummy --storage_bucket=dummy_bucket
For more detailed usage information, run: $ python3 report/generate_report.py --help
The tests for each module are found in the same directory. To run the test, simply:
$ bazel test ...