Initial manual commit of documentation

Signed-off-by: John Pennycook
Pennycook committed Mar 27, 2024
0 parents commit 7deaef0
Showing 47 changed files with 7,630 additions and 0 deletions.
Binary file added _images/example-dendrogram.png
Binary file added _images/specialization-tree.png
110 changes: 110 additions & 0 deletions _sources/analysis.rst.txt
@@ -0,0 +1,110 @@
Performing Analysis
===================

The main interface of CBI is the ``codebasin`` script, which can be invoked to
analyze a code base and produce various reports. Although CBI ships with other
interfaces specialized for certain use cases, ``codebasin`` supports an
end-to-end workflow that should be preferred for general usage.

The simplest way to invoke ``codebasin`` is as shown below::

    $ codebasin analysis.toml

...but what is ``analysis.toml``? We need to use this file to tell CBI which
files are part of the code base, and where it should look to find the
compilation databases defining our platforms.

.. note::

    The TOML file can have any name, but we'll use "analysis.toml" throughout
    this tutorial.


Defining Platforms
##################

Each platform definition is a TOML `table`_, of the form shown below:

.. _`table`: https://toml.io/en/v1.0.0#table

.. code-block:: toml

    [platform.name]
    commands = "/path/to/compile_commands.json"

The table's name is the name of the platform, and we can use any meaningful
string. The ``commands`` key tells CBI where to find the compilation database
for this platform.

In our example, we have two platforms that we're calling "cpu" and "gpu",
and our build directories are called ``build-cpu`` and ``build-gpu``, so
our platform definitions should look like this:

.. code-block:: toml

    [platform.cpu]
    commands = "build-cpu/compile_commands.json"

    [platform.gpu]
    commands = "build-gpu/compile_commands.json"

.. warning::

    Platform names are case sensitive! The names "cpu" and "CPU" would refer to
    two different platforms.
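
To see how these tables map onto plain data, the sketch below reads an analysis
file with Python's standard ``tomllib`` module (Python 3.11 or later) and maps
each platform name to its compilation database. It only illustrates the
structure of the file; it is not CBI's own loader, and ``load_platforms`` is a
hypothetical helper.

.. code-block:: python

    import tomllib  # standard library since Python 3.11

    # The analysis file from this tutorial, inlined for illustration.
    ANALYSIS_TOML = """
    [platform.cpu]
    commands = "build-cpu/compile_commands.json"

    [platform.gpu]
    commands = "build-gpu/compile_commands.json"
    """


    def load_platforms(text: str) -> dict[str, str]:
        """Hypothetical helper: map each platform name to its database path."""
        config = tomllib.loads(text)
        return {
            name: table["commands"]
            for name, table in config.get("platform", {}).items()
        }


    platforms = load_platforms(ANALYSIS_TOML)
    for name, commands in platforms.items():
        print(f"{name}: {commands}")

Because TOML keys are case sensitive, adding a ``[platform.CPU]`` table to this
file would create a third platform rather than replace ``cpu``.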


Running ``codebasin``
#####################

Running ``codebasin`` with this analysis file gives the following output:

.. code-block:: text
    :emphasize-lines: 4,5,6,7,9

    -----------------------
    Platform Set LOC % LOC
    -----------------------
              {}   2  6.06
           {cpu}   7 21.21
           {gpu}   7 21.21
      {cpu, gpu}  17 51.52
    -----------------------
    Code Divergence: 0.45
    Unused Code (%): 6.06
    Total SLOC: 33

    Distance Matrix
    --------------
         cpu  gpu
    --------------
    cpu 0.00 0.45
    gpu 0.45 0.00

The results show that there are 2 lines of code that are unused by any
platform, 7 lines of code used only by the CPU compilation, 7 lines of code
used only by the GPU compilation, and 17 lines of code shared by both
platforms. Plugging these numbers into the equation for code divergence gives
0.45.
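
The arithmetic can be reproduced by hand. Assuming code divergence is the
average pairwise distance between platforms, where the distance between two
platforms is one minus the fraction of their combined code that they share (a
Jaccard distance), the sketch below recomputes the summary from the table. The
name ``setmap`` for the per-platform-set line counts is illustrative.

.. code-block:: python

    from itertools import combinations

    # Lines of code per platform set, copied from the report above.
    setmap = {
        frozenset(): 2,
        frozenset({"cpu"}): 7,
        frozenset({"gpu"}): 7,
        frozenset({"cpu", "gpu"}): 17,
    }
    platforms = ["cpu", "gpu"]


    def distance(p1: str, p2: str) -> float:
        """Jaccard distance between the lines of code used by two platforms."""
        shared = sum(loc for pset, loc in setmap.items() if p1 in pset and p2 in pset)
        either = sum(loc for pset, loc in setmap.items() if p1 in pset or p2 in pset)
        return 1 - shared / either


    # Average the distance over every pair of platforms.
    pairs = list(combinations(platforms, 2))
    divergence = sum(distance(a, b) for a, b in pairs) / len(pairs)
    total = sum(setmap.values())
    unused = 100 * setmap[frozenset()] / total

    print(f"Code Divergence: {divergence:.2f}")  # prints "Code Divergence: 0.45"
    print(f"Unused Code (%): {unused:.2f}")      # prints "Unused Code (%): 6.06"
    print(f"Total SLOC: {total}")                # prints "Total SLOC: 33"

With only two platforms there is a single pair, so the divergence equals the
off-diagonal entry of the distance matrix: 1 - 17/31 ≈ 0.45.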


Filtering Platforms
###################

When working with an application that supports lots of platforms, we may want
to limit the analysis to a subset of the platforms defined in the analysis
file.

Rather than require a separate analysis file for each possible subset, we can
use the :code:`--platform` flag (or :code:`-p` flag) to specify the subset of
interest on the command line:

.. code:: sh

    $ codebasin -p [PLATFORM 1] -p [PLATFORM 2] analysis.toml

For example, we can limit the analysis of our sample code base to the cpu
platform as follows:

.. code:: sh

    $ codebasin -p cpu analysis.toml
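
Conceptually, each ``-p`` flag selects one platform table from the analysis
file, and omitting the flag keeps every platform. The sketch below models that
selection in Python. ``filter_platforms`` is hypothetical, and rejecting
unknown names with an error is an assumption made for the sketch rather than
documented CBI behavior.

.. code-block:: python

    def filter_platforms(platforms: dict[str, str], requested: list[str]) -> dict[str, str]:
        """Return only the requested platforms; an empty request keeps all of them."""
        if not requested:
            return dict(platforms)
        # Names are compared exactly, so "CPU" does not match "cpu".
        unknown = [name for name in requested if name not in platforms]
        if unknown:
            # Assumption: this sketch rejects unknown names; CBI may behave differently.
            raise ValueError(f"unknown platform(s): {', '.join(unknown)}")
        return {name: platforms[name] for name in requested}


    PLATFORMS = {
        "cpu": "build-cpu/compile_commands.json",
        "gpu": "build-gpu/compile_commands.json",
    }

    print(filter_platforms(PLATFORMS, ["cpu"]))  # the equivalent of "-p cpu"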
42 changes: 42 additions & 0 deletions _sources/cmd.rst.txt
@@ -0,0 +1,42 @@
Command Line Interface
======================

.. code-block:: text

    codebasin [-h] [--version] [-v] [-q] [-R <report>] [-x <pattern>] [-p <platform>] [<analysis-file>]

**positional arguments:**

``analysis-file``
    TOML file describing the analysis to be performed,
    including the codebase and platform descriptions.

**options:**

``-h, --help``
    Show help message and exit.

``--version``
    Display version information and exit.

``-v, --verbose``
    Increase verbosity level.

``-q, --quiet``
    Decrease verbosity level.

``-R <report>``
    Generate a report of the specified type.

    - ``summary``: output only code divergence information.
    - ``clustering``: output only distance matrix and dendrogram.
    - ``all``: generate both summary and clustering reports.

``-x <pattern>, --exclude <pattern>``
    Exclude files matching this pattern from the code base.
    May be specified multiple times.

``-p <platform>, --platform <platform>``
    Include the specified platform in the analysis.
    May be specified multiple times.
    If not specified, all platforms will be included.
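
For readers who script around the tool, the options above map naturally onto
Python's ``argparse`` module. The sketch below is a hypothetical re-creation for
illustration only: it is not CBI's source, and details such as treating ``-v``
and ``-q`` as counters are assumptions.

.. code-block:: python

    import argparse


    def build_parser() -> argparse.ArgumentParser:
        """Hypothetical argparse version of the documented command line."""
        parser = argparse.ArgumentParser(prog="codebasin")  # adds -h/--help
        parser.add_argument("--version", action="version", version="%(prog)s (sketch)")
        # Assumption: -v and -q are counters that raise or lower verbosity.
        parser.add_argument("-v", "--verbose", action="count", default=0)
        parser.add_argument("-q", "--quiet", action="count", default=0)
        parser.add_argument("-R", dest="report", metavar="<report>",
                            choices=["summary", "clustering", "all"])
        # Options documented as "may be specified multiple times" collect a list.
        parser.add_argument("-x", "--exclude", dest="excludes", metavar="<pattern>",
                            action="append", default=[])
        parser.add_argument("-p", "--platform", dest="platforms", metavar="<platform>",
                            action="append", default=[])
        parser.add_argument("analysis_file", metavar="<analysis-file>", nargs="?")
        return parser


    args = build_parser().parse_args(["-p", "cpu", "-p", "gpu", "analysis.toml"])
    print(args.platforms, args.analysis_file)

An empty ``platforms`` list then plays the role of "all platforms", matching the
description of ``-p`` above.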
130 changes: 130 additions & 0 deletions _sources/compilation-databases.rst.txt
@@ -0,0 +1,130 @@
Compilation Databases
=====================

Before it can analyze a code base, CBI needs to know how each source file is
compiled. Just like a compiler, CBI requires a full list of include paths,
macro definitions and other options in order to identify which code is used
by each platform. Rather than require all of this information to be specified
manually, CBI reads it from a `compilation database`_.
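
As a rough illustration of what "include paths and macro definitions" means in
practice, the sketch below pulls ``-D`` and ``-I`` options out of a single
compilation command using Python's ``shlex`` module. It is a simplification
written for this tutorial, not CBI's parser: real compilers accept many more
option spellings (for example ``-isystem``), and ``parse_command`` is a
hypothetical helper.

.. code-block:: python

    import shlex


    def parse_command(command: str) -> dict[str, list[str]]:
        """Collect macro definitions (-D) and include paths (-I) from one command."""
        args = shlex.split(command)
        result = {"defines": [], "includes": []}
        i = 1  # skip the compiler itself
        while i < len(args):
            arg = args[i]
            for flag, key in (("-D", "defines"), ("-I", "includes")):
                if arg == flag and i + 1 < len(args):
                    result[key].append(args[i + 1])  # separate form: "-D NAME=VALUE"
                    i += 1
                    break
                if arg.startswith(flag) and len(arg) > len(flag):
                    result[key].append(arg[len(flag):])  # joined form: "-DNAME=VALUE"
                    break
            i += 1
        return result


    print(parse_command("/usr/bin/c++ -D GPU_OFFLOAD=1 -Iinclude -c main.cpp"))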


Generating a Compilation Database
#################################

Since our sample code base is already set up with a ``CMakeLists.txt`` file, we
can ask CMake to generate the compilation database for us with the
:code:`CMAKE_EXPORT_COMPILE_COMMANDS` option:

.. code-block:: cmake
    :emphasize-lines: 4

    cmake_minimum_required(VERSION 3.5)
    project(tutorial)

    set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

    set(SOURCES main.cpp third-party/library.cpp)

    option(GPU_OFFLOAD "Enable GPU offload." OFF)
    if (GPU_OFFLOAD)
        add_definitions("-D GPU_OFFLOAD=1")
        list(APPEND SOURCES gpu/foo.cpp)
    else()
        list(APPEND SOURCES cpu/foo.cpp)
    endif()

    add_executable(tutorial ${SOURCES})

.. important::

    For projects that don't use CMake, we can use `Bear`_ to intercept the
    commands generated by other build systems (such as GNU makefiles). Other
    build systems and tools that produce compilation databases should also be
    compatible.

.. _`compilation database`: https://clang.llvm.org/docs/JSONCompilationDatabase.html
.. _`Bear`: https://github.com/rizsotto/Bear


CPU Compilation Commands
------------------------

Let's start by running CMake without the :code:`GPU_OFFLOAD` option enabled to
obtain a compilation database for the CPU:

.. code :: sh

    $ mkdir build-cpu
    $ cd build-cpu
    $ cmake ../
    $ ls
    CMakeCache.txt  CMakeFiles  Makefile  cmake_install.cmake  compile_commands.json

This :code:`compile_commands.json` file includes all the commands required to
build the code, corresponding to the commands that would be executed if we were
to actually run :code:`make`.
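
Because the database is plain JSON, it is easy to inspect programmatically. The
sketch below, which is not part of CBI, maps each source file to its argument
list. It follows the JSON Compilation Database format, in which an entry
provides either an ``arguments`` list or a ``command`` string, and ``file`` may
be relative to ``directory``; ``load_commands`` is a hypothetical helper.

.. code-block:: python

    import json
    import os
    import shlex


    def load_commands(text: str) -> dict[str, list[str]]:
        """Map each source file in a compilation database to its argument list."""
        commands = {}
        for entry in json.loads(text):
            # "file" may be relative to "directory"; join() keeps absolute paths as-is.
            path = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
            # Entries carry either an "arguments" list or a single "command" string.
            if "arguments" in entry:
                args = list(entry["arguments"])
            else:
                args = shlex.split(entry["command"])
            commands[path] = args
        return commands


    # A single entry from the CPU database for this tutorial.
    EXAMPLE = """[
      {
        "directory": "/home/username/src/build-cpu",
        "command": "/usr/bin/c++ -o CMakeFiles/tutorial.dir/cpu/foo.cpp.o -c /home/username/src/cpu/foo.cpp",
        "file": "/home/username/src/cpu/foo.cpp"
      }
    ]"""

    for path, args in load_commands(EXAMPLE).items():
        print(path, "->", args[0])

In practice the text would come from reading ``build-cpu/compile_commands.json``.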

.. attention::

    CMake generates compilation databases when the ``cmake`` command is
    executed, allowing us to generate compilation databases without also
    building the application. Other tools (like Bear) may require a build.

In this case, it contains:

.. code :: json

    [
        {
            "directory": "/home/username/src/build-cpu",
            "command": "/usr/bin/c++ -o CMakeFiles/tutorial.dir/main.cpp.o -c /home/username/src/main.cpp",
            "file": "/home/username/src/main.cpp"
        },
        {
            "directory": "/home/username/src/build-cpu",
            "command": "/usr/bin/c++ -o CMakeFiles/tutorial.dir/third-party/library.cpp.o -c /home/username/src/third-party/library.cpp",
            "file": "/home/username/src/third-party/library.cpp"
        },
        {
            "directory": "/home/username/src/build-cpu",
            "command": "/usr/bin/c++ -o CMakeFiles/tutorial.dir/cpu/foo.cpp.o -c /home/username/src/cpu/foo.cpp",
            "file": "/home/username/src/cpu/foo.cpp"
        }
    ]

GPU Compilation Commands
------------------------

Repeating the exercise with :code:`GPU_OFFLOAD` enabled gives us a different
compilation database for the GPU.

.. warning::

    The ``GPU_OFFLOAD`` option is specific to this ``CMakeLists.txt`` file, and
    isn't something provided by CMake. Understanding how to build an application
    for a specific target platform is beyond the scope of this tutorial.

As expected, we can see that the compilation database refers to ``gpu/foo.cpp``
instead of ``cpu/foo.cpp``, and that the ``GPU_OFFLOAD`` macro is defined as
part of each compilation command:

.. code :: json

    [
        {
            "directory": "/home/username/src/build-gpu",
            "command": "/usr/bin/c++ -D GPU_OFFLOAD=1 -o CMakeFiles/tutorial.dir/main.cpp.o -c /home/username/src/main.cpp",
            "file": "/home/username/src/main.cpp"
        },
        {
            "directory": "/home/username/src/build-gpu",
            "command": "/usr/bin/c++ -D GPU_OFFLOAD=1 -o CMakeFiles/tutorial.dir/third-party/library.cpp.o -c /home/username/src/third-party/library.cpp",
            "file": "/home/username/src/third-party/library.cpp"
        },
        {
            "directory": "/home/username/src/build-gpu",
            "command": "/usr/bin/c++ -D GPU_OFFLOAD=1 -o CMakeFiles/tutorial.dir/gpu/foo.cpp.o -c /home/username/src/gpu/foo.cpp",
            "file": "/home/username/src/gpu/foo.cpp"
        }
    ]

These differences are the result of code divergence. We'll explore how to use
``codebasin`` to measure the *amount* of code divergence in a later tutorial.
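
Before measuring divergence formally, the differences can simply be listed. The
sketch below compares two databases that have already been reduced by hand to a
mapping from source file (relative to the source directory) to its ``-D``
macros; that simplified representation and the ``compare_databases`` helper are
assumptions made for illustration, not part of CBI.

.. code-block:: python

    def compare_databases(
        cpu: dict[str, list[str]], gpu: dict[str, list[str]]
    ) -> tuple[list[str], list[str], list[str]]:
        """Split two file->macros mappings into CPU-only, GPU-only and changed files."""
        only_cpu = sorted(cpu.keys() - gpu.keys())
        only_gpu = sorted(gpu.keys() - cpu.keys())
        # Files compiled for both platforms, but with different macro definitions.
        changed = sorted(f for f in cpu.keys() & gpu.keys() if cpu[f] != gpu[f])
        return only_cpu, only_gpu, changed


    # Hand-written summary of the two databases above (file -> -D macros).
    CPU = {"main.cpp": [], "third-party/library.cpp": [], "cpu/foo.cpp": []}
    GPU = {
        "main.cpp": ["GPU_OFFLOAD=1"],
        "third-party/library.cpp": ["GPU_OFFLOAD=1"],
        "gpu/foo.cpp": ["GPU_OFFLOAD=1"],
    }

    only_cpu, only_gpu, changed = compare_databases(CPU, GPU)
    print("CPU only:", only_cpu)
    print("GPU only:", only_gpu)
    print("Different macros:", changed)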