This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Before You Start

Matt Butrovich edited this page Aug 21, 2018 · 4 revisions

Prologue

This page contains information intended to help you get started on this project. It covers a wide set of software development topics, such as the development environment, debugging, and version control. These are only our opinionated suggestions, which we have found invaluable for low-level software development. Feel free to deviate from them if you are comfortable with other tools that solve similar problems.

Operating System

We prefer Linux (specifically Ubuntu) and macOS for development and testing. You can use any other OS for development as long as your modified source code can be built and run in those environments by our continuous integration system.

Language

We are developing Terrier in C++, following the C++17 standard. C++ provides a lot of leeway in DBMS development compared to other high-level languages. For instance, it supports both manual and automated memory management, varied styles of programming, stronger type checking, and different kinds of polymorphism.

Here's a list of useful references:

  • CPP Reference is an online reference of the powerful Standard Template Library (STL).
  • C++ FAQ covers a lot of topics.

Here's a list of modern features that you might want to make use of:

  • auto type inference
  • Range-based for loops
  • Smart pointers, in particular unique_ptr.
  • STL data structures, such as unordered_set, unordered_map, etc.
  • Threads, deleted member functions, lambdas, etc.
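
As a rough illustration of how several of the features listed above fit together, here is a minimal sketch. The class WordCounter and the helper functions are hypothetical names invented for this example and are not part of Terrier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical class, invented for this illustration only.
class WordCounter {
 public:
  WordCounter() = default;
  // Deleted member functions: copying is forbidden, so each counter has one owner.
  WordCounter(const WordCounter &) = delete;
  WordCounter &operator=(const WordCounter &) = delete;

  void add(const std::string &word) { ++counts_[word]; }

  int get(const std::string &word) const {
    auto it = counts_.find(word);  // auto deduces the iterator type
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  std::unordered_map<std::string, int> counts_;  // STL hash map
};

// Smart pointer: the returned unique_ptr owns the counter and frees it automatically.
std::unique_ptr<WordCounter> count_words(const std::vector<std::string> &words) {
  auto counter = std::make_unique<WordCounter>();
  for (const auto &word : words) {  // range-based for loop
    counter->add(word);
  }
  return counter;
}

// Lambda: a small predicate that captures min_length by value.
int count_longer_than(const std::vector<std::string> &words, std::size_t min_length) {
  auto is_long = [min_length](const std::string &word) { return word.size() > min_length; };
  return static_cast<int>(std::count_if(words.begin(), words.end(), is_long));
}

void example_usage() {
  const std::vector<std::string> words = {"scan", "index", "scan"};
  auto counter = count_words(words);
  assert(counter->get("scan") == 2);
  assert(count_longer_than(words, 4) == 1);
}
```

Returning a unique_ptr makes the ownership transfer explicit, and deleting the copy operations turns an accidental copy into a compile-time error rather than a runtime surprise.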

Comments, Formatting, and Libraries

Please comment your code. Comment all the class definitions, non-trivial member functions and variables, and all the steps in your algorithms. We use Doxygen style comments and will reject commits that don't fill out comments.

We generally follow the Google C++ style guide. As that guide mentions, these rules exist to keep the code base manageable while still allowing coders to use C++ language features productively. Make sure that you follow the naming rules. For instance, use class UpperCaseCamelCase for type names and int lower_case_with_underscores for variable/method/function names.
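
To make the commenting and naming guidance concrete, here is a minimal sketch of a Doxygen-commented class that follows the naming rules exactly as stated on this page. SlotTracker and its members are hypothetical and exist only for illustration. The trailing underscore on data members is an assumption taken from the Google style guide, not something this page states.

```cpp
#include <cassert>
#include <cstdint>

/**
 * Tracks how many fixed-size slots of a block are in use.
 * Hypothetical class, shown only to illustrate comment and naming style.
 */
class SlotTracker {
 public:
  /**
   * Creates a tracker for an empty block.
   * @param num_slots total number of slots in the block
   */
  explicit SlotTracker(std::uint32_t num_slots) : num_slots_(num_slots) {}

  /**
   * Reserves the next free slot.
   * @return true if a slot was reserved, false if the block is already full
   */
  bool reserve_slot() {
    // Step 1: refuse the request when every slot has been handed out.
    if (used_slots_ == num_slots_) return false;
    // Step 2: hand out one more slot and check the invariant.
    ++used_slots_;
    assert(used_slots_ <= num_slots_);
    return true;
  }

  /** @return number of slots that are still free */
  std::uint32_t free_slots() const { return num_slots_ - used_slots_; }

 private:
  /** Total capacity of the block, fixed at construction. */
  std::uint32_t num_slots_;
  /** Number of slots handed out so far. */
  std::uint32_t used_slots_ = 0;
};
```

The class, each non-trivial member function, each member variable, and each step of the algorithm carries a comment, which is the level of coverage the guidance above asks for.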

Please refrain from using any libraries other than the STL (and googletest for unit testing) without contacting us.

Directory Structure

Organize source code files into relevant folders based on their functionality. Separate binary files from source files, and production code from testing code.

In general, the src directory contains all the production code, the test directory contains all the testing code, and the benchmark directory contains all the benchmark code. Within the src directory, src/storage contains files related to our storage engine, src/index contains files related to the indexes, and so on.

Source Code Management

We exclusively use git for source code management, and GitHub for collaboration. Here's a simple guide for using it. If you have never used a tool like git, we are sure that you will soon wonder how you managed to live without it. In case you want to learn more, here's a free book.

Please update or add a .gitignore file to exclude unwanted files (e.g. the build directory with binary files, backup/temporary files of your editor/IDE, large files containing test data, etc.) from the repository.

Build System

We use cmake for builds. In particular, we use cmake to automatically generate Makefiles. You will probably not need to add file names to the existing cmake files (named CMakeLists.txt), since the cmake scripts automatically collect all files with the cpp extension. If you need to modify the CMake infrastructure, please see our CMake Details.

In case you are curious about cmake, here's a short introduction.

Compiler

We support both GCC (the GNU Compiler Collection) and the Apple-specific variant of Clang, which is built on the LLVM project rather than GCC. Here's more information on the status of C++17 support in GCC. Though compiler flags should be set using our CMake variables (see Development Builds), here are some common flags that you should be familiar with:

  • -std=c++17: Enables C++17 support
  • -g: Produces debugging information in the OS's native format.
  • -ggdb: Produces debugging information specifically intended for gdb.
  • -O0: Disables optimization, which reduces compilation time and makes debugging more reliable.
  • -O3: Increases both the compilation time and the performance of the generated code. Use this when running benchmarks.
  • -Wall: Generate helpful warnings. Do not ignore them! In fact, force yourself to deal with warnings by handling them as errors with the -Werror compiler flag.
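
To see a few of the flags above in action, here is a small sketch that only compiles when C++17 is enabled and that stays warning-free under -Wall. The function total_length is a hypothetical name used only for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Compilation stops here unless the file is built as C++17 or newer,
// for example with -std=c++17.
static_assert(__cplusplus >= 201703L, "build with -std=c++17 or newer");

// Sums the values of a map. Hypothetical helper for illustration only.
int total_length(const std::map<std::string, int> &lengths) {
  int total = 0;
  // Structured bindings are a C++17 feature.
  for (const auto &[name, length] : lengths) {
    // int scratch = 0;  // if uncommented and left unused, -Wall reports an
    //                   // unused variable, and -Werror turns that into an error
    assert(!name.empty());
    total += length;
  }
  return total;
}
```

Assuming the snippet is saved as example.cpp (a hypothetical file name) together with a main function, a debug build by hand could look like `g++ -std=c++17 -g -O0 -Wall -Werror example.cpp`. In the project itself, set flags through the CMake variables instead.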

Development Environment

We recommend you use an IDE for this project. A good one to use would be CLion. See our CLion page for details.

Debugging Tools

Use a debugger to track down bugs wherever possible. Your IDE should come equipped with one.

If your program behaves in "mysterious" (a.k.a. "non-deterministic") ways, valgrind is your friend. It can automatically detect many memory management and threading bugs. In particular, its memcheck tool is useful for finding illegal accesses to memory, uninitialized reads and much more. Check out this blog post on the interaction between GDB and Valgrind.

It is a good habit to make sure that the code passes all the memchecked unit tests every time you commit a set of changes with git.
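
As a hedged illustration of the kind of bug memcheck is designed to catch, the sketch below shows correct code, with comments marking the one-character mistakes that would turn it into an illegal access or an uninitialized read. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Sums the first `count` elements of `values`. Hypothetical helper.
int sum_first(const int *values, std::size_t count) {
  int total = 0;
  // Writing `i <= count` here would read one element past the end of the
  // buffer. The program might still appear to work, but memcheck is designed
  // to report such a read of heap memory as an invalid read.
  for (std::size_t i = 0; i < count; ++i) {
    total += values[i];
  }
  return total;
}

// Builds the squares 0, 1, 4, ... on the heap and sums them.
int sum_of_squares(std::size_t count) {
  // make_unique<int[]> zero-initializes the array and frees it automatically.
  // A raw `new int[count]` would leave the elements uninitialized, and it
  // would leak unless paired with `delete[]`.
  auto values = std::make_unique<int[]>(count);
  for (std::size_t i = 0; i < count; ++i) {
    values[i] = static_cast<int>(i * i);
  }
  assert(count == 0 || values[0] == 0);
  return sum_first(values.get(), count);
}
```

Once code like this is built into a test binary, running it as `valgrind --tool=memcheck ./my_test` (where my_test is a placeholder name) checks every heap access while the tests execute.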

Unit Testing

Unit tests are critical for ensuring the correct functionality of your modules, and they reduce the time spent on debugging. They can also help prevent regressions. We use googletest, a nice unit-testing framework for C++ projects.

You should write unit test cases for each class/algorithm that you have added or modified. See the testing section for details. Try to come up with test cases that make sure that the module exhibits the desired behavior. Some developers even suggest writing the unit tests before implementing the code. Make sure that you include corner cases, and try to find off-by-one errors.
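
The sketch below shows what corner-case coverage might look like for a small binary search, a classic source of off-by-one errors. To keep the example dependency-free it uses plain assert. In a googletest suite, each check would typically become an EXPECT_EQ inside a TEST body. The function lower_bound_index is a hypothetical example, not Terrier code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first element that is >= key, or sorted.size()
// if there is no such element. Hypothetical function under test.
std::size_t lower_bound_index(const std::vector<int> &sorted, int key) {
  std::size_t lo = 0;
  std::size_t hi = sorted.size();  // hi is one past the last candidate
  while (lo < hi) {
    const std::size_t mid = lo + (hi - lo) / 2;
    if (sorted[mid] < key) {
      lo = mid + 1;  // everything up to and including mid is too small
    } else {
      hi = mid;  // mid itself may be the answer, so keep it in range
    }
  }
  return lo;
}

// Corner cases first: empty input, both boundaries, duplicates, and misses.
void test_lower_bound_index() {
  const std::vector<int> empty;
  assert(lower_bound_index(empty, 5) == 0);

  const std::vector<int> values = {1, 3, 3, 7};
  assert(lower_bound_index(values, 0) == 0);  // smaller than every element
  assert(lower_bound_index(values, 1) == 0);  // equal to the first element
  assert(lower_bound_index(values, 3) == 1);  // duplicates: first match wins
  assert(lower_bound_index(values, 4) == 3);  // between two elements
  assert(lower_bound_index(values, 7) == 3);  // equal to the last element
  assert(lower_bound_index(values, 8) == 4);  // larger than every element
}
```

The boundary checks (first element, last element, one past the end) are the ones most likely to expose an off-by-one mistake in the loop bounds.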

Profilers

Profilers help you better understand the performance of Terrier and the environment in which it is running. We suggest the perf and strace tools for profiling.

Terminal Multiplexer

A terminal multiplexer lets you switch easily between several programs in one terminal. This is particularly useful if you are benchmarking or developing on a remote server. We recommend the tmux tool. Here's a short tutorial, and here's a nice cheat sheet.