Skip to content
This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Before You Start

Tianyu Li edited this page Jul 11, 2018 · 4 revisions

Prologue

This page contains information that is intended to help you start on this project. It covers a wide set of topics related to software development, like development environment, debugging, version control etc. We are only attempting to provide biased suggestions that we think are invaluable for low-level software development. Feel free to deviate from them in case you are comfortable with other tools that solve similar problems.

Operating System

We prefer the Linux OS for development and testing. You can use any other OS for development as long as your modified source code can be built and run on Linux.

In case you are a Mac OS or Windows user, we encourage you to consider using Virtual Box and Vagrant. Virtual Box is a hypervisor for x86 computers, and Vagrant is a tool for sharing easy to configure virtual development environments. More information is available here.

Language

We are developing Peloton in C++. In particular, we are following the C++11 standard. C++ provides a lot of leeway in DBMS development compared to other high-level languages. For instance, it supports both manual and automated memory management, varied styles of programming, stronger type checking, different kinds of polymorphism etc.

Here's a list of useful references :

  • CPP Reference is an online reference of the powerful Standard Template Library (STL).
  • C++ FAQ covers a lots of topics.

Here's a list of C++11 features that you might want to make use of:

  • auto type inference
  • Range-based for loops
  • Smart pointers, in particular unique_ptr.
  • STL data structures, such as unordered_set, unordered_map, etc.
  • Threads, deleted member functions, lambdas, etc.

Comments, Formatting, and Libraries

Please comment your code. Comment all the class definitions, non-trivial member functions and variables, and all the steps in your algorithms. Here's an example. We use doxygen style comments and will reject commits that don't fill out comments.

We generally follow the Google C++ style guide. As they mention in that guide, these rules exist to keep the code base manageable while still allowing coders to use C++ language features productively. Make sure that you follow the naming rules. For instance, use class UpperCaseCamelCase for type names, int lower_case_with_underscores for variable/method/function names.

Please refrain from using any libraries other than the STL (and googletest for unit testing) without contacting us.

Directory Structure

Organize source code files into relevant folders based on their functionality. Separate binary files from source files, and production code from testing code.

In general, the src directory contains all the production code, the test directory contains all the testing code, and the build directory contains all the built binary files. Within the src directory, src/storage contains files related to our storage engine, while src/index contains files related to the indexes, etc..

Source Code Management

We exclusively use git for source code management, and github for collaboration. Here's a simple guide for using it. If you have never used a tool like git, I am sure that you will soon wonder how you managed to live without it. In case you want to learn more, here's a free book.

Please update or add a .gitignore file to exclude unwanted files (e.g. the build directory with binary files, backup/temporary files of your editor/IDE, large files containing test data, etc.) from the repository.

Build System

We use cmake for builds. In particular, we use cmake for automatically generating Makefile files. You will probably not need to add file names to existing cmake files (with extension CMakeLists.txt) since the cmake script automatically collects all files with cpp extension.

In case you are curious about cmake, here's a short introduction.

Compiler

We use the g++ compiler, which is part of the GNU compiler collection. Here's more information on the status of C++17 support in GCC. Here's a list of useful compiler flags:

  • -std=c++17: Enables C++11 support
  • -g: Produces debugging information in the OS's native format.
  • -ggdb: Produces debugging information specifically intended for gdb.
  • -O0: Optimize option that reduces compilation time and makes debugging more reliable.
  • -O3: Increases both the compilation time and the performance of the generated code. Use this when running benchmarks.
  • -Wall: Generate helpful warnings. Do not ignore them! In fact, force yourself to deal with warnings by handling them as errors with the -Werror compiler flag.

Development Environment

We recommend you use an IDE for this project. A good one to use would be CLion. See the CLion page for details.

Debugging Tools

You should use a debugger to find any bugs where possible. Your IDE should come equipped with one. Here's some information if you insist on rolling your own.

If your program behaves in "mysterious" (a.k.a. "non-deterministic" ways), valgrind is your friend. It can automatically detect many memory management and threading bugs. In particular, its memcheck tool is useful for finding illegal accesses to memory, uninitialized reads and much more. Check out this blog post on the interaction between GDB and Valgrind.

It is a good habit to make sure that the code passes all the memchecked unit tests every time you commit a set of changes with git.

Unit Testing

Unit tests are critical for ensuring the correct functionality of your modules and reduce time spent on debugging. It can help prevent regressions. We use googletest, a nice unit-testing framework for C++ projects.

You should write unit test cases for each class/algorithm that you have added or modified. See the testing section for detail. Try to come up with test cases that make sure that the module exhibits the desired behavior. Some developers even suggest writing the unit tests before implementing the code. Make sure that you include corner cases, and try to find off-by-one errors.

gcov is a code coverage analysis tool that tells us how much of the code base is covered by the unit tests. We have already integrated it in our build system.

Profilers

Profilers help better understand the performance of Peloton and the environment in which it is running. We suggest perf and strace tools for profiling. Here's some information on profiling Peloton using perf.

Terminal Multiplexer

A terminal multiplexer lets you switch easily between several programs in one terminal. This is particularly useful if you are benchmarking or developing on a remote server. We recommend the tmux tool. Here's a short tutorial, and here's a nice cheat sheet.