Before You Start
This page contains information intended to help you get started on this project. It covers a wide set of software development topics, such as the development environment, debugging, and version control. These are opinionated suggestions that we have found invaluable for low-level software development. Feel free to deviate from them if you are comfortable with other tools that solve similar problems.
We prefer the Linux OS for development and testing. You can use any other OS for development as long as your modified source code can be built and run on Linux.
In case you are a Mac OS or Windows user, we encourage you to consider using Virtual Box and Vagrant. Virtual Box is a hypervisor for x86 computers, and Vagrant is a tool for sharing easy-to-configure virtual development environments. More information is available here.
We are developing Peloton in C++. In particular, we are following the C++11 standard. C++ provides a lot of leeway in DBMS development compared to other high-level languages. For instance, it supports both manual and automated memory management, varied styles of programming, stronger type checking, different kinds of polymorphism, etc.
Here's a list of useful references:
- CPP Reference is an online reference of the powerful Standard Template Library (STL).
- C++ FAQ covers a lot of topics.
Here's a list of C++11 features that you might want to make use of:
- auto type inference
- Range-based for loops
- Smart pointers, in particular unique_ptr.
- STL data structures, such as unordered_set, unordered_map, etc.
- Threads, deleted member functions, lambdas, etc.
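As a rough illustration of several of the features listed above, here is a minimal sketch. The function names (count_words, count_frequent, make_buffer) are hypothetical and are not part of Peloton; they only show how the features combine in ordinary C++11 code.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts how often each word occurs.
std::unordered_map<std::string, int> count_words(
    const std::vector<std::string> &words) {
  std::unordered_map<std::string, int> counts;
  // Range-based for loop with auto type inference.
  for (const auto &word : words) {
    ++counts[word];
  }
  return counts;
}

// Hypothetical helper: number of distinct words seen at least min_count times.
int count_frequent(const std::unordered_map<std::string, int> &counts,
                   int min_count) {
  // The entry type is spelled out because generic lambdas
  // (auto parameters) only arrived in C++14.
  using Entry = std::unordered_map<std::string, int>::value_type;
  // Lambda capturing min_count by value, passed to an STL algorithm.
  return static_cast<int>(std::count_if(
      counts.begin(), counts.end(),
      [min_count](const Entry &entry) { return entry.second >= min_count; }));
}

// Hypothetical helper: returns a zero-filled buffer owned by a unique_ptr.
// std::make_unique is C++14, so C++11 code calls new directly.
std::unique_ptr<std::vector<int>> make_buffer(std::size_t size) {
  return std::unique_ptr<std::vector<int>>(new std::vector<int>(size, 0));
}
```

The unique_ptr returned by make_buffer releases the vector automatically when it goes out of scope, so the caller never writes a matching delete.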
Please comment your code. Comment all the class definitions, non-trivial member functions and variables, and all the steps in your algorithms. Here's an example. We use doxygen style comments and will reject commits that don't fill out comments.
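A minimal sketch of what doxygen-style comments could look like follows. The BoundedStack class is hypothetical and exists only to show the comment layout for a class, its member functions, its member variables, and the steps of a (very small) algorithm.

```cpp
#include <cstddef>
#include <vector>

/**
 * @brief Fixed-capacity stack of integers (hypothetical example class).
 *
 * Illustrates the comment layout only; it is not part of the code base.
 */
class BoundedStack {
 public:
  /**
   * @brief Creates an empty stack.
   * @param capacity Maximum number of elements the stack may hold.
   */
  explicit BoundedStack(std::size_t capacity) : capacity_(capacity) {}

  /**
   * @brief Pushes a value onto the stack.
   * @param value The value to push.
   * @return true if the value was stored, false if the stack was full.
   */
  bool push(int value) {
    // Step 1: refuse the push when the stack is already at capacity.
    if (items_.size() >= capacity_) {
      return false;
    }
    // Step 2: append the value; the back of the vector is the top.
    items_.push_back(value);
    return true;
  }

  /** @brief Returns the number of stored elements. */
  std::size_t size() const { return items_.size(); }

 private:
  /** Maximum number of elements allowed. */
  std::size_t capacity_;
  /** Stored elements; the back of the vector is the top of the stack. */
  std::vector<int> items_;
};
```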
We generally follow the Google C++ style guide. As they mention in that guide, these rules exist to keep the code base manageable while still allowing coders to use C++ language features productively.
Make sure that you follow the naming rules. For instance, use class UpperCaseCamelCase for type names, and int lower_case_with_underscores for variable/method/function names.
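The naming rules above might look like this in practice. All identifiers here are made up for illustration, and the trailing underscore on the data member is the Google style guide's convention for class members.

```cpp
#include <string>

// Type name: UpperCaseCamelCase.
class TableDescriptor {
 public:
  // Method names: lower_case_with_underscores.
  void set_table_name(const std::string &table_name) {
    table_name_ = table_name;
  }
  const std::string &get_table_name() const { return table_name_; }

 private:
  // Member variable: lower_case_with_underscores plus a trailing underscore.
  std::string table_name_;
};

// Free function and local variable names: lower_case_with_underscores.
int count_visible_tuples(int total_tuples, int deleted_tuples) {
  int visible_tuples = total_tuples - deleted_tuples;
  return visible_tuples;
}
```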
Please refrain from using any libraries other than the STL (and googletest for unit testing) without contacting us.
Organize source code files into relevant folders based on their functionality. Separate binary files from source files, and production code from testing code.
In general, the src directory contains all the production code, the test directory contains all the testing code, and the build directory contains all the built binary files. Within the src directory, src/storage contains files related to our storage engine, while src/index contains files related to the indexes, etc.
We exclusively use git for source code management, and GitHub for collaboration. Here's a simple guide for using it. If you have never used a tool like git, I am sure that you will soon wonder how you managed to live without it. In case you want to learn more, here's a free book.
Please update or add a .gitignore file to exclude unwanted files (e.g. the build directory with binary files, backup/temporary files of your editor/IDE, large files containing test data, etc.) from the repository.
We use cmake for builds. In particular, we use cmake for automatically generating Makefile files. You will probably not need to add file names to the existing cmake files (named CMakeLists.txt), since the cmake script automatically collects all files with the .cpp extension.
In case you are curious about cmake, here's a short introduction.
We use the g++ compiler, which is part of the GNU compiler collection. Here's more information on the status of C++11 support in GCC. Here's a list of useful compiler flags:
- -std=c++11 : Enables C++11 support.
- -g : Produces debugging information in the OS's native format.
- -ggdb : Produces debugging information specifically intended for gdb.
- -O0 : Disables optimization, which reduces compilation time and makes debugging more reliable.
- -O3 : Increases both the compilation time and the performance of the generated code. Use this when running benchmarks.
- -Wall : Generates helpful warnings. Do not ignore them! In fact, force yourself to deal with warnings by handling them as errors with the -Werror compiler flag.
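One way to see the effect of some of these flags from inside a program is to inspect the macros the compiler predefines. The sketch below is only an illustration: the function names and the file name in the comment are hypothetical, __cplusplus is the standard language-version macro, and __OPTIMIZE__ is a GCC-specific macro defined in optimizing compilations.

```cpp
#include <string>

// Example build command (flags_demo.cpp is a hypothetical file name):
//   g++ -std=c++11 -g -O0 -Wall -Werror flags_demo.cpp -o flags_demo

// Reports the language standard selected with -std, based on the
// standard __cplusplus macro (201103L corresponds to C++11).
std::string language_standard() {
#if __cplusplus >= 201703L
  return "C++17 or later";
#elif __cplusplus >= 201402L
  return "C++14";
#elif __cplusplus >= 201103L
  return "C++11";
#else
  return "pre-C++11";
#endif
}

// Reports whether the build used an optimizing level such as -O3.
// With -O0, GCC leaves __OPTIMIZE__ undefined.
bool built_with_optimization() {
#ifdef __OPTIMIZE__
  return true;
#else
  return false;
#endif
}
```

Printing both values at the start of a benchmark run is a cheap guard against accidentally measuring a debug build.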
We recommend you use an IDE for this project. A good one to use would be CLion. See the CLion page for details.
You should use a debugger to find any bugs where possible. Your IDE should come equipped with one. Here's some information if you insist on rolling your own.
If your program behaves in "mysterious" (a.k.a. "non-deterministic") ways, valgrind is your friend. It can automatically detect many memory management and threading bugs. In particular, its memcheck tool is useful for finding illegal accesses to memory, uninitialized reads and much more. Check out this blog post on the interaction between GDB and Valgrind.
It is a good habit to make sure that the code passes all the memchecked unit tests every time you commit a set of changes with git.
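To make the kind of bug memcheck reports concrete, here is a small sketch with hypothetical function names. The first function leaks on purpose; the second fixes the leak with unique_ptr. Running a program that calls sum_leaky under `valgrind --tool=memcheck --leak-check=full` should list the array as definitely lost.

```cpp
#include <cstddef>
#include <memory>

// Intentionally leaky: the array is never freed, so every call
// loses n * sizeof(int) bytes that memcheck can report.
int sum_leaky(std::size_t n) {
  int *values = new int[n];
  int total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    values[i] = static_cast<int>(i);
    total += values[i];
  }
  return total;  // Bug: missing delete[] values;
}

// Fixed: unique_ptr<int[]> calls delete[] when it goes out of scope,
// including on early returns and exceptions.
int sum_owned(std::size_t n) {
  std::unique_ptr<int[]> values(new int[n]);
  int total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    values[i] = static_cast<int>(i);
    total += values[i];
  }
  return total;
}
```

Both functions return the same result, which is why leaks like this tend to go unnoticed without a tool such as memcheck.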
Unit tests are critical for ensuring the correct functionality of your modules, and they reduce the time spent on debugging. They also help prevent regressions. We use googletest, a nice unit-testing framework for C++ projects.
You should write unit test cases for each class/algorithm that you have added or modified. See the testing section for details. Try to come up with test cases that make sure that the module exhibits the desired behavior. Some developers even suggest writing the unit tests before implementing the code. Make sure that you include corner cases, and try to find off-by-one errors.
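The sketch below shows the kind of corner cases worth covering, using a hypothetical function count_in_range. To stay dependency-free it uses plain assert; in a googletest test each check would instead be an EXPECT_EQ inside a TEST body.

```cpp
#include <cassert>
#include <vector>

// Hypothetical function under test: counts the values that fall in
// [low, high], with both ends inclusive.
int count_in_range(const std::vector<int> &values, int low, int high) {
  int count = 0;
  for (const auto value : values) {
    if (value >= low && value <= high) {
      ++count;
    }
  }
  return count;
}

// Corner-case checks for count_in_range.
void check_count_in_range() {
  // Empty input.
  assert(count_in_range({}, 0, 10) == 0);
  // Values exactly on both boundaries must be counted (off-by-one check).
  assert(count_in_range({1, 5, 10}, 1, 10) == 3);
  // Values just outside the boundaries must not be counted.
  assert(count_in_range({0, 11}, 1, 10) == 0);
  // A single-point range.
  assert(count_in_range({4, 5, 6}, 5, 5) == 1);
  // An inverted range contains nothing.
  assert(count_in_range({5}, 10, 1) == 0);
}
```

The boundary checks are the ones most likely to catch a `<` written where `<=` was intended.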
gcov is a code coverage analysis tool that tells us how much of the code base is covered by the unit tests. We have already integrated it in our build system.
Profilers help you better understand the performance of Peloton and the environment in which it is running. We suggest the perf and strace tools for profiling. Here's some information on profiling Peloton using perf.
A terminal multiplexer lets you switch easily between several programs in one terminal. This is particularly useful if you are benchmarking or developing on a remote server. We recommend the tmux tool. Here's a short tutorial, and here's a nice cheat sheet.
Carnegie Mellon Database Group Website