This file presents all required information about generating/obtaining the data for the experiments, running them, and visualizing the results with Jupyter.
Our implementation was done in `mutable`, for which we offer two methods to install it and its prerequisites: a native, local installation and Docker.
For the local installation, please follow the instructions found here to build `mutable`. Note that you need to build `mutable` with its WebAssembly backend.
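As a rough sketch, a typical CMake-based build could look as follows. The exact configuration options, in particular the one enabling the WebAssembly backend, are defined in mutable's own build instructions; they are only indicated by a placeholder comment here, not verified flags.

```sh
# Hedged sketch of an out-of-source CMake build; consult mutable's build
# instructions for the actual option that enables the WebAssembly backend.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release  # add the WebAssembly-backend option here
cmake --build build
```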
You also need to perform the following steps:
- Set up a virtual Python environment. Make sure to have Python 3.10 installed!
  ```sh
  pipenv sync --python 3.10
  ```
- (Optional) Set up a local PostgreSQL installation. PostgreSQL is only required to generate cardinality estimates later on; as these are already contained in the repository, this step is optional. See the sketch after this list.
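If `pipenv` is not yet installed, and you opt for the local PostgreSQL setup, the following sketch shows one way to do both. It assumes a Debian-based system with `pip` available; package names and service management differ on other platforms.

```sh
# Install pipenv for the current user (assumes pip is available).
pip install --user pipenv

# Optional: install and start a local PostgreSQL instance (Debian/Ubuntu).
sudo apt install postgresql
sudo systemctl start postgresql
```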
To use Docker, please follow the instructions below. We always assume that you are in the root directory of the repository.
- Install Docker and make sure your Docker daemon is running.
- Create two folders used as volumes by the Docker containers.
  ```sh
  mkdir postgres paper_benchmarks
  ```
- Build/download the required images. This can take up to an hour.
  ```sh
  docker compose build
  ```
- Start the `mutable` container.
  ```sh
  docker compose up mutable -d
  ```
- (Optional) Start the `postgres` container. This is only required if you want to recreate the injected cardinality files.
  ```sh
  docker compose up postgres -d
  ```
- Get an interactive shell in the `mutable` container. All following commands need to be executed from within that shell.
  ```sh
  docker compose exec -it mutable /bin/bash
  ```
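To verify that the containers are up before continuing, you can list them:

```sh
docker compose ps
```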
Note that the `<postgres_user>` in Docker will always be `postgres`.
For all following commands, we assume that you are in the virtual Python environment, i.e., that you have executed `pipenv shell` and run commands from within that shell.
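Alternatively, instead of entering the shell, you can prefix individual commands with `pipenv run`, for example:

```sh
pipenv run python benchmark/get_data.py job
```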
We first need to generate/download the required data. For that, you need to:
- Download the IMDb data.
  ```sh
  python benchmark/get_data.py job
  ```
- Create the synthetic dataset.
  ```sh
  python benchmark/result-db-eval/synthetic/generate_synthetic_joins.py
  ```
- (Optional) Load the IMDb/synthetic data into PostgreSQL. This is only required if you want to recreate the injected cardinalities.
  ```sh
  ./benchmark/result-db-eval/job/setup_postgres.sh <postgres_user>
  ./benchmark/result-db-eval/synthetic/setup_postgres.sh <postgres_user>
  ```
- (Optional) Recreate the injected cardinality files used in the experiments. This may take some time; on our machine, it took an hour. This step is optional because the cardinality files are already contained in the repository. Docker users, see the example after this list.
  ```sh
  python benchmark/result-db-eval/create_injected_cardinalities.py --user <postgres_user>
  ```
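For Docker users, the PostgreSQL user is always `postgres` (see the note above), so the optional steps read:

```sh
./benchmark/result-db-eval/job/setup_postgres.sh postgres
./benchmark/result-db-eval/synthetic/setup_postgres.sh postgres
python benchmark/result-db-eval/create_injected_cardinalities.py --user postgres
```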
To execute all experiments, perform the following steps:
- Run the enumeration time experiments.
  ```sh
  ./benchmark/result-db-eval/run_evaluation_time_experiments.sh
  ```
- Run the synthetic data experiments.
  ```sh
  ./benchmark/result-db-eval/run_synthetic_experiments.sh
  ```
- Run the JOB experiments.
  ```sh
  python benchmark/Benchmark.py --output benchmark/result-db-eval/job/recreated_results/job.csv benchmark/result-db-eval/job/
  ```
To visualize the results with Jupyter and generate PDFs of them, run the following commands:
- Go into the folder with the Jupyter notebook.
  ```sh
  cd benchmark/result-db-eval/
  ```
- Start the Jupyter server.
  - If you are using the local installation, run the following command. This should open Jupyter in your browser.
    ```sh
    jupyter notebook
    ```
  - If you are using Docker, run the following command, then copy the link at the bottom of the output containing the IP `127.0.0.1` and paste it into your browser.
    ```sh
    jupyter notebook --ip=0.0.0.0 --no-browser --allow-root
    ```
- Open `Visualization.ipynb` and configure which results you want to see. You can decide whether to create visualizations for the existing results or for the results recreated in your own run of the experiments. To see the existing results, set the parameter `use_existing_results` to `True`; to see the recreated results, set it to `False`. The parameter can be found at the bottom of the first Jupyter cell (see the example below).
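For illustration, the relevant line at the bottom of the first cell would be set like this (the surrounding code is in the notebook itself):

```python
# True:  visualize the results shipped with the repository.
# False: visualize the results recreated by your own experiment runs.
use_existing_results = True
```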