Detailed explanation can be found in this post
To use the template, please install the following.
- git
- Github account
- Terraform
- AWS account
- AWS CLI installed and configured
- Docker with at least 4GB of RAM and Docker Compose v1.27.0 or later
You can create your GitHub repository based on this template by clicking on the `Use this template button in the data_engineering_project_template repository. Clone your repository and replace content in the following files
- CODEOWNERS: In this file change the user id from
@josephmachado
to your Github user id. - cd.yml: In this file change the
data_engineering_project_template
part of theTARGET
parameter to your repository name. - variable.tf: In this file change the default values for
alert_email_id
andrepo_url
variables with your email and github repository url respectively.
Run the following commands in your project directory.
# Local run & test
make up # start the docker containers on your computer & runs migrations under ./migrations
make ci # Runs auto formatting, lint checks, & all the test files under ./tests
# Create AWS services with Terraform
make tf-init # Only needed on your first terraform run (or if you add new providers)
make infra-up # type in yes after verifying the changes TF will make
# Wait until the EC2 instance is initialized, you can check this via your AWS UI
# See "Status Check" on the EC2 console, it should be "2/2 checks passed" before proceeding
# Wait another 5 mins, Airflow takes a while to start up
make cloud-airflow # this command will forward Airflow port from EC2 to your machine and opens it in the browser
# the user name and password are both airflow
make cloud-metabase # this command will forward Metabase port from EC2 to your machine and opens it in the browser
# use https://github.com/josephmachado/data_engineering_project_template/blob/main/env file to connect to the warehouse from metabase
Database migrations can be created as shown below.
make db-migration # enter a description, e.g. create some schema
# make your changes to the newly created file under ./migrations
make warehouse-migration # to run the new migration on your warehouse
For the continuous delivery to work, set up the infrastructure with terraform, & defined the following repository secrets. You can set up the repository secrets by going to Settings > Secrets > Actions > New repository secret
.
SERVER_SSH_KEY
: We can get this by runningterraform -chdir=./terraform output -raw private_key
in the project directory and paste the entire content in a new Action secret called SERVER_SSH_KEY.REMOTE_HOST
: Get this by runningterraform -chdir=./terraform output -raw ec2_public_dns
in the project directory.REMOTE_USER
: The value for this is ubuntu.
After you are done, make sure to destroy your cloud infrastructure.
make down # Stop docker containers on your computer
make infra-down # type in yes after verifying the changes TF will make