"Thanks to its simplicity and power, Terraform has emerged as a key player in the DevOps world. It allows you to replace the tedious, fragile, and manual parts of infrastructure management with a solid automated foundation upon which you can build all your other DevOps practices and tooling." - Yevgeniy Brikman
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
The AWS instance will run the Anaconda distribution. I like Anaconda because it is the easiest way to perform Python/R data science and machine learning on Linux, Windows, and Mac OS X. With over 15 million users worldwide, it is the industry standard for developing, testing, and training on a single machine, enabling individual data scientists to:
- Quickly download 1,500+ Python/R data science packages
- Manage libraries, dependencies, and environments with Conda
- Develop and train machine learning and deep learning models with scikit-learn, TensorFlow, and Theano
- Analyze data with scalability and performance with Dask, NumPy, pandas, and Numba
- Visualize results with Matplotlib, Bokeh, Datashader, and Holoviews
Many guides online show how to set up Jupyter Notebooks on AWS, but most of them are ClickOps, not DevOps. They require clicking around the AWS GUI, making key-pairs, manually configuring security groups, manually editing Jupyter config files in Vim, and so on. Some of these guides have 12–15 steps. The goal of this article is to automate the process of launching Jupyter Notebooks on AWS with Terraform: you create the AWS infrastructure as code (IaC) in about 10 minutes by running a couple of commands.
This guide assumes you have some basic knowledge of AWS, an AWS account, and a shared credentials file. The code also assumes you're using a pretty vanilla AWS account, i.e. one with the default VPC, subnets, etc. However, no Terraform knowledge is required to get up and running. If you want to learn more about Terraform, I highly recommend buying Terraform Up and Running by Yevgeniy Brikman.
This Terraform configuration does the following automatically:
- Creates a key-pair and puts it in your working directory.
- Creates an AWS security group that is pre-configured for Jupyter Notebooks.
- Creates an AWS instance using the latest Amazon Linux 2 AMI.
- Creates an EBS volume for the Anaconda Python distribution.
- Attaches the EBS volume to the instance.
- Mounts the EBS volume as `/anaconda3`.
- Downloads Anaconda.
- Installs Anaconda.
- Sets the environment variables for Anaconda, Python, Jupyter, etc.
- Configures the Jupyter Notebook config file for use with AWS.
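On macOS, I use Homebrew to install Terraform.
If you don't have Homebrew yet, install it by running the following command (the installer has changed over time, so check the Homebrew website for the current one):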
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
After installing Homebrew, run `brew install terraform` in the Terminal application of your choice. I highly recommend iTerm2.
Linux users should use Linuxbrew. Follow the installation instructions on the Linuxbrew website. It's a little more cumbersome than the macOS installation, so I'm going to leave the steps out.
After installing Linuxbrew, run `brew install terraform` in the terminal application of your choice.
Windows users: you're on your own. I don't use Windows. Find some documentation online and figure it out yourself. :)
These scripts assume you're using a shared credentials file in `~/.aws/`.
You'll need to create an IAM user with programmatic access and place its `aws_access_key_id` and `aws_secret_access_key` in `~/.aws/credentials`. I also recommend setting a default region in `~/.aws/config`.
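As a rough illustration of what those two files contain, here is a minimal sketch that writes them into a directory of your choice. The function name `write_aws_files` is hypothetical, and every value passed to it is a placeholder rather than a real credential. It overwrites any files already in the target directory, so try it on a scratch directory first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the shared credentials and config files into a directory.
# WARNING: this overwrites "credentials" and "config" in the target directory.
write_aws_files() {
  local dir="$1" key_id="$2" secret="$3" region="$4"
  mkdir -p "$dir"

  # [default] is the profile used when no profile is named explicitly.
  cat > "$dir/credentials" <<EOF
[default]
aws_access_key_id = $key_id
aws_secret_access_key = $secret
EOF

  cat > "$dir/config" <<EOF
[default]
region = $region
EOF

  # Keep the secret readable by the owner only.
  chmod 600 "$dir/credentials"
}
```

For example, `write_aws_files "$HOME/.aws" "<access-key-id>" "<secret-access-key>" "us-east-1"` would produce the layout this guide expects, assuming `us-east-1` is the region you want.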
For more information, see the AWS documentation on configuration and credential files.
.
├── main.tf
├── output.tf
├── script.sh
└── var.tf
`main.tf` is the main Terraform file. It defines all the resources created in AWS.
`output.tf` is where you can have Terraform output certain attributes after it has finished running.
`script.sh` is a bash shell script that executes when the EC2 instance is created. It does some lower-level Linux work and takes care of:
- Creating a log file for debugging.
- Updating the Amazon Linux 2 packages.
- Mounting the EBS volume as `/anaconda3`.
- Editing the `fstab` file inside Amazon Linux 2 to ensure the volume is mounted after a reboot.
- Downloading and installing Anaconda.
- Creating and configuring the Jupyter Notebook config file to make Jupyter Notebook AWS friendly.
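Two of those steps, the `fstab` edit and the Jupyter config, can be sketched as small shell helpers. This is not the repo's actual `script.sh`: the function names, the `ext4` default, and the port are assumptions made for illustration. The `c.NotebookApp` settings apply to the classic Notebook server; newer Jupyter releases may expect `c.ServerApp` instead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating two steps of a startup script like script.sh.

# Print an fstab entry (device, mount point, type, options, dump, pass).
# "nofail" lets the instance finish booting even if the volume is absent.
fstab_entry() {
  local device="$1" mount_point="$2" fs_type="${3:-ext4}"
  printf '%s %s %s defaults,nofail 0 2\n' "$device" "$mount_point" "$fs_type"
}

# Write a Jupyter Notebook config that listens on all network interfaces
# and does not try to open a browser on the (headless) instance.
write_jupyter_config() {
  local config_dir="$1" port="${2:-8888}"
  mkdir -p "$config_dir"
  cat > "$config_dir/jupyter_notebook_config.py" <<EOF
c = get_config()
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.open_browser = False
c.NotebookApp.port = $port
EOF
}
```

On the instance, a startup script might append `fstab_entry /dev/xvdh /anaconda3` to `/etc/fstab` (the device name here is an assumption and depends on how the volume is attached) and call `write_jupyter_config "$HOME/.jupyter"`. Listening on `0.0.0.0` is what makes the server reachable from outside the instance, which is why the security group has to allow the chosen port.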
`var.tf` is where Terraform stores the variables used in `main.tf`.
- Navigate to this repo in your Terminal app.
- Initialize Terraform by running the command `terraform init`.
- Run the command `terraform plan -out=terraform.plan`.
- You can see a preview of all the resources Terraform will create.
- Run the command `terraform apply "terraform.plan"`.
- You'll see Terraform creating resources. It will also place the access key-pair in your working directory for connecting to the EC2 instance with SSH.
- After Terraform has finished creating resources, it will output the connection string, which you'll use to connect with SSH.
- Wait ~10 minutes for the startup script (`script.sh`) to complete. It takes time to download and install Anaconda, especially on a `t2.micro` instance.
- Connect to your instance by running the following command: `ssh -i "<keyname>.pem" ec2-user@<public-dns>`. The connection string is output by Terraform. You will be asked whether you are sure you want to continue connecting, so type `yes` and press enter/return.
- You'll see that you've entered your EC2 instance.
- Start up the Jupyter Notebook server by running the command `jupyter notebook`.
- You'll see a URL. Copy and paste that link into your browser. Jupyter Notebook will load.
- Happy coding!
- When you're done, run the command `terraform destroy` and it will destroy all the resources created by Terraform.
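The steps above can be collected into a small wrapper script. This is only a sketch: the function names (`run`, `launch`, `connect`, `teardown`) and the `DRY_RUN` switch are made up for illustration, and the key file and host are placeholders for the values Terraform prints. The `chmod 400` is there in case the key file was written with looser permissions, since `ssh` refuses private keys that other users can read.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the commands used in this guide.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Initialize, preview, and apply; stop at the first step that fails.
launch() {
  run terraform init &&
    run terraform plan -out=terraform.plan &&
    run terraform apply terraform.plan
}

# key_file and host are placeholders for the values Terraform outputs.
connect() {
  local key_file="$1" host="$2"
  run chmod 400 "$key_file" &&
    run ssh -i "$key_file" "ec2-user@$host"
}

# Remove everything Terraform created.
teardown() {
  run terraform destroy
}
```

Running `DRY_RUN=1` and then `launch` prints the three Terraform commands without touching AWS, which is a cheap way to check the sequence before spending money on an instance.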
Note that the Terraform state file is `local`. This is not always a good idea. However, I left it as `local` because that's the easiest way to distribute working Terraform code. I suggest keeping your state file in an AWS S3 bucket. For more information, I highly recommend buying Terraform Up and Running by Yevgeniy Brikman.