Example of how to mount an AWS EFS filesystem from within a managed AWS Batch job using the standard Amazon ECS-Optimized AMI
The reason to do this is when more than about 8GB of disk is required and you don't want to use a custom AMI, which involves messing around with EBS volumes and SSH'ing into instances (see "Creating the custom AMI for AWS Batch" here)
A good AWS forum post, "How much disk space comes on a managed environment?", highlights the current limitation:
Managed Compute Environments currently launch the ECS Optimized AMI which includes an 8GB volume for the operating system and a 22GB volume for Docker image and metadata storage. The default Docker configuration allocates up to 10GB of this storage to each container instance. You can read more about this AMI at:
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html
For now, your best option is to use Unmanaged Compute Environments which allow you to run instances using any AMI meeting our minimum system requirements.
The downside is that EFS costs 3x more than EBS. Maybe when the ECS Agent supports volume drivers this will be easier and cheaper.
Mounting EFS from within a running batch container is only possible with the privileged batch job definition parameter set to True:
When this parameter is true, the container is given elevated privileges on the host container instance (similar to the root user). This parameter maps to Privileged in the Create a container section of the Docker Remote API and the --privileged option to docker run.
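As a rough illustration, the arguments for a privileged job definition could be assembled like this before being passed to boto3's `register_job_definition`. This is only a sketch: the function name, the image URI, the role ARN and the vCPU/memory defaults are placeholders and not values taken from this project.

```python
def build_job_definition(name, image, job_role_arn, vcpus=1, memory_mib=512):
    """Build keyword arguments for boto3's batch.register_job_definition.

    All argument values are supplied by the caller; the defaults for
    vcpus and memory_mib are arbitrary placeholders.
    """
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,
            "vcpus": vcpus,
            "memory": memory_mib,
            "jobRoleArn": job_role_arn,
            # Needed so the process inside the container may mount EFS.
            "privileged": True,
        },
    }
```

With a real client this would be used as `boto3.client("batch").register_job_definition(**build_job_definition(...))`.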
For unmanaged batch compute environments, or managed ones using a custom AMI, see:
- How do I increase the default 10 GiB storage limit with Docker container volumes for ECS?
- Bootstrapping Container Instances with Amazon EC2 User Data
- AWS re:Invent 2017: AWS Batch: Easy and Efficient Batch Computing on AWS (CMP323)
- AWS Batch 2018 Roadmap (from re:Invent 2017 video above)
NOTE: this bit isn't fully tested, but should be enough to give you an idea
- Ensure the VPC private subnet has auto-assign public IP enabled or is behind a NAT gateway, otherwise the batch job will remain in the RUNNABLE state
- Create ECS IAM Role for batch job with inline policy:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "1", "Effect": "Allow", "Action": [ "elasticfilesystem:DescribeMountTargets", "elasticfilesystem:DescribeFileSystems" ], "Resource": "*" } ] }
- Create managed batch compute environment with private subnet above (with "Enable user-specified AMI ID" unchecked)
- Create a privileged batch job definition aws_batch_efs using IAM role created above
- Create batch job queue aws_batch_efs_queue
- Create ECR repository aws_batch_efs
- Create EFS filesystem with name batch and enable mount target in same subnet(s) as batch compute environment
- Install docker and build image: ./build.sh
- Update AWS_ACCOUNT_ID in push.sh and any other variables required and push docker image to ECR: ./push.sh
- Submit job either through the console or using batch.py (need to pip install gevent boto3 if not already installed)
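The last step, submitting a job and waiting for it to finish, might look roughly like this with boto3. This is not the code of batch.py, only a minimal sketch of the same idea: the function names and the 10 second polling interval are assumptions, and the Batch client is passed in as an argument (for example `boto3.client("batch")`).

```python
import time

# Job states after which AWS Batch will not change the status again.
TERMINAL_STATES = ("SUCCEEDED", "FAILED")


def submit_job(batch, queue, definition, job_name, environment):
    """Submit a job with extra environment variables and return its job ID."""
    response = batch.submit_job(
        jobName=job_name,
        jobQueue=queue,
        jobDefinition=definition,
        containerOverrides={
            "environment": [
                {"name": key, "value": value}
                for key, value in sorted(environment.items())
            ]
        },
    )
    return response["jobId"]


def wait_for_job(batch, job_id, poll_seconds=10, sleep=time.sleep):
    """Poll describe_jobs until the job reaches SUCCEEDED or FAILED."""
    while True:
        job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
        status = job["status"]
        print("status is", status)
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
```

For the job in this example the call would be `submit_job(batch, "aws_batch_efs_queue", "aws_batch_efs", "aws_batch_efs_1", {"EFS_NAME": "batch"})` followed by `wait_for_job(batch, job_id)`.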
The following is happening here:
- batch.py submits job passing in environment variable EFS_NAME=batch and then waits for it to complete before dumping the log of the job
- AWS Batch container starts and invokes run.sh entrypoint
- run.sh sudo executes mount_efs.sh passing the name of the EFS filesystem batch and mount point /mnt/efs/batch
- mount_efs.sh looks up the EFS mount target IP for the subnet the batch container is running in and mounts it
- run.sh creates a unique temporary directory below /mnt/efs/batch based on AWS_BATCH_JOB_ID environment variable that AWS Batch makes available to the container
- run.sh writes and reads a file
- run.sh deletes temporary directory from EFS
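The mount target lookup is the least obvious step, so here is a hedged Python sketch of the idea. It is not a translation of mount_efs.sh: the function names are made up, the subnet ID is simply passed in (inside a container it would typically come from the EC2 instance metadata service), and the NFS options are the ones AWS generally recommends for EFS, which may differ from what the script uses. The two EFS calls are the ones allowed by the inline IAM policy above.

```python
# NFS mount options commonly recommended by AWS for EFS (an assumption
# here; the options used by mount_efs.sh may differ).
NFS_OPTIONS = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"


def find_mount_target_ip(efs, fs_name, subnet_id):
    """Return the IP address of the mount target of `fs_name` in `subnet_id`.

    `efs` is an EFS client such as boto3.client("efs"). Pagination of
    describe_file_systems is ignored to keep the sketch short.
    """
    for fs in efs.describe_file_systems()["FileSystems"]:
        if fs.get("Name") != fs_name:
            continue
        targets = efs.describe_mount_targets(FileSystemId=fs["FileSystemId"])
        for target in targets["MountTargets"]:
            if target["SubnetId"] == subnet_id:
                return target["IpAddress"]
    raise LookupError(
        "no mount target for EFS %r in subnet %r" % (fs_name, subnet_id)
    )


def mount_command(ip_address, mount_point):
    """Build the argument list for mounting the EFS root at `mount_point`."""
    return ["mount", "-t", "nfs4", "-o", NFS_OPTIONS,
            "%s:/" % ip_address, mount_point]
```

The resulting list can be handed to `subprocess.check_call` by a process that runs as root in a privileged container.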
$ ./batch.py submit -q aws_batch_efs_queue -d aws_batch_efs -j aws_batch_efs_1 -e EFS_NAME=batch
submitted job aws_batch_efs_1 from definition ID aws_batch_efs:1 with ID 58e88733-1d61-4912-a8bg-934249940edc
2018-02-09T00:27:27Z status is SUBMITTED
2018-02-09T00:27:37Z status is RUNNABLE
2018-02-09T00:27:47Z status is RUNNING
2018-02-09T00:27:57Z status is SUCCEEDED
* df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-202:1-263304-c6aaf921c8d8ffccd40e2ag49a1e21c1efd8cb033b89ea2f44e352acbe2a760d 9.8G 382M 8.9G 5% /
tmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/xvda1 7.8G 662M 7.1G 9% /etc/hosts
shm 64M 0 64M 0% /dev/shm
/batch/mount_efs.sh: mounted EFS batch in subnet subnet-957cd9a2 with IP 10.0.1.102 to /mnt/efs/batch
* df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-202:1-263304-c6aaf921c8d8ffccd40e2ag49a1e21c1efd8cb033b89ea2f44e352acbe2a760d 9.8G 382M 8.9G 5% /
tmpfs 3.9G 4.0K 3.9G 1% /dev
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/xvda1 7.8G 662M 7.1G 9% /etc/hosts
shm 64M 0 64M 0% /dev/shm
10.0.1.102:/ 8.0E 0 8.0E 0% /mnt/efs/batch
* ls -l /mnt/efs/batch/*
ls: cannot access /mnt/efs/batch/*: No such file or directory
* mkdir /mnt/efs/batch/58e88733-1d61-4912-a8bg-934249940edc
* ls -l /mnt/efs/batch/58e88733-1d61-4912-a8bg-934249940edc/*
ls: cannot access /mnt/efs/batch/58e88733-1d61-4912-a8bg-934249940edc/*: No such file or directory
* writing /mnt/efs/batch/58e88733-1d61-4912-a8bg-934249940edc/test_file.txt
* ls -l /mnt/efs/batch/58e88733-1d61-4912-a8bg-934249940edc/*
-rw-r--r-- 1 batch batch 28 Feb 9 00:27 /mnt/efs/batch/58e88733-1d61-4912-a8bg-934249940edc/test_file.txt
* reading /mnt/efs/batch/58e88733-1d61-4912-a8bg-934249940edc/test_file.txt
Fri Feb 9 00:27:46 UTC 2018
* removing /mnt/efs/batch/58e88733-1d61-4912-a8bg-934249940edc
* ls -l /mnt/efs/batch/*
ls: cannot access /mnt/efs/batch/*: No such file or directory
COMPLETE
Some other functionality of batch.py
$ export BATCH_QUEUE=aws_batch_efs_queue
$ ./batch.py jobs
SUBMITTED PENDING RUNNABLE STARTING RUNNING SUCCEEDED FAILED
aws_batch_efs_2
aws_batch_efs_1
aws_batch_efs_3
$ export BATCH_QUEUE=aws_batch_efs_queue
$ ./batch.py log -j aws_batch_efs_2
/batch/run.sh EFS_NAME and/or AWS_BATCH_JOB_ID environment variables not set
$ export BATCH_QUEUE=aws_batch_efs_queue
$ ./batch.py wait -j aws_batch_efs_3
2018-02-09T02:21:09Z status is SUBMITTED
2018-02-09T02:21:19Z status is RUNNABLE
$ export BATCH_QUEUE=aws_batch_efs_queue
$ export BATCH_DEFN_NAME=aws_batch_efs
$ ./batch.py submit -j aws_batch_efs_3
ERROR: job 'aws_batch_efs_3' already processing with status RUNNABLE
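A guard against double submission like the one in the last session could be built on boto3's `list_jobs`. The sketch below is a guess at the approach, not the code of batch.py; the function name is hypothetical, the Batch client is passed in, and pagination is ignored.

```python
# States in which a job is still queued or executing.
ACTIVE_STATES = ("SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING")


def find_active_job(batch, queue, job_name):
    """Return (job_id, status) of an unfinished job named `job_name`, or None.

    list_jobs filters by a single status per call, so every active
    state is queried in turn.
    """
    for status in ACTIVE_STATES:
        response = batch.list_jobs(jobQueue=queue, jobStatus=status)
        for summary in response["jobSummaryList"]:
            if summary["jobName"] == job_name:
                return summary["jobId"], status
    return None
```

A submit command can call this first and refuse to continue when the result is not `None`.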