- The purpose of this mini project was to better familiarize myself with how containers work under the hood
- We will be running containers both by utilizing Linux commands, and running code in
Go
that utilizes these native Linux features
- I am running these commands in a virtual machine running Ubuntu 20.04 using multipass
# Launch the VM
multipass launch \
--name="rootless-container" \
--cpus="2" \
--mem="2048"
# Create a directory for the Alpine root filesystem
mkdir alpinefs
cd alpinefs
# Download an Alpine root filesystem
curl https://dl-cdn.alpinelinux.org/alpine/v3.15/releases/x86_64/alpine-minirootfs-3.15.0-x86_64.tar.gz \
--output rootfs.tar.gz
# Extract the files
tar xzf rootfs.tar.gz && rm rootfs.tar.gz
# Chroot changes the root directory for the currently running process
sudo chroot /home/ubuntu/alpinefs /bin/sh
# List directory contents
ls
# Output
bin etc lib mnt proc run srv tmp var dev home media opt root sbin sys usr
- This is a good start, but right now we are running our container as the host's root
- We also cannot see currently running processes in the container
# In the container
touch file
ls -l | grep file
# Output
-rw-r--r-- 1 root root 0 Dec 30 23:23 file
# In the VM user namespace
ls -l /home/ubuntu/alpinefs | grep file
# Output
-rw-r--r-- 1 root root 0 Dec 30 15:23 file
# In the container
ps aux
# Output
PID USER TIME COMMAND
- What want to create the illusion of running as the root user in the container, not actually run as the root user
- We also want to be able to see the processes running inside the container
# Mount the /proc folder in the Alpine filesystem
mount -t proc proc /proc
# As the VM user, run this sleep process
sleep 1500
# In the container, find the process running
ps aux | grep "sleep 1500"
# Output
9998 1000 0:00 sleep 1500
10000 root 0:00 grep sleep 1500
- So now we can see all processes running on the host within the container, and can even kill them since we are running
chroot
as the host's root
# Kill the sleeping process from within the container
kill -KILL $(ps aux | grep "sleep 1500" | awk '{print $1}' | head -n 1)
- Still not what we want
- We need to utilizes kernal namespaces to isolate our container
- We can do this with the
unshare
command - Exit the currently running container and start the next set of commands
# Exit the container
exit
# Mount the Alpine proc directory
sudo mount --types proc proc $(pwd)/alpinefs/proc
# Create a PID namespace and the same chroot shell as before
sudo unshare \
--pid \
--fork \
--mount-proc="/home/ubuntu/alpinefs/proc" \
chroot /home/ubuntu/alpinefs /bin/sh
# See only the processes running in this newly create PID namespace
ps aux
# Output
PID USER TIME COMMAND
1 root 0:00 /bin/sh
5 root 0:00 ps aux
- In Docker containers, we can mount volumes from the host to the container
- Let's mount a readonly directory named
config
to our container
# On the host, create a config directory with some files
mkdir configs
touch configs/config1
touch configs/config2
# Create a container-configs directory in the Alpine filesystem
mkdir /home/ubuntu/alpinefs/container-configs
# Create a bind mount for the configs directory to the container
sudo mount \
--bind \
--read-only /home/ubuntu/configs /home/ubuntu/alpinefs/container-configs
# Run the container
sudo unshare \
--pid \
--fork \
--mount-proc="/home/ubuntu/alpinefs/proc" \
chroot /home/ubuntu/alpinefs /bin/sh
# Check permissions
ls -l container-configs/
# Output
-rw-rw-r-- 1 1000 1000 0 Dec 30 23:58 config1
-rw-rw-r-- 1 1000 1000 0 Dec 30 23:58 config2
# To unmount, run the command on the host
sudo umount /home/ubuntu/alpinefs/container-configs
- We still need to stop running the container as root
- We can run the
unshare
as a non-root user with the--user
flag
# From the host
unshare --user /bin/sh
# See the ID of the user in the container
id
# Output
uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)
# From the container
ps -f
# Output
UID PID PPID C STIME TTY TIME CMD
nobody 8850 8849 0 13:21 pts/1 00:00:00 bash
nobody 10192 8850 0 16:17 pts/1 00:00:00 /bin/sh
nobody 10197 10192 0 16:18 pts/1 00:00:00 ps -f
# From the host
ps -t 1 -f
UID PID PPID C STIME TTY TIME CMD
ubuntu 8850 8849 0 13:21 pts/1 00:00:00 bash
ubuntu 10192 8850 0 16:17 pts/1 00:00:00 /bin/sh
- So even though we are running the container as
nobody
, the host user still identifies its processes as its own - We can create the illusion of the container running as a root user by mapping the host's user id to the root id in the container
# Exit the container
exit
# Enter the container
unshare --map-root-user /bin/sh
# See the ID of the user in the container
id
# Output
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
# From the container
ps -f
# Output
UID PID PPID C STIME TTY TIME CMD
root 8850 8849 0 13:21 pts/1 00:00:00 bash
root 10200 8850 0 16:21 pts/1 00:00:00 /bin/sh
root 10203 10200 0 16:21 pts/1 00:00:00 ps -f
# From the host
ps -t 1 -f
UID PID PPID C STIME TTY TIME CMD
ubuntu 8850 8849 0 13:21 pts/1 00:00:00 bash
ubuntu 10200 8850 0 16:21 pts/1 00:00:00 /bin/sh
- We now appear to be running as root in the container, but the host still sees its processes as its own
- From within this new user namespace, we can run
unshare
commands as a non-root user, or rather as an illusion of the root user
# Create new PID and UTS namespaces
unshare --pid --fork --uts --mount-proc="/home/ubuntu/alpinefs/proc" chroot /home/ubuntu/alpinefs /bin/sh
# Change the hostname in the container
hostname container
# Check hostname on host and see that it has not changed as it has in the container
hostname
# Output
rootless-container
- The follow-along video for the code in
main.go
is linked at the bottom - We will be running shell process in a container using the program as we just have with native Linux commands
- You will need to install Go first
# From the host
go run main.go run /bin/sh
# Output
running [/bin/sh] as user 1000 and in process 10275
running [/bin/sh] as user 0 and in process 1
- The program essentially does the same things as above
- It includes several namespace flags that we wish to use for our container in the command built in the
run
function - The flag to make note of is
syscall.CLONE_NEWUSER
, since we now know that this will not require higher permissions from the host's user - It also maps the host user and group IDs to the root user and group ID of the container as we did previously
- When the command is executed the first time, it the program within it self by executing
/proc/self/exe
with the command ofchild
instead ofrun
- When the program runs that second time, it is doing so within the newly create namespaces, where the root directory is changed to the Alpine filesystem, and
proc
folder in the Alpine filesystem is mounted
# From the container
ps aux
# Output
PID USER TIME COMMAND
1 root 0:00 /proc/self/exe child /bin/sh
6 root 0:00 /bin/sh
8 root 0:00 ps aux
# From the host
ps -t 1 -f
# Output
UID PID PPID C STIME TTY TIME CMD
ubuntu 8850 8849 0 13:21 pts/1 00:00:00 bash
ubuntu 10439 8850 0 16:46 pts/1 00:00:00 go run main.go run /bin/sh
ubuntu 10475 10439 0 16:46 pts/1 00:00:00 /tmp/go-build2036034500/b001/exe/
ubuntu 10479 10475 0 16:46 pts/1 00:00:00 /proc/self/exe child /bin/sh
ubuntu 10484 10479 0 16:46 pts/1 00:00:00 /bin/sh
- Just as before, the host will see these processes as its own, but the container has the illusion of running as root