mesos-systemd

Adobe Platform scripts to bootstrap a CoreOS cluster & run Mesos/Marathon/Chronos/Zookeeper-Exhibitor.

Provides node-level services as Fleet Units for every machine in the cluster.

Most services (logging, metrics, monitoring) run on all nodes; some run only on specific tiers, based on the metadata that is injected into Fleet.

The aim of this setup is to move instance provisioning steps to the CoreOS machine level, automated via fleetctl/systemctl. Almost all of our systemd units use docker to run our services. Consequently, we're able to use the vanilla CoreOS EC2 AMI (i.e., we don't bake AMIs at all). That said, this repo also contains methods that handle sensitive data/secrets to configure various services (more below).

DISCLAIMER:

This repository may reference private repositories or scripts. Most should be replaceable with your own, but either way, proceed with caution: this project is highly experimental and certain nuances may not be well documented. If you want to use this repo, you may have to prune the code a bit and edit/delete certain files.

Concepts

The purpose of this repository is to house all setup scripts and systemd/fleetd units in a central location, separate from our infrastructure provisioning scripts (CloudFormation).

All setup behavior is defined in the init script.

Assumptions:

Your infrastructure has 3 tiers: control, proxy, worker
ALL nodes run a bootstrap.service, whatever that may be.
Some of the scripts require /etc/environment to contain certain variables (usually cloudformation parameters such as route53 entries)
S3 buckets are set correctly and all required credential files (SSH keys, datadog & sumologic credentials) are properly provided to init & can be downloaded using behance/docker-aws-s3-downloader
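
Since several scripts depend on variables being present in /etc/environment, it can help to check for them before anything else runs. The following is a minimal sketch, not part of this repo: the function names are hypothetical, and `NODE_TIER` is an invented variable name used only to show a missing entry.

```python
def parse_environment(text):
    """Parse KEY=VALUE lines (the style used in /etc/environment) into a dict."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"')
    return env


def missing_variables(text, required):
    """Return the required names that are absent or empty, in the given order."""
    env = parse_environment(text)
    return [name for name in required if not env.get(name)]


# Sample contents; the bucket name is a placeholder.
sample = "CONTROL_TIER_S3SECURE_BUCKET=my-secure-bucket\nSECURE_FILES=.dockercfg\n"

# NODE_TIER is a hypothetical variable, used here only to show a miss.
print(missing_variables(sample, ["CONTROL_TIER_S3SECURE_BUCKET", "NODE_TIER"]))
```

On a real node you would pass the contents of /etc/environment instead of `sample`, together with whichever variables your own scripts expect.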

`init` bootstrap

Our bootstrap.service just clones this repo and runs the init script.

From there, it does a couple of things:

ensures that any credentials/secure files are downloaded from S3 (to allow docker & git to pull private dependencies)
configures SSH configs to allow github.com access
copies .dockercfg into /root # TODO: refactor process as this is a hack
runs ALL scripts in v2/setup
- these scripts are always run with sudo (i.e., as root)
- they set up things such as motds, aliases, and drop-ins for various services
starts the tier-specific template units that are specified by the running machine's IP (provided by CoreOS / cloudinit)
- these are started via fleet, even though they are NOT global units and run on specific machines
- rationale for this is to give us granular control over certain units, such as mesos-slaves. It allows us to control individual nodes, or perform rolling actions (such as deploys) while retaining visibility into the cluster as a whole.
submits and starts generic fleet units
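
To illustrate the tier-specific step, the sketch below shows how a systemd template unit (`name@.service`) could be instantiated with a machine's IP and turned into `fleetctl start` invocations. It is an assumption about the general shape only: the helper names are hypothetical, the unit name and IP are examples, and the actual init script may build its commands differently.

```python
import ipaddress


def instance_unit_name(template, ip):
    """Turn a template such as 'name@.service' into 'name@<ip>.service'."""
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    prefix, sep, suffix = template.partition("@")
    if not sep or not suffix.startswith("."):
        raise ValueError("not a template unit: %r" % template)
    return "%s@%s%s" % (prefix, ip, suffix)


def fleet_start_commands(templates, ip):
    """Build one 'fleetctl start <unit>' argv list per template unit."""
    return [["fleetctl", "start", instance_unit_name(t, ip)] for t in templates]


# Example template name and IP, chosen for illustration only.
for argv in fleet_start_commands(["mesos-slave@.service"], "10.0.1.15"):
    print(" ".join(argv))
```

Because each instance carries the machine's IP in its name, a single node's unit can be stopped or restarted without touching the others, which matches the rationale given above for not using global units here.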

Services

Global Services (run on ALL nodes in ALL tiers)

Monitoring

Datadog
Sysdig
Sumologic

Util/Automated Maintenance

Docker Logrotate (based on michaloo/logrotate)
Docker Image/Container Cleanup
AWS EC2 Container Registry (ECR) login (Worker Nodes only)

MISC

SSHD mask
- bug in CoreOS
- proposed changes

Control Tier Nodes:

Mesos Master
Marathon
Exhibitor (for Zookeeper)
Chronos
Flight Director - private Marathon deployment wrapper/manager (stay tuned!)
HUD - private UI shim for flight-director (stay tuned!)

Proxy Tier Nodes:

CAPCOM - private Container-Proxy Manager (stay tuned!)
Heatshield Proxy (our version of nginx) or HAProxy

Worker Tier Nodes:

Mesos Slave

Key/Secret Management & Configuration

Secrets & key management is a bit ad hoc. Most of the setup scripts, which house the logic for preparing the data that the fleet units use, require a few things to download secrets & keys:

the $CONTROL_TIER_S3SECURE_BUCKET environment variable, written into /etc/environment by cloudformation
behance/docker-aws-s3-downloader container to download files
IAM roles to access $CONTROL_TIER_S3SECURE_BUCKET

Secrets make it onto the nodes in the form of flat text files that live within $CONTROL_TIER_S3SECURE_BUCKET. The setup files individually know which file(s) they need to download & how to read, set or use the data for their corresponding units. So for example, the datadog unit requires an etcd key, /ddapikey. Knowing this, we have a datadog setup script which downloads a .datadog file from $CONTROL_TIER_S3SECURE_BUCKET, expects it to be in a certain format, and sets the etcd key.
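
The datadog flow described above can be sketched roughly as follows. This is not the repo's setup script: the function names are hypothetical, the download step is left out, and it assumes the etcd v2 `etcdctl set <key> <value>` command is what writes the key.

```python
def read_single_key(contents):
    """Return the key from a file that should contain the key and nothing else."""
    lines = [ln.strip() for ln in contents.splitlines() if ln.strip()]
    if len(lines) != 1 or len(lines[0].split()) != 1:
        raise ValueError("expected exactly one line holding only the key")
    return lines[0]


def etcd_set_command(etcd_key, value):
    """Build the argv for writing a value into etcd with etcdctl (v2 syntax)."""
    return ["etcdctl", "set", etcd_key, value]


# 'abc123' stands in for the contents of a downloaded .datadog file.
api_key = read_single_key("abc123\n")
print(etcd_set_command("/ddapikey", api_key))
```

Validating the file before calling etcdctl means a malformed download fails loudly instead of writing a bad value for the unit to pick up.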

Files in S3

We are planning to deprecate the following in favor of other solutions (DynamoDB + KMS?).

Services, dotfiles, dotfile formats

| Service | File | Format |
| --- | --- | --- |
| Datadog | `.datadog` | Just the key. Nothing else. |
| Sysdig | `.sysdig` | Just the key. Nothing else. |
| Sumologic | `.sumologic` | `ID=YOURID` `SECRET=YOURSECRET` |
| Flight Director | `.flight-director` | `/FD/GITHUB_CLIENT_ID (YOUR GITHUB APP ID)` `/FD/GITHUB_CLIENT_SECRET (YOUR GITHUB APP SECRET)` `/FD/GITHUB_ALLOWED_TEAMS org/team` |
| HUD | `.hud` | `/HUD/client-id (GITHUB_APP_ID can == value in .flight-director)` `/HUD/client-secret (GITHUB_APP_SECRET can == value in .flight-director)` |
| Marathon | `.marathon` | `/marathon/username a-username` `/marathon/password a-password` |
| AWS ECR | `.ecr` | `/ECR/region (ECR AWS Region, ex. us-east-1)` `/ECR/registry-account (ECR AWS Account, ex. 012345678901)` |
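
Several of the formats above pair an etcd path with a value. A minimal parser for that shape might look like the sketch below. It assumes one pair per line with whitespace between the path and the value, which is inferred from the examples rather than confirmed; the function name is hypothetical.

```python
def parse_etcd_dotfile(contents):
    """Parse lines of the form '/some/etcd/key value' into (key, value) pairs."""
    pairs = []
    for raw in contents.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2 or not parts[0].startswith("/"):
            raise ValueError("expected '/etcd/path value', got: %r" % line)
        pairs.append((parts[0], parts[1].strip()))
    return pairs


# Sample contents in the style of the .marathon row.
sample = "/marathon/username a-username\n/marathon/password a-password\n"
for key, value in parse_etcd_dotfile(sample):
    print(key, "=>", value)
```

Returning ordered pairs rather than a dict keeps the keys in file order, which is convenient if each pair is then written to etcd one at a time.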

MISC

.dockercfg to download private containers
id_rsa to clone any private repositories

Nothing special needs to be done for these two, as long as the CloudFormation template sets the following in /etc/environment:

$SECURE_FILES=.dockercfg:id_rsa,0600,.ssh/id_rsa

The format of this environment variable just needs to conform to what behance/docker-aws-s3-downloader expects.
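
For readers trying to decode the value above, here is one plausible reading, sketched as a parser. It assumes entries are separated by `:` and that each entry is either a bare file name or `source,mode,destination`. This is inferred from the single example only, so check the behance/docker-aws-s3-downloader documentation for the authoritative format. The names `SecureFile` and `parse_secure_files` are hypothetical.

```python
from collections import namedtuple

# mode and destination are None when the entry is just a file name.
SecureFile = namedtuple("SecureFile", ["source", "mode", "destination"])


def parse_secure_files(value):
    """Split a SECURE_FILES-style string into SecureFile entries (assumed format)."""
    entries = []
    for item in value.split(":"):
        if not item:
            continue
        fields = item.split(",")
        if len(fields) == 1:
            entries.append(SecureFile(fields[0], None, None))
        elif len(fields) == 3:
            entries.append(SecureFile(*fields))
        else:
            raise ValueError("unexpected entry: %r" % item)
    return entries


for entry in parse_secure_files(".dockercfg:id_rsa,0600,.ssh/id_rsa"):
    print(entry)
```

Under this reading, `.dockercfg` is downloaded as-is, while `id_rsa` is written to `.ssh/id_rsa` with mode `0600`.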
