nathanmarz edited this page Oct 13, 2011 · 24 revisions

storm-deploy makes it dead-simple to launch Storm clusters on AWS. It is built on top of jclouds and pallet. After you follow the instructions in this tutorial, you will be able to provision, configure, and install a fully functional Storm cluster with just one command:

lein run :deploy --start --name mycluster --release 0.5.3

You can then stop a cluster like this:

lein run :deploy --stop --name mycluster

The deploy also installs Ganglia, which provides fine-grained metrics about resource usage on the cluster.

Setup instructions

  1. Install Leiningen. All you have to do is download the lein script, place it on your PATH, and make it executable.

  2. Clone the storm-deploy repository using git.

  3. Run lein deps in the storm-deploy directory to download its dependencies.

  4. Create a ~/.pallet/config.clj file like the following, filling in the placeholders. This gives the deploy the credentials it needs to launch and configure instances on AWS.

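(defpallet
  :services
  {:default
   {:blobstore-provider "aws-s3"
    :provider "aws-ec2"
    ;; SSH user and key pair that pallet sets up on each node
    :environment {:user {:username "storm"
                         :private-key-path "$YOUR_PRIVATE_KEY_PATH$"
                         :public-key-path "$YOUR_PUBLIC_KEY_PATH$"}
                  ;; AWS account number with the hyphens removed
                  :aws-user-id "$YOUR_USER_ID$"}
    ;; AWS access key ID and secret access key
    :identity "$YOUR_AWS_ACCESS_KEY$"
    :credential "$YOUR_AWS_ACCESS_KEY_SECRET$"
    ;; AWS region in which to launch the cluster
    :jclouds.regions "$YOUR_AWS_REGION$"}})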

The deploy needs:

A. Public and private key paths for setting up SSH on the nodes. The public key path must be the private key path with ".pub" appended (this appears to be a bug in Pallet).

B. AWS user ID: you can find this on your account management page. It is a number containing hyphens; remove the hyphens when you put it in the config.

C. Identity: your AWS access key ID.

D. Credential: your AWS secret access key.

E. Region: the AWS region in which to launch the cluster (for example, us-east-1).
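Requirements A and B are easy to get subtly wrong. The Python sketch below checks them before you fill in config.clj. The function names are hypothetical helpers and are not part of storm-deploy or Pallet. The sketch only encodes the two rules stated above: the public key path is the private key path plus ".pub", and the user ID is the account number with its hyphens removed.

```python
import os


def normalize_aws_user_id(raw: str) -> str:
    """Remove hyphens from the account number shown on the AWS account page.

    Hypothetical helper: it only checks that what remains is all digits.
    """
    digits = raw.strip().replace("-", "")
    if not digits.isdigit():
        raise ValueError(f"expected digits and hyphens only, got {raw!r}")
    return digits


def check_key_paths(private_key_path: str, public_key_path: str) -> None:
    """Raise ValueError unless the public key path is the private key path + '.pub'."""
    expected = os.path.expanduser(private_key_path) + ".pub"
    actual = os.path.expanduser(public_key_path)
    if actual != expected:
        raise ValueError(f"public key path should be {expected!r}, got {actual!r}")


if __name__ == "__main__":
    # Example values only; substitute your own before writing config.clj.
    print(normalize_aws_user_id("1234-5678-9012"))  # prints 123456789012
    check_key_paths("~/.ssh/id_rsa", "~/.ssh/id_rsa.pub")  # passes silently
```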

  5. Configure your cluster by editing conf/clusters.yaml. You can change the number of Zookeeper nodes or supervisor nodes by editing zookeeper.count or supervisor.count, respectively. You can launch spot instances for supervisor nodes by setting supervisor.spot.price. The other properties should be self-explanatory.

  6. (Optional) Add any custom configuration for your Storm cluster to conf/storm.yaml. For example, you can change timeouts, register custom serializations, or add other settings you want available to your topologies.
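If you script cluster changes, a Python sketch like the following can update a setting such as supervisor.count in conf/clusters.yaml. It assumes the file uses flat "dotted.key: value" lines, as the setting names mentioned above suggest. It also deliberately avoids a YAML library, so it does not handle nested YAML. set_cluster_option and update_clusters_file are hypothetical helpers and are not part of storm-deploy.

```python
from pathlib import Path


def set_cluster_option(text: str, key: str, value) -> str:
    """Return text with 'key: value' set, replacing an existing line or appending one.

    Assumes flat 'dotted.key: value' lines; nested YAML is not handled.
    """
    new_line = f"{key}: {value}"
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # Compare the exact key, so a commented-out line such as
        # '# supervisor.count: 2' or a longer key is left alone.
        if line.split(":", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        # Key not present yet: add it at the end of the file.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


def update_clusters_file(path: str, key: str, value) -> None:
    """Rewrite one setting in a clusters.yaml file in place."""
    p = Path(path)
    p.write_text(set_cluster_option(p.read_text(), key, value))
```

For example, update_clusters_file("conf/clusters.yaml", "supervisor.count", 4) would set the supervisor count to 4 before you launch the cluster.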

Launching clusters

Run this command:

lein run :deploy --start --name mycluster --release {release version}

The --name parameter names your cluster so that you can attach to it or stop it later; if you omit it, the name defaults to "dev". The --release parameter specifies which release of Storm to install; if you omit it, Storm is installed from the master branch. Installing a specific release is highly recommended.
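If you drive storm-deploy from another tool, it can help to build these command lines in one place. The Python sketch below assembles the lein invocations shown on this page for the start, stop, attach, and ips actions. deploy_command and run_deploy are hypothetical helpers. The only flags used are the ones this page documents, and omitted options are left for storm-deploy to default.

```python
import subprocess

# Actions documented on this page; each becomes a '--<action>' flag.
ACTIONS = ("start", "stop", "attach", "ips")


def deploy_command(action, name=None, release=None):
    """Build the argument list for 'lein run :deploy'.

    If name is None, --name is omitted and storm-deploy uses "dev".
    If release is None, --release is omitted (start installs from master).
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    cmd = ["lein", "run", ":deploy", f"--{action}"]
    if name is not None:
        cmd += ["--name", name]
    if release is not None:
        cmd += ["--release", release]
    return cmd


def run_deploy(action, name=None, release=None, cwd="."):
    """Run the command from a storm-deploy checkout; raises if lein exits non-zero."""
    subprocess.run(deploy_command(action, name, release), cwd=cwd, check=True)
```

For example, run_deploy("start", name="mycluster", release="0.5.3", cwd="storm-deploy") runs the same command as the one at the top of this page. Always passing an explicit release keeps launches reproducible.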

The deploy sets up Zookeeper and Nimbus, launches the Storm UI on port 8080 on Nimbus, sets up the Supervisors, writes the appropriate configurations, sets the required permissions on the security groups, and attaches your machine to the cluster (see "Attaching to a cluster" below).

Stopping clusters

Simply run:

lein run :deploy --stop --name mycluster

This will shut down Nimbus, the Supervisors, and the Zookeeper nodes.

Attaching to a cluster

Attaching to a cluster configures your storm client to talk to that particular cluster and authorizes your computer to view the Storm UI. The storm client is used to start and stop topologies; see the Storm documentation for details on using it.

To attach to a cluster, run the following command:

lein run :deploy --attach --name mycluster

Attaching does the following:

  1. Writes the location of Nimbus in ~/.storm/storm.yaml so that the storm client knows which cluster to talk to
  2. Authorizes your computer to access the Nimbus daemon's Thrift port (which is used for submitting topologies)
  3. Authorizes your computer to access the Storm UI on port 8080 on Nimbus
  4. Authorizes your computer to access Ganglia on port 80 on Nimbus
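To check which cluster your storm client is currently attached to, you can look in ~/.storm/storm.yaml. The Python sketch below assumes that attaching records Nimbus under Storm's nimbus.host key as a flat "key: value" line. read_nimbus_host and attached_nimbus are hypothetical helpers, and the line-based parsing is a stand-in for a real YAML parser.

```python
from pathlib import Path


def read_nimbus_host(config_text: str):
    """Return the nimbus.host value from storm.yaml text, or None if absent.

    Assumes a flat 'nimbus.host: value' line, optionally quoted.
    """
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "nimbus.host":
            return value.strip().strip("\"'")
    return None


def attached_nimbus(config_path: str = "~/.storm/storm.yaml"):
    """Read the storm client config written by '--attach', if the file exists."""
    path = Path(config_path).expanduser()
    if not path.exists():
        return None
    return read_nimbus_host(path.read_text())
```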

Getting IPs of cluster nodes

To get the IP addresses of the cluster nodes, run the following:

lein run :deploy --ips --name mycluster

Ganglia

You can access Ganglia by navigating to the following address in your web browser:

http://{nimbus ip}/ganglia/index.php
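Both web interfaces run on the Nimbus node, and your machine must be attached before the security groups will let you reach them. The small Python sketch below builds both addresses from the Nimbus IP reported by --ips, using the ports this page describes. cluster_urls is a hypothetical helper.

```python
def cluster_urls(nimbus_ip: str) -> dict:
    """Web endpoints on the Nimbus node, using the ports described on this page."""
    return {
        # Storm UI, launched by the deploy on port 8080
        "storm_ui": f"http://{nimbus_ip}:8080",
        # Ganglia, served on port 80
        "ganglia": f"http://{nimbus_ip}/ganglia/index.php",
    }
```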
