Deploy OTP to AWS load balancer and manage OTP servers in separate collection (and misc other fixes) #225

Merged
merged 50 commits into from
Oct 9, 2019
Changes from 2 commits
Commits
ac0b00c
WIP spin up EC2 (no user data)
landonreed Aug 7, 2018
9c5e92c
Merge branch 'remove-r5' into deploy-to-ec2
landonreed Sep 18, 2018
d34b96a
Merge branch 'dev' into deploy-to-ec2
landonreed Nov 6, 2018
560c4c6
refactor(snapshot): remove legacy MapDB-based snapshot jobs
landonreed Nov 12, 2018
dbbd130
fix(delete): delete SQL namespace when feed version/snapshot deleted
landonreed Nov 12, 2018
c01eadc
feature(deploy-ec2): deployment enhancements for load balancers
landonreed Nov 12, 2018
f777b02
fix(user-mgmt): better error handling when Auth0 cannot update/create…
landonreed Nov 12, 2018
8898376
fix: move toGtfsDate from deleted class to FeedTx
landonreed Nov 12, 2018
fa5ce83
refactor: fix whitespace
landonreed Nov 12, 2018
e0a1eb6
refactor: add missing aws pom entry
landonreed Nov 12, 2018
61c04d2
build(pom): update gtfs-lib dependency
landonreed Nov 12, 2018
7231a95
feature(server-mgmt): manage deployment servers at the application level
landonreed Nov 15, 2018
b372303
refactor(server-job): attach just the project ID to the merge feeds job
landonreed Nov 29, 2018
7c92c1d
Merge branch 'dev' into deploy-to-ec2
landonreed Nov 30, 2018
aa0553b
refactor(deploy): shuffle deploy job code for clarity
landonreed Nov 30, 2018
c52d6f0
Merge pull request #133 from ibi-group/dev
landonreed Aug 7, 2019
56f7642
Merge branch 'dev' into deploy-to-ec2
landonreed Aug 7, 2019
6ae2055
refactor: fix issues resulting from merge
landonreed Aug 7, 2019
c33c290
refactor(deployment): tweak user script and update default config
landonreed Aug 8, 2019
05ec4df
refactor(deployment): improve validation of server fields
landonreed Aug 9, 2019
b6d7363
refactor(deployments): modify OtpServer fields and refactor server cr…
landonreed Aug 9, 2019
7a020c8
refactor: remove unused import
landonreed Aug 9, 2019
c753a31
refactor(ServerController): add comment about checking S3 permissions
landonreed Aug 13, 2019
b793727
refactor(ServerController): add missing exceptions to logMessageAndHalt
landonreed Aug 14, 2019
a3ed73c
refactor(deploy): fix check for s3 graph object
landonreed Aug 20, 2019
a07935a
refactor(deploy): revert to default instance type if none specified
landonreed Aug 20, 2019
a177e79
refactor(deploy): make instance profile arn optional
landonreed Aug 20, 2019
2507ab5
refactor(deploy): use set method rather than with for instance profile
landonreed Aug 22, 2019
e1fe1a3
Merge branch 'dev' into deploy-to-ec2
landonreed Sep 3, 2019
4a1ef29
refactor(deploy): move ec2 config into OtpServer
landonreed Sep 9, 2019
9b957dd
refactor(deploy): tweak deployJob for NPE fix and fix server delete
landonreed Sep 10, 2019
bf0f1bc
ci(config): update server.yml.tmp for e2e
landonreed Sep 10, 2019
ade0b40
refactor(EC2InstanceSummary): add empty constructor for serialization
landonreed Sep 10, 2019
95a2333
Merge branch 'dev' into deploy-to-ec2
landonreed Sep 12, 2019
c273350
test(.gitignore): don't ignore test config
landonreed Sep 12, 2019
3493570
test(mtc): fix broken MTC feed merge test with new test config
landonreed Sep 12, 2019
4992b32
refactor(ServerController): isolate jackson parse to utility method
landonreed Sep 12, 2019
3441fa2
refactor(ServerController): surround validation method calls in try/c…
landonreed Sep 12, 2019
a36a7d9
refactor(deploy-to-ec2): address PR comments
landonreed Sep 20, 2019
18bd9d3
refactor(deploy-to-ec2): add json property latest; add server ID to s…
landonreed Sep 20, 2019
c28c215
Merge branch 'dev' into deploy-to-ec2
landonreed Sep 20, 2019
41c21a8
refactor(deploy): fix check for S3 jar
landonreed Sep 20, 2019
7e1528b
refactor(deploy-to-ec2): address PR comments
landonreed Sep 24, 2019
f34affb
refactor(deploy-to-ec2): surround s3 checks in try/catch
landonreed Sep 24, 2019
4cc9b68
refactor(deploy-to-ec2): actually skip termination request
landonreed Sep 24, 2019
fb44a61
refactor(deploy): fix duration calc
landonreed Sep 30, 2019
101b7f9
refactor(deploy): use onboard nginx to signal ec2 deploy status
landonreed Oct 1, 2019
523801d
refactor(deploy): bump default otp version to 1.4
landonreed Oct 3, 2019
c399189
refactor(deploy): add terminate EC2 instance HTTP endpoint
landonreed Oct 8, 2019
240a6e0
refactor(deploy): refine terminate instances endpoint and check for g…
landonreed Oct 8, 2019
4 changes: 4 additions & 0 deletions configurations/default/server.yml.tmp
@@ -24,6 +24,10 @@ modules:
ec2:
enabled: false
default_ami: ami-your-ami-id
# Note: using a cloudfront URL for these download URLs will greatly
# increase download/deploy speed.
otp_download_url: https://optional-otp-repo.com
r5_download_url: https://optional-r5-repo.com
user_admin:
enabled: true
gtfsapi:
66 changes: 47 additions & 19 deletions src/main/java/com/conveyal/datatools/manager/jobs/DeployJob.java
@@ -86,9 +86,18 @@ public class DeployJob extends MonitorableJob {
private static final String AMI_CONFIG_PATH = "modules.deployment.ec2.default_ami";
private static final String DEFAULT_AMI_ID = DataManager.getConfigPropertyAsText(AMI_CONFIG_PATH);
private static final String OTP_GRAPH_FILENAME = "Graph.obj";
public static final String BUNDLE_DOWNLOAD_COMPLETE_FILE = "BUNDLE_DOWNLOAD_COMPLETE";
// Use txt at the end of these filenames so that these can easily be viewed in a web browser.
public static final String BUNDLE_DOWNLOAD_COMPLETE_FILE = "BUNDLE_DOWNLOAD_COMPLETE.txt";
public static final String GRAPH_STATUS_FILE = "GRAPH_STATUS.txt";
private static final long TEN_MINUTES_IN_MILLISECONDS = 10 * 60 * 1000;
/**
// Note: using a cloudfront URL for these download repo URLs will greatly increase download/deploy speed.
private static final String R5_REPO_URL = DataManager.hasConfigProperty("modules.deployment.r5_download_url")
? DataManager.getConfigPropertyAsText("modules.deployment.r5_download_url")
: "https://r5-builds.s3.amazonaws.com";
private static final String OTP_REPO_URL = DataManager.hasConfigProperty("modules.deployment.otp_download_url")
? DataManager.getConfigPropertyAsText("modules.deployment.otp_download_url")
: "https://opentripplanner-builds.s3.amazonaws.com";
/**
* S3 bucket to upload deployment to. If not null, uses {@link OtpServer#s3Bucket}. Otherwise, defaults to
* {@link DataManager#feedBucket}
* */
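A note on the download URL constants above: each one reads an optional config property and falls back to the public S3 bucket URL when the property is absent. The following is a minimal sketch of that fallback pattern, not the project's code. `DownloadUrlConfig`, `resolve`, and the `Map`-based config are hypothetical stand-ins for `DataManager`, and treating a blank value as missing is an added assumption (the diff only checks `hasConfigProperty`).

```java
import java.util.Map;

/** Hypothetical stand-in for the config lookup; not the project's DataManager. */
final class DownloadUrlConfig {
    // Default taken from the diff above.
    static final String DEFAULT_OTP_REPO_URL = "https://opentripplanner-builds.s3.amazonaws.com";

    /** Returns the configured value for key, or fallback when the key is absent or blank. */
    static String resolve(Map<String, String> config, String key, String fallback) {
        String value = config.get(key);
        // Assumption: a blank value is treated the same as a missing one.
        return (value == null || value.trim().isEmpty()) ? fallback : value;
    }
}
```

With an empty config map, `resolve` returns the S3 default; once the key is set (for example to a CloudFront URL), that value wins.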
@@ -121,6 +130,16 @@ public String getDeploymentId () {
return deployment.id;
}

/** Increment the completed servers count (for use during ELB deployment) and update the job status. */
public void incrementCompletedServers() {
Contributor

drake yes

status.numServersCompleted++;
int totalServers = otpServer.ec2Info.instanceCount;
if (totalServers < 1) totalServers = 1;
int numRemaining = totalServers - status.numServersCompleted;
double newStatus = status.percentComplete + (100 - status.percentComplete) * numRemaining / totalServers;
status.update(String.format("Completed %d servers. %d remaining...", status.numServersCompleted, numRemaining), newStatus);
}
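A worked evaluation of the progress formula in `incrementCompletedServers`, as a sketch only. It assumes `status.update` stores the new value as `percentComplete` (that method is not shown in this diff); `DeployProgress` and both method names are hypothetical. The second method is one possible alternative if the intent is for progress to reach 100 when the last server completes; it is not taken from the PR.

```java
/** Hypothetical helper that isolates the progress arithmetic from the diff above. */
final class DeployProgress {
    /** Formula as written in the diff: remaining headroom scaled by the share of servers still pending. */
    static double asWritten(double percentComplete, int completed, int totalServers) {
        if (totalServers < 1) totalServers = 1;
        int numRemaining = totalServers - completed;
        return percentComplete + (100 - percentComplete) * numRemaining / totalServers;
    }

    /** Assumed alternative: scale the headroom above a fixed base by the share of servers completed. */
    static double byCompletedShare(double basePercent, int completed, int totalServers) {
        if (totalServers < 1) totalServers = 1;
        return basePercent + (100 - basePercent) * completed / totalServers;
    }
}
```

Starting from 40 percent with four servers, feeding each result of `asWritten` back in gives 85, 92.5, 94.375 and then 94.375 again, so the last completion adds nothing. `byCompletedShare` from the same base of 40 gives 55, 70, 85 and 100.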

@JsonProperty
public String getServerId () {
return otpServer.id;
@@ -428,7 +447,7 @@ public void jobFinished () {
deployment.deployedTo = otpServer.id;
deployment.deployJobSummaries.add(0, new DeploySummary(this));
Persistence.deployments.replace(deployment.id, deployment);
long durationMinutes = TimeUnit.MILLISECONDS.toMinutes(status.duration);
long durationMinutes = TimeUnit.MILLISECONDS.toMinutes(System.currentTimeMillis() - status.startTime);
message = String.format("Deployment %s successfully deployed to %s in %s minutes.", deployment.name, otpServer.publicUrl, durationMinutes);
} else {
message = String.format("WARNING: Deployment %s failed to deploy to %s. Error: %s", deployment.name, otpServer.publicUrl, status.message);
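The duration fix in this hunk computes elapsed time from the job's start timestamp instead of reading `status.duration`. A minimal sketch of that calculation follows; `DeployDuration` and `elapsedMinutes` are hypothetical names, and the current time is passed in as a parameter so the result is deterministic.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper mirroring the duration calculation in jobFinished. */
final class DeployDuration {
    /** Whole minutes elapsed between the two timestamps; partial minutes are truncated. */
    static long elapsedMinutes(long startTimeMillis, long nowMillis) {
        return TimeUnit.MILLISECONDS.toMinutes(nowMillis - startTimeMillis);
    }
}
```

`TimeUnit.MILLISECONDS.toMinutes` truncates, so a deploy lasting 10 minutes and 59 seconds is reported as 10 minutes.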
@@ -478,19 +497,19 @@ private void replaceEC2Servers() {
return;
}
// Spin up remaining servers which will download the graph from S3.
int remainingServerCount = otpServer.ec2Info.instanceCount <= 0 ? 0 : otpServer.ec2Info.instanceCount - 1;
status.numServersRemaining = otpServer.ec2Info.instanceCount <= 0 ? 0 : otpServer.ec2Info.instanceCount - 1;
List<MonitorServerStatusJob> remainingServerMonitorJobs = new ArrayList<>();
List<Instance> remainingInstances = new ArrayList<>();
if (remainingServerCount > 0) {
if (status.numServersRemaining > 0) {
// Spin up remaining EC2 instances.
status.message = String.format("Spinning up remaining %d instance(s).", remainingServerCount);
remainingInstances.addAll(startEC2Instances(remainingServerCount, true));
status.message = String.format("Spinning up remaining %d instance(s).", status.numServersRemaining);
remainingInstances.addAll(startEC2Instances(status.numServersRemaining, true));
if (remainingInstances.size() == 0 || status.error) {
ServerController.terminateInstances(remainingInstances);
return;
}
// Create new thread pool to monitor server setup so that the servers are monitored in parallel.
ExecutorService service = Executors.newFixedThreadPool(remainingServerCount);
ExecutorService service = Executors.newFixedThreadPool(status.numServersRemaining);
for (Instance instance : remainingInstances) {
// Note: new instances are added
MonitorServerStatusJob monitorServerStatusJob = new MonitorServerStatusJob(owner, this, instance, true);
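The hunk above sizes a fixed thread pool by the number of remaining servers so that each instance is monitored in parallel. A self-contained sketch of that pattern follows, under stated assumptions: `ParallelMonitorSketch`, `monitorAll`, and the string instance IDs are hypothetical, and the task body is a placeholder rather than `MonitorServerStatusJob`. The one-minute wait is also an arbitrary choice for the example.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of monitoring several instances in parallel. */
final class ParallelMonitorSketch {
    /** Runs one placeholder monitor task per instance ID and returns how many tasks finished. */
    static int monitorAll(List<String> instanceIds) {
        // Executors.newFixedThreadPool rejects a size of 0, hence the guard
        // (the diff guards with numServersRemaining > 0 for the same reason).
        if (instanceIds.isEmpty()) return 0;
        AtomicInteger completed = new AtomicInteger();
        ExecutorService service = Executors.newFixedThreadPool(instanceIds.size());
        for (String instanceId : instanceIds) {
            // Placeholder for the real monitoring work on instanceId.
            Runnable task = () -> completed.incrementAndGet();
            service.execute(task);
        }
        // Stop accepting new tasks, then wait for the submitted ones to finish.
        service.shutdown();
        try {
            service.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```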
@@ -630,7 +649,7 @@ private List<Instance> startEC2Instances(int count, boolean graphAlreadyBuilt) {
for (Instance instance : instances) {
// The public IP addresses will likely be null at this point because they take a few seconds to initialize.
instanceIpAddresses.put(instance.getInstanceId(), instance.getPublicIpAddress());
String serverName = String.format("%s %s (%s) %d", deployment.r5 ? "r5" : "otp", deployment.name, dateString, serverCounter++);
String serverName = String.format("%s %s (%s) %d %s", deployment.r5 ? "r5" : "otp", deployment.name, dateString, serverCounter++, graphAlreadyBuilt ? "clone" : "builder");
LOG.info("Creating tags for new EC2 instance {}", serverName);
ec2.createTags(new CreateTagsRequest()
.withTags(new Tag("Name", serverName))
@@ -695,24 +714,24 @@ private String constructUserData(boolean graphAlreadyBuilt) {
jarName = deployment.r5 ? deployment.r5Version : deployment.otpVersion;
Persistence.deployments.replace(deployment.id, deployment);
}
String s3JarBucket = deployment.r5 ? "r5-builds" : "opentripplanner-builds";
// Construct URL for trip planner jar and check that it exists with a lightweight HEAD request.
String s3JarKey = jarName + ".jar";
// If jar does not exist in bucket, fail job.
String s3JarUrl = String.format("https://%s.s3.amazonaws.com/%s", s3JarBucket, s3JarKey);
String repoUrl = deployment.r5 ? R5_REPO_URL : OTP_REPO_URL;
String s3JarUrl = String.join("/", repoUrl, s3JarKey);
try {
final URL url = new URL(s3JarUrl);
HttpURLConnection huc = (HttpURLConnection) url.openConnection();
huc.setRequestMethod("HEAD");
int responseCode = huc.getResponseCode();
if (responseCode != HttpStatus.OK_200) {
statusMessage = String.format("Requested trip planner jar does not exist at s3://%s/%s", s3JarBucket, s3JarKey);
statusMessage = String.format("Requested trip planner jar does not exist at %s", s3JarUrl);
LOG.error(statusMessage);
status.fail(statusMessage);
return null;
}
} catch (IOException e) {
statusMessage = String.format("Error checking for trip planner jar: s3://%s/%s", s3JarBucket, s3JarKey);
LOG.error(statusMessage);
statusMessage = String.format("Error checking for trip planner jar: %s", s3JarUrl);
LOG.error(statusMessage, e);
status.fail(statusMessage);
return null;
}
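The jar check above joins the repo URL and jar key with `String.join("/", ...)` and then issues a HEAD request. Note that a configured repo URL ending in a slash would yield a double slash with the join as written. The sketch below shows one way to normalize that and to bound the request with timeouts; `JarUrlCheck`, `buildJarUrl`, `exists`, and the five-second timeouts are all assumptions for illustration, not part of the PR.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper for building and probing the trip planner jar URL. */
final class JarUrlCheck {
    /** Joins repo URL and jar name, trimming one trailing slash from the repo URL (assumed normalization). */
    static String buildJarUrl(String repoUrl, String jarName) {
        String base = repoUrl.endsWith("/") ? repoUrl.substring(0, repoUrl.length() - 1) : repoUrl;
        return String.join("/", base, jarName + ".jar");
    }

    /** Returns true if a HEAD request to the URL answers 200 OK. */
    static boolean exists(String jarUrl) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(jarUrl).openConnection();
        try {
            connection.setRequestMethod("HEAD");
            // Timeouts are an added assumption so a stalled host cannot block the deploy job.
            connection.setConnectTimeout(5_000);
            connection.setReadTimeout(5_000);
            return connection.getResponseCode() == HttpURLConnection.HTTP_OK;
        } finally {
            connection.disconnect();
        }
    }
}
```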
@@ -735,6 +754,10 @@ private String constructUserData(boolean graphAlreadyBuilt) {
lines.add(String.format("rm -rf %s/*", routerDir));
// Download trip planner JAR.
lines.add(String.format("mkdir -p %s", jarDir));
// Add client static file directory for uploading deploy stage status files.
// TODO: switch to AMI that uses /usr/share/nginx/html as static file dir so we don't have to create this new dir.
lines.add("WEB_DIR=/usr/share/nginx/client");
lines.add("sudo mkdir $WEB_DIR");
lines.add(String.format("wget %s -O %s/%s.jar", s3JarUrl, jarDir, jarName));
Contributor

This could fail and result in the rest of the items failing.
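One way to address the comment above, sketched under assumptions: have the generated user data stop as soon as the jar download fails and leave a failure marker behind. `GuardedUserData`, `downloadJarLines`, and the status file path passed in are hypothetical; the PR itself does not define a jar download status file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of user data lines that abort when the jar download fails. */
final class GuardedUserData {
    static List<String> downloadJarLines(String jarUrl, String jarDir, String jarName, String statusFile) {
        List<String> lines = new ArrayList<>();
        lines.add(String.format("mkdir -p %s", jarDir));
        // If wget exits non-zero, record the failure and stop the script so later steps do not run.
        lines.add(String.format(
            "wget %s -O %s/%s.jar || { echo 'FAILURE' | sudo tee %s > /dev/null; exit 1; }",
            jarUrl, jarDir, jarName, statusFile));
        return lines;
    }
}
```

The `cmd || { ...; exit 1; }` form keeps the guard on a single generated line, which fits how `constructUserData` builds the script line by line.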

if (graphAlreadyBuilt) {
lines.add("echo 'downloading graph from s3'");
@@ -745,9 +768,8 @@ private String constructUserData(boolean graphAlreadyBuilt) {
lines.add(String.format("aws s3 --region us-east-1 cp %s /tmp/bundle.zip", getS3BundleURI()));
// Determine if bundle download was successful.
lines.add("[ -f /tmp/bundle.zip ] && BUNDLE_STATUS='SUCCESS' || BUNDLE_STATUS='FAILURE'");
// Create and upload file with bundle status to notify Data Tools that download is complete.
lines.add(String.format("echo $BUNDLE_STATUS > /tmp/%s", BUNDLE_DOWNLOAD_COMPLETE_FILE));
lines.add(String.format("aws s3 --region us-east-1 cp /tmp/%s %s", BUNDLE_DOWNLOAD_COMPLETE_FILE, joinToS3FolderURI(BUNDLE_DOWNLOAD_COMPLETE_FILE)));
// Create file with bundle status in web dir to notify Data Tools that download is complete.
lines.add(String.format("sudo echo $BUNDLE_STATUS > $WEB_DIR/%s", BUNDLE_DOWNLOAD_COMPLETE_FILE));
// Put unzipped bundle data into router directory.
lines.add(String.format("unzip /tmp/bundle.zip -d %s", routerDir));
// FIXME: Add ability to fetch custom bikeshare.xml file (CarFreeAtoZ)
@@ -756,7 +778,7 @@ private String constructUserData(boolean graphAlreadyBuilt) {
lines.add(String.format("printf \"{\\n bikeRentalFile: \"bikeshare.xml\"\\n}\" >> %s/build-config.json\"", routerDir));
}
lines.add("echo 'starting graph build'");
// Build the graph if Graph object (presumably this is the first instance to be started up).
// Build the graph.
if (deployment.r5) lines.add(String.format("sudo -H -u ubuntu java -Xmx6G -jar %s/%s.jar point --build %s", jarDir, jarName, routerDir));
Contributor

If we're using this here instead of this method's argument, maybe this method doesn't need an argument.

else lines.add(String.format("sudo -H -u ubuntu java -jar %s/%s.jar --build %s > $BUILDLOGFILE 2>&1", jarDir, jarName, routerDir));
Contributor

Needs fault tolerance in case graph build fails.
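A possible shape for that fault tolerance, as a sketch only: branch on the build command's exit code and write the result to the status file served by nginx. `GraphBuildStatusLines` and `buildLines` are hypothetical names, and the use of `sudo tee` is a suggested variation. In `sudo echo x > file` the redirect is performed by the calling shell rather than under sudo, so piping through `sudo tee` is the usual way to make the write itself privileged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of user data lines that record the graph build outcome. */
final class GraphBuildStatusLines {
    static List<String> buildLines(String buildCommand, String webDir, String statusFile) {
        List<String> lines = new ArrayList<>();
        // Branch on the build's exit code rather than only on whether a graph file exists afterwards.
        lines.add(String.format(
            "if %s; then GRAPH_STATUS='SUCCESS'; else GRAPH_STATUS='FAILURE'; fi", buildCommand));
        // tee runs under sudo, so the write succeeds even when the web dir is root-owned.
        lines.add(String.format("echo $GRAPH_STATUS | sudo tee %s/%s > /dev/null", webDir, statusFile));
        return lines;
    }
}
```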

// Upload the build log file and graph to S3.
@@ -766,6 +788,10 @@ private String constructUserData(boolean graphAlreadyBuilt) {
lines.add(String.format("aws s3 --region us-east-1 cp %s/%s %s ", routerDir, OTP_GRAPH_FILENAME, getS3GraphURI()));
}
}
// Determine if graph build/download was successful.
lines.add(String.format("[ -f %s/%s ] && GRAPH_STATUS='SUCCESS' || GRAPH_STATUS='FAILURE'", routerDir, OTP_GRAPH_FILENAME));
// Create file with bundle status in web dir to notify Data Tools that download is complete.
lines.add(String.format("sudo echo $GRAPH_STATUS > $WEB_DIR/%s", GRAPH_STATUS_FILE));
// Get the instance's instance ID from the AWS metadata endpoint.
lines.add("instance_id=`curl http://169.254.169.254/latest/meta-data/instance-id`");
landonreed marked this conversation as resolved.
// Upload user data log associated with instance to a log file on S3.
@@ -833,6 +859,8 @@ public static class DeployStatus extends Status {
/** To how many servers have we successfully deployed thus far? */
public int numServersCompleted;

public int numServersRemaining;

/** How many servers are we attempting to deploy to? */
public int totalServers;
