Detect user_data failure #44

Open
2 of 3 tasks
lawliet89 opened this issue Apr 25, 2018 · 0 comments
Labels
D-Medium Difficulty Medium enhancement New feature or request P-High Priority High

Comments

@lawliet89
Collaborator

lawliet89 commented Apr 25, 2018

We run pretty elaborate scripts in the user_data portions of the EC2 instances.

We need some way to detect if these scripts have failed.

Probabilities:

Idea:

  • Define the existence of a user_data completion marker file as a service health check in Consul
  • Forward the user_data logs via td-agent
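
A rough sketch of how the marker-file idea could look, not something that exists in this repo. The marker paths and the function names `run_and_mark` and `check_status` are made up for illustration. The only outside fact relied on is that Consul script checks treat exit code 0 as passing, 1 as warning, and anything else as critical.

```python
import subprocess
from pathlib import Path

# Hypothetical marker locations; the issue does not specify any paths.
SUCCESS_MARKER = Path("/var/run/user_data/completed")
FAILURE_MARKER = Path("/var/run/user_data/failed")

# Exit codes understood by Consul script checks.
PASSING, WARNING, CRITICAL = 0, 1, 2


def run_and_mark(command, success=SUCCESS_MARKER, failure=FAILURE_MARKER):
    """Run a bootstrap command and record its outcome as a marker file."""
    result = subprocess.run(command)
    marker = success if result.returncode == 0 else failure
    marker.parent.mkdir(parents=True, exist_ok=True)
    # Store the exit code so the marker is useful when debugging.
    marker.write_text(f"{result.returncode}\n")
    return result.returncode


def check_status(success=SUCCESS_MARKER, failure=FAILURE_MARKER):
    """Map the marker files to a Consul check status."""
    if failure.exists():
        return CRITICAL
    if success.exists():
        return PASSING
    # Neither marker yet: user_data is still running or died before marking.
    return WARNING
```

The user_data script would wrap its work in `run_and_mark(...)`, and a Consul script check could run a small entry point that calls `sys.exit(check_status())`. A node that never writes either marker stays at warning, so a timeout on top of this would probably still be needed.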

TODOs:

@lawliet89 lawliet89 added the enhancement New feature or request label Apr 25, 2018
lawliet89 added a commit that referenced this issue Aug 2, 2018
Currently, Fluentd and Prometheus/whatever time series DB can
potentially be run on Nomad. If the Nomad cluster is unhealthy
or the jobs are not running properly, new nodes will never be
able to bootstrap properly. We rearrange the order of launching
so that these nodes can still bootstrap their respective services.

On the downside, this means that we might not even know that the nodes
are not sending their logs.

Perhaps #44 will help?
@lawliet89 lawliet89 added P-High Priority High D-Medium Difficulty Medium labels Aug 2, 2018
qbiqing pushed a commit that referenced this issue Mar 10, 2020
* Basic Policies

* Add some missing policies