Local Alertmanager MVP #3252

Merged: 9 commits into develop from 3242-local-alert-manager-new on Nov 19, 2024

@elipe17 elipe17 commented Oct 30, 2024

Summary of Changes

  • Added MVP implementation of Alertmanager
  • Integrated Alertmanager with SendGrid for email alerts
  • Added initial prometheus alerting rules based on available metrics
  • Added some stub work for when this is deployed
  • Updated the Logs dashboard to also display uptime/availability
  • Updated frontend to allow proxy pass to Alertmanager UI

Pull request closes Local Alertmanager #3242

How to Test

  • Before bringing everything up, add your email (instead of mine) and a SENDGRID_API_KEY to alertmanager.yml:
    • On line 6 in alertmanager.yml, replace {{ sendgrid_api_key }} with a valid API key.
    • On line 104 in alertmanager.yml, replace my emails with your email(s).
  • Run: cd tdrs-backend && docker-compose up --build
  1. Open the alertmanager container logs and verify no new alerts are firing. Initially you may see a message like level=debug component=dispatcher msg="Received alert" alert="Local Backend Down[7649f89][active]" that is immediately followed by level=debug component=dispatcher msg="Received alert" alert="Local Backend Down[7649f89][resolved]". This happens because of some tight timing tolerances. You will not get an email for this, which is expected.
  2. Let everything run for a minute or two and verify alertmanager is NOT firing any new alerts.
  3. Kill the postgres container, the web container, or both, and watch alertmanager start firing alerts. After the alert(s) have fired for at least 1 minute, you will receive emails for the alerts. You will not receive another email for the same alert for another 5 minutes.
  4. Restart the container(s) you killed and verify alertmanager marks the firing alert(s) as resolved.
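
Steps 1, 2, and 4 can also be checked without reading container logs. The following is a minimal Python sketch of one way to do that, not something included in this PR. It assumes Alertmanager is reachable at localhost on its default port 9093 (the compose setup here may expose it elsewhere), and `firing_alert_names` and `fetch_alerts` are hypothetical helper names. It queries Alertmanager's v2 HTTP API endpoint `/api/v2/alerts` and lists the alerts whose state is `active`.

```python
import json
from urllib.request import urlopen

# Assumption: Alertmanager's default port; this PR's compose file may map a different one.
ALERTMANAGER_URL = "http://localhost:9093"


def firing_alert_names(alerts):
    """Return the sorted, de-duplicated alertname labels of alerts in the 'active' state."""
    names = set()
    for alert in alerts:
        state = alert.get("status", {}).get("state")
        if state == "active":
            names.add(alert.get("labels", {}).get("alertname", "<unnamed>"))
    return sorted(names)


def fetch_alerts(base_url=ALERTMANAGER_URL):
    """Fetch the current alerts as parsed JSON from Alertmanager's v2 HTTP API."""
    with urlopen(f"{base_url}/api/v2/alerts", timeout=5) as response:
        return json.load(response)
```

With the stack running, `firing_alert_names(fetch_alerts())` should return an empty list during steps 1 and 2, list the relevant alert(s) after a container is killed in step 3, and go back to empty once the containers are restarted in step 4. Keeping the filtering separate from the HTTP call makes the logic easy to exercise with a hand-written payload.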

Deliverables

More details on how the deliverables herein are assessed are included here.

Deliverable 1: Accepted Features

Checklist of ACs:

  • Prometheus connects to Alertmanager
  • Prometheus sends alerts to Alertmanager
  • Alertmanager integrated with SendGrid
  • Alert emails are received from Alertmanager/SendGrid
  • Documentation updated indicating Alertmanager integration
  • README is updated, if necessary

Deliverable 2: Tested Code

  • Are all areas of code introduced in this PR meaningfully tested?
    • If this PR introduces backend code changes, are they meaningfully tested?
    • If this PR introduces frontend code changes, are they meaningfully tested?
  • Are code coverage minimums met?
    • Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
    • Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

  • Are backend code style checks passing on CircleCI?
  • Are frontend code style checks passing on CircleCI?
  • Are code maintainability principles being followed?

Deliverable 4: Accessible

  • Does this PR complete the epic?
  • Are links included to any other gov-approved PRs associated with epic?
  • Does PR include documentation for Raft's a11y review?
  • Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

  • Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

  • Does this PR provide background for why coding decisions were made?
  • If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces dependencies, are their licenses documented?
  • Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

  • Does the OWASP Scan pass on CircleCI?
  • Do manual code review and manual testing detect any new security issues?
  • If new issues detected, is investigation and/or remediation plan documented?

Deliverable 8: User Research

Research product(s) clearly articulate(s):

  • the purpose of the research
  • methods used to conduct the research
  • who participated in the research
  • what was tested and how
  • impact of research on TDP
  • (if applicable) final design mockups produced for TDP development

@elipe17 elipe17 self-assigned this Oct 30, 2024

codecov bot commented Oct 30, 2024

Codecov Report

Attention: Patch coverage is 72.72727% with 3 lines in your changes missing coverage. Please review.

Project coverage is 91.48%. Comparing base (fa5f15c) to head (75a43cd).
Report is 30 commits behind head on develop.

Files with missing lines                                 Patch %    Lines
...ackend/tdpservice/users/api/authorization_check.py    40.00%     3 Missing ⚠️
Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3252      +/-   ##
===========================================
- Coverage    91.51%   91.48%   -0.04%     
===========================================
  Files          297      297              
  Lines         8416     8433      +17     
  Branches       608      611       +3     
===========================================
+ Hits          7702     7715      +13     
- Misses         604      605       +1     
- Partials       110      113       +3     
Flag            Coverage Δ
dev-backend     91.34% <50.00%> (-0.02%) ⬇️
dev-frontend    92.51% <100.00%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines                                 Coverage Δ
tdrs-backend/tdpservice/urls.py                          92.59% <100.00%> (ø)
tdrs-frontend/src/components/Header/Header.jsx           95.65% <100.00%> (ø)
tdrs-frontend/src/components/SiteMap/SiteMap.jsx         91.66% <100.00%> (ø)
tdrs-frontend/src/selectors/auth.js                      97.36% <100.00%> (ø)
...ackend/tdpservice/users/api/authorization_check.py    74.13% <40.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5500be8...75a43cd.

Base automatically changed from 3046-plg-cloud to develop November 1, 2024 15:18

@raftmsohani raftmsohani left a comment

Tested locally and received email! Exciting 👍

@elipe17 elipe17 added QASP Review and removed raft review This issue is ready for raft review labels Nov 1, 2024
@elipe17 elipe17 requested a review from ADPennington November 1, 2024 17:24
receivers:
  - name: 'admin-team-emails'
    email_configs:
      - to: '[email protected]'
Collaborator

please hide email recipients on lines 67 and 71. additionally, @ttran-hub and I should be recipients when this is ported over to cloud.gov prod env.

Author

I will get this templated out. I will also discuss with the dev team in OH today the most convenient way to access this info so that it can be input upon deployment.
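
As a rough illustration of what templating this out could look like, the Python sketch below fills `{{ name }}` placeholders, such as the `{{ sendgrid_api_key }}` placeholder mentioned in the PR description, from a mapping of values supplied at deploy time. It is only a sketch under stated assumptions: `render_placeholders` and the `admin_team_emails` placeholder are hypothetical names, and the PR's actual templating mechanism is not shown in this conversation.

```python
import re

# Matches simple placeholders such as {{ sendgrid_api_key }} (a bare identifier only).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_placeholders(template_text, values):
    """Replace each {{ name }} whose name is a key of `values`; leave everything else untouched."""
    def substitute(match):
        name = match.group(1)
        # Unknown names are kept verbatim so that Alertmanager's own
        # Go-template actions (e.g. {{ end }}) are not clobbered.
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template_text)


# Hypothetical usage: both placeholder names and values are illustrative only.
example = render_placeholders(
    "password: '{{ sendgrid_api_key }}'\nto: '{{ admin_team_emails }}'",
    {"sendgrid_api_key": "SG.example", "admin_team_emails": "team@example.com"},
)
```

Substituting only known names is a deliberate choice here, because alertmanager.yml can also contain Alertmanager's own `{{ ... }}` notification template expressions. The trade-off is that a misspelled placeholder is silently left in place, so a deploy step may want to check the rendered output for leftover placeholders.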

Collaborator

@ADPennington ADPennington left a comment

email received 🚀 please see note below re: recipients. this can merge once we document how we will handle this

local_alert

Author

elipe17 commented Nov 18, 2024

email received 🚀 please see note below re: recipients. this can merge once we document how we will handle this

local_alert

@ADPennington just wanted to let you know I put the email template resolution in this PR since it deploys alertmanager.

@elipe17 elipe17 merged commit dd5ea65 into develop Nov 19, 2024
21 checks passed
@elipe17 elipe17 deleted the 3242-local-alert-manager-new branch November 19, 2024 13:37
Development

Successfully merging this pull request may close these issues.

Local Alertmanager
4 participants