forked from LD4P/qa_server_aws_deploy
Monitoring Connections to Authorities
E. Lynette Rayle edited this page May 13, 2022 · 2 revisions
The monitor status page is supposed to update the up/down history once a day around 3am. The process works as follows:
- There is a cache that expires once a day around 3am. The caching code is in LD4P/qa_server.
- Pingdom calls the monitor status page around 10 minutes after each hour.
- When Pingdom calls at 3:10am, the cache is expired and this causes the set of connection tests to run.
- A timeout occurs because the tests take too long, which Pingdom registers as the system being down.
- When Pingdom hits the system again at 4:10am, the tests are done and the monitor status page loads. Pingdom registers this as the system being back up, assuming none of the authorities failed to connect.
- Pingdom reports this by sending me (and probably @gdelisle) an email and posting to the qa_server_status Slack channel.
- You will see 2 reports: one for the system going down at 3:10am and a second for it coming back up at 4:10am, if no errors occurred in the connection tests.
NOTE: The monitor status page will return an error if any of the authorities fails its connection tests. In that case, Pingdom will continue to see the page as down even after the cache is updated and the page loads fine. This is intentional, so that the page can report when there is an outage.
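
The sequence above can be illustrated with a small simulation. This is only a sketch in Python, not the actual caching code in LD4P/qa_server. The class `MonitorStatusPage`, the helpers `next_expiry` and `simulate`, the 20-minute test duration, and the 30-second request timeout are all hypothetical values chosen for illustration. It also assumes that any request arriving while the tests are still running would time out.

```python
from datetime import datetime, timedelta


def next_expiry(now):
    """Return the first 3:00am strictly after `now`."""
    expiry = now.replace(hour=3, minute=0, second=0, microsecond=0)
    if expiry <= now:
        expiry += timedelta(days=1)
    return expiry


class MonitorStatusPage:
    """Hypothetical model of the page as an external checker sees it."""

    def __init__(self, start, test_duration, timeout, failures_found=()):
        self.cache_expires_at = next_expiry(start)
        self.tests_done_at = start           # no tests running at start
        self.cached_failures = ()            # assume the last run was clean
        self.test_duration = test_duration   # assumed length of a test run
        self.timeout = timeout               # assumed checker timeout
        self.failures_found = tuple(failures_found)

    def check(self, now):
        """Return 'up' or 'down' for a request made at `now`."""
        if now >= self.cache_expires_at:
            # Cache expired: this request triggers the connection tests.
            self.tests_done_at = now + self.test_duration
            self.cached_failures = self.failures_found
            self.cache_expires_at = next_expiry(now)
        if now < self.tests_done_at and self.test_duration > self.timeout:
            # The tests outlast the timeout, so the checker sees "down".
            return "down"
        # The page loads, but reports an error if any authority failed.
        return "down" if self.cached_failures else "up"


def simulate(failures_found=()):
    """Run hourly checks at ten past the hour, from 1:10am to 5:10am."""
    start = datetime(2022, 5, 13, 1, 10)
    page = MonitorStatusPage(
        start,
        test_duration=timedelta(minutes=20),
        timeout=timedelta(seconds=30),
        failures_found=failures_found,
    )
    times = [start + timedelta(hours=h) for h in range(5)]
    return [(t.strftime("%H:%M"), page.check(t)) for t in times]


if __name__ == "__main__":
    for label, failures in [("all pass", ()), ("one fails", ("authority_x",))]:
        print(label, simulate(failures))
```

With no failing authorities, the only "down" entry is the 03:10 check, which matches the pair of reports described above. When the test run finds a failure (the placeholder `authority_x`), the page stays "down" at 04:10 and 05:10 as well, which matches the behavior described in the NOTE.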