Server metrics and monitoring #67

Open
steinbro opened this issue Nov 3, 2023 · 8 comments
Labels: enhancement (New feature or request), infrastructure (Related to backend services and other remote servers)

Comments

@steinbro
Member

steinbro commented Nov 3, 2023

In addition to serving tiles, the backend also exposes two additional API endpoints:

  • /metrics that reports some performance and usage stats in Prometheus format
  • /probe/alive that can be pinged to monitor uptime

We're not utilizing either of these, but we should be. @RDMurray, do you have any favorite dashboard or monitoring tools? Not talking about full-blown error monitoring like Sentry, just something that can render the metrics histograms and something else that can fire off an email when the heartbeat isn't responding.

Actually, now that I look at the metrics, it does report that the alive endpoint has been pinged a few thousand times since it was spun up a few weeks ago -- is this being polled by something?
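As a rough illustration of the check described above (reading the request count for the alive endpoint out of the /metrics output), here is a minimal sketch that parses the Prometheus text format using only the standard library. The metric name `http_requests_total`, its `path` and `status` labels, and the sample values are hypothetical; the names the backend actually exports may differ. The parser handles only simple sample lines, not every corner of the exposition format.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, then the value.
# An optional trailing timestamp is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse Prometheus text-format output into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total_for(samples: List[Sample], name: str, **labels: str) -> float:
    """Sum every sample with the given name whose labels include `labels`."""
    return sum(
        value
        for sample_name, sample_labels, value in samples
        if sample_name == name
        and all(sample_labels.get(k) == v for k, v in labels.items())
    )


# Hypothetical /metrics output; the real metric names are an assumption.
EXAMPLE = '''\
# HELP http_requests_total Total HTTP requests (hypothetical metric name).
# TYPE http_requests_total counter
http_requests_total{path="/probe/alive",status="200"} 3120
http_requests_total{path="/metrics",status="200"} 14
process_cpu_seconds_total 12.5
'''

samples = parse_metrics(EXAMPLE)
print(total_for(samples, "http_requests_total", path="/probe/alive"))
```

In practice the text would come from an HTTP GET against the server's /metrics URL rather than from a hard-coded string.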

steinbro added the enhancement and infrastructure labels Nov 3, 2023
@RDMurray
Contributor

RDMurray commented Nov 3, 2023

I don't have any favourite monitoring tools, but I'll look into it. Sentry does have a generous free plan for open source organizations, so that might be worth looking into. Having said that, I know very little about Sentry beyond the fact that it is popular. I have no sight at all, so I can't really give an opinion on software that renders histograms or helps to visualise data.

I'm currently monitoring the alive endpoint with Uptime Robot, which sends an email if it is down. It also has a mobile app with push notifications.

@steinbro
Member Author

steinbro commented Nov 4, 2023

Thanks. For Uptime Robot, is there a way to add more users to the account, or is it easiest to just create my own monitor if I want notifications? It also looks like you can create a basic uptime page for free as well; should we do so, at least for our own maintenance purposes?

I did try tinkering with Grafana Cloud, which also provides some customizable alerts, but the documentation states "Grafana Cloud won’t accept a public URL that is not protected by authentication," ostensibly for some security reason. I'd rather keep our metrics page open, though I suppose we could have a second password-protected URL if we really wanted to make Grafana Cloud happy. But I suppose visual dashboards wouldn't be super useful for this crowd, anyway.

@RDMurray
Contributor

RDMurray commented Nov 5, 2023

The free tier of Uptime Robot doesn't allow adding users. Hopefully it will allow you to create a monitor for the same site. Even if they don't allow that, it might work anyway because I am still monitoring newprod0.openscape.io.

There is an open source uptime monitoring service, Upptime, which uses GitHub Actions and Issues. It can poll every 5 minutes. I find the idea of spinning up a VM every 5 minutes just to do an HTTP request kind of horrible, but presumably GitHub is okay with it, so we could possibly use that.
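For reference, the kind of check these services perform (an HTTP request to the alive endpoint at a fixed interval, with a notification when it stops responding) can be sketched in a few lines of Python. This is only an illustration, not how Uptime Robot or Upptime are implemented. The `HeartbeatMonitor` class, the `failures_before_alert` threshold, and the `alert` callback are hypothetical names introduced here; the callback is where an email or chat notification would go.

```python
import time
import urllib.request
from typing import Callable


def is_alive(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


class HeartbeatMonitor:
    """Alert once when a check keeps failing, and once when it recovers."""

    def __init__(
        self,
        check: Callable[[], bool],
        alert: Callable[[str], None],
        failures_before_alert: int = 3,  # hypothetical default
    ) -> None:
        self.check = check
        self.alert = alert
        self.failures_before_alert = failures_before_alert
        self._consecutive_failures = 0
        self._down = False

    def poll_once(self) -> None:
        if self.check():
            if self._down:
                self.alert("recovered")
            self._consecutive_failures = 0
            self._down = False
            return
        self._consecutive_failures += 1
        # Require several failures in a row so one dropped request
        # does not trigger a notification.
        if self._consecutive_failures >= self.failures_before_alert and not self._down:
            self._down = True
            self.alert("down")


def run(monitor: HeartbeatMonitor, interval_seconds: float = 300.0) -> None:
    """Poll forever; 300 seconds matches the 5-minute interval mentioned above."""
    while True:
        monitor.poll_once()
        time.sleep(interval_seconds)
```

A monitor for the tile server could be built as `HeartbeatMonitor(lambda: is_alive("https://tiles.example.org/probe/alive"), print)` and passed to `run`. The host name here is a placeholder, and `print` stands in for a real notification function.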

@steinbro
Member Author

@RDMurray What are your thoughts on Glitchtip? It uses the Sentry API but has a much simpler UI. It also has a hosted free tier.

@RDMurray
Contributor

I have played with Glitchtip a bit, making a test project and sending some events and metrics. It certainly is a much simpler UI. Issues are logged in detail.

The performance monitoring seems to be very simplistic, though. With a screen reader, I can only see the number of events and the average duration; I don't know whether there is a graph.

The free tier is only 1000 events per month and I can't see any additional offers for open source projects.

I think the Sentry open source free tier is too good to pass up, provided there are no showstoppers with the UI.

@RDMurray
Contributor

I also created a test project on sentry.io. It is much more comprehensive, and looks very accessible so far. Once we have a dashboard or two and some alerts set up, it should be quite usable.

@RDMurray
Contributor

Related to this issue, I set up an uptime monitoring service, Uptime Kuma, at uptime.mur.org.uk, which can currently be accessed by the team @soundscape-community/backend. There is a public status page at soundscape-status.mur.org.uk.

It is self-hosted, but simple enough to manage and not a critical service.

Let me know what you think.

@steinbro
Member Author

steinbro commented Dec 1, 2023

Nice! Together with the Slack integration, I'm satisfied with this level of monitoring for the tile service. I'd say the next priority is monitoring the ingest service (#71), and since you suggested using Sentry for that, as mentioned here, I'll leave this issue open.
