
feat: opentelemetry metrics #1966

Merged
merged 9 commits into from
Dec 9, 2023

Conversation

luan (Contributor)

@luan luan commented Dec 3, 2023

Canary Metrics (OpenTelemetry)

By default, no metrics are collected or exported. To enable metrics, you must set up a metrics exporter. The following example shows how to set up a Prometheus exporter.

config.lua

metricsEnablePrometheus = true
metricsPrometheusAddress = "0.0.0.0:9464"

This, in and of itself, will expose a Prometheus endpoint at http://localhost:9464/metrics. However, you will still need to configure Prometheus to scrape this endpoint.
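A scrape job for this endpoint might look like the sketch below. The job name, scrape interval, and target are illustrative assumptions, not part of this PR; point the target at whatever address you set in metricsPrometheusAddress.

prometheus.yml

scrape_configs:
  - job_name: "canary"        # hypothetical job name
    scrape_interval: 15s      # adjust to taste
    static_configs:
      - targets: ["localhost:9464"]   # must match metricsPrometheusAddress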

The easiest, batteries-included way to do this is to use the docker-compose.yml file provided in the metrics directory. Simply run docker-compose up and you will have a Prometheus instance running and scraping the Canary metrics endpoint.

The docker-compose.yml file also includes a Grafana instance that is preconfigured to use the Prometheus instance as a data source. The Grafana instance is exposed at http://localhost:3000 and the default username and password are admin and admin respectively (you will be prompted to change the password on first login).
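For orientation, a minimal compose file for this kind of Prometheus + Grafana setup typically looks like the sketch below. This is an illustrative assumption, not the contents of the file shipped in the metrics directory; refer to that file for the actual configuration.

docker-compose.yml (illustrative sketch)

services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml   # scrape config mounted in
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"   # default login admin/admin, changed on first login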

Usage

This is an advanced feature. While you can simply enable OStream and get metrics in your logs, doing so is not recommended in production. Prometheus, by contrast, can be run efficiently in production with minimal impact on server performance.

Enabling OStream:

metricsEnableOstream = true
metricsOstreamInterval = 1000

If you don't know what Prometheus and Grafana are, you need to learn that first: https://prometheus.io/ is your starting point. Come back to this feature once you understand how to install and run that software.

Metrics

We export all kinds of metrics, but the most important ones are:

  • Latency metrics for C++ methods
  • Latency metrics for Lua functions
  • Latency metrics for SQL queries
  • Latency metrics for Dispatcher tasks
  • Latency metrics for DB lock contention

Here's an interactive demo of a dashboard from a real production server: https://snapshots.raintank.io/dashboard/snapshot/bpiq45inK3I2Xixa2d7oNHWekdiDE6zr

Screenshot
grafana

Analytics

We also export analytics events, counters, and other useful data. This is useful for debugging and for understanding the behavior of the server. Some interesting ones are:

  • Stats around monsters killed (per monster type, player, etc)
  • Stats around raw exp and total exp gained
  • Stats around wealth gained (based on gold and item drops, with their NPC value)

Examples:

Note: you would normally see player names here; I've hidden them for privacy.

Raw exp/h
exp-per-hour

Raw gold/h
gold-per-hour

Monsters killed/h
monsters-per-hour

@luan luan changed the title feat: comprehensive opentelemtry metrics feat: comprehensive opentelemetry metrics Dec 3, 2023
@luan luan changed the title feat: comprehensive opentelemetry metrics feat: opentelemetry metrics Dec 3, 2023
@luan luan force-pushed the luan/metrics branch 2 times, most recently from a50de87 to 0a33920 Compare December 3, 2023 04:22
@luan luan marked this pull request as ready for review December 3, 2023 04:32
@luan luan force-pushed the luan/metrics branch 2 times, most recently from f13bfdb to 48c26b9 Compare December 5, 2023 01:47

sonarqubecloud bot commented Dec 9, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: A (0 Bugs)
Vulnerabilities: A (0 Vulnerabilities)
Security Hotspots: A (0 Security Hotspots)
Code Smells: A (48 Code Smells)

Coverage: 0.0%
Duplication: 0.5%

@luan luan merged commit 9d60993 into main Dec 9, 2023
37 checks passed
@luan luan deleted the luan/metrics branch December 9, 2023 19:06

Zapotoczny commented Jun 3, 2024

crash_civetweb-worker.log
It looks like a crash caused by these changes
@luan


5 participants