Add Monitoring to test local performance improvements #211

Merged
merged 19 commits into from
Nov 6, 2024
9425ef1
WIP: GO gRPC
Jeffrey-Vervoort-KNMI Aug 9, 2024
6d2f5d3
Setup database logging.
Jeffrey-Vervoort-KNMI Sep 10, 2024
d2b65d0
Enable collector on postgres exporter to gather pg_stat_statements fr…
Jeffrey-Vervoort-KNMI Sep 10, 2024
654f748
Add migration for to setup the pg_stat_statements extension.
Jeffrey-Vervoort-KNMI Sep 11, 2024
ab1d0b3
Setup Grafana dashboard for the database performance based on Prometh…
Jeffrey-Vervoort-KNMI Sep 16, 2024
0903bd3
Expose logging via local volume, document watch logs command and set …
Jeffrey-Vervoort-KNMI Sep 18, 2024
b2c462e
Add gRPC Go metrics.
Jeffrey-Vervoort-KNMI Sep 19, 2024
3eeb83b
Add inspect to query top 10.
Jeffrey-Vervoort-KNMI Sep 20, 2024
d1b50a5
Add gRPC store dashboard.
Jeffrey-Vervoort-KNMI Sep 20, 2024
a6fae50
Improve colours for grpc server dashboard.
Jeffrey-Vervoort-KNMI Oct 18, 2024
81333f7
Add GoLang metrics for the Store.
Jeffrey-Vervoort-KNMI Oct 18, 2024
a19f5ae
Grafana and Prometheus do not need to run in CI CD. Created a seperat…
Jeffrey-Vervoort-KNMI Oct 18, 2024
6dec428
Remove logging as it is not necessary and makes the compose more comp…
Jeffrey-Vervoort-KNMI Oct 23, 2024
f5d8ff6
Update readme with just local command.
Jeffrey-Vervoort-KNMI Oct 23, 2024
5b2af46
Removed units from all titles. Made every rate based on interval drop…
Jeffrey-Vervoort-KNMI Oct 25, 2024
cfca146
Delete old readme.
Jeffrey-Vervoort-KNMI Oct 28, 2024
b33c150
Change variables which are list of values to a single value. As in 99…
Jeffrey-Vervoort-KNMI Oct 29, 2024
e989c33
Rename to in flight grpc requests.
Jeffrey-Vervoort-KNMI Nov 5, 2024
6d45f48
When adding pg_stat_statements to the settings of the database with t…
Jeffrey-Vervoort-KNMI Nov 6, 2024
8 changes: 7 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -44,12 +44,18 @@ ci/scripts/install-just.sh

The Justfile is a simple text file that contains a list of tasks. Each task is a shell command. For example:

To run build and run the services locally:
To build and run the services locally without monitoring:

```bash
just up test
```

To build and run the services locally with monitoring:

```bash
just local test
```

To run everything including client and do a cleanup of the database afterward:

```bash
6 changes: 6 additions & 0 deletions datastore/database/extra.conf
@@ -0,0 +1,6 @@
# postgres-exporter
shared_preload_libraries = 'pg_cron,pg_stat_statements'
track_activity_query_size = 2048
pg_stat_statements.track = 'all'
pg_stat_statements.max = 10000
pg_stat_statements.save = 'on'
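With `pg_stat_statements` preloaded as configured above, the most expensive statements can be read back from the `pg_stat_statements` view. The following is only a minimal sketch of building such a top-N query in Go. The helper name `topStatementsQuery` is hypothetical and not part of this PR, and the column names `total_exec_time` and `mean_exec_time` assume PostgreSQL 13 or newer (older versions name them `total_time` and `mean_time`).

```go
package main

import "fmt"

// topStatementsQuery is a hypothetical helper (not part of this PR).
// It returns SQL that lists the n statements with the highest total
// execution time according to pg_stat_statements.
// Assumption: PostgreSQL 13+ column names (total_exec_time, mean_exec_time).
func topStatementsQuery(n int) string {
	if n < 1 {
		n = 1 // always ask for at least one row
	}
	const q = "SELECT query, calls, total_exec_time, mean_exec_time\n" +
		"FROM pg_stat_statements\n" +
		"ORDER BY total_exec_time DESC\n" +
		"LIMIT %d"
	return fmt.Sprintf(q, n)
}
```

The resulting string could be passed to any `database/sql` connection; calling `topStatementsQuery(10)` would correspond to a "top 10" inspection.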
1 change: 1 addition & 0 deletions datastore/datastore/.dockerignore
@@ -5,3 +5,4 @@
!common
!dsimpl
!storagebackend
!metrics
27 changes: 20 additions & 7 deletions datastore/datastore/go.mod
@@ -5,14 +5,27 @@ go 1.21
require google.golang.org/grpc v1.64.0

require (
github.com/cridenour/go-postgis v1.0.0
google.golang.org/protobuf v1.34.1
github.com/cridenour/go-postgis v1.0.0
github.com/grpc-ecosystem/go-grpc-middleware/providers/prometheus v1.0.1
github.com/prometheus/client_golang v1.20.4
google.golang.org/protobuf v1.34.2
)

require (
github.com/lib/pq v1.10.9
golang.org/x/net v0.26.0 // indirect
golang.org/x/sys v0.21.0 // indirect
golang.org/x/text v0.16.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240604185151-ef581f913117 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/grpc-ecosystem/go-grpc-middleware/v2 v2.1.0 // indirect
github.com/klauspost/compress v1.17.9 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/prometheus/client_model v0.6.1 // indirect
github.com/prometheus/common v0.55.0 // indirect
github.com/prometheus/procfs v0.15.1 // indirect
)

require (
github.com/lib/pq v1.10.9
golang.org/x/net v0.26.0 // indirect
golang.org/x/sys v0.22.0 // indirect
golang.org/x/text v0.16.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240604185151-ef581f913117 // indirect
)
54 changes: 52 additions & 2 deletions datastore/datastore/main/main.go
@@ -8,15 +8,23 @@ import (
"datastore/storagebackend"
"datastore/storagebackend/postgresql"
"fmt"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/collectors"
"github.com/prometheus/client_golang/prometheus/promhttp"
"log"
"net"
"time"

// gRPC
"google.golang.org/grpc"
"google.golang.org/grpc/health"
"google.golang.org/grpc/health/grpc_health_v1"
"google.golang.org/grpc/peer"

// Monitoring
promservermetrics "datastore/metrics" // explicit alias: the package name differs from the directory name
grpcprometheus "github.com/grpc-ecosystem/go-grpc-middleware/providers/prometheus"

_ "expvar"
"net/http"
_ "net/http/pprof"
@@ -52,8 +60,34 @@ func main() {
return resp, err
}

// create gRPC server
server := grpc.NewServer(grpc.UnaryInterceptor(reqTimeLogger))
grpcMetrics := grpcprometheus.NewServerMetrics(
grpcprometheus.WithServerHandlingTimeHistogram(
grpcprometheus.WithHistogramBuckets([]float64{0.001, 0.01, 0.1, 0.3, 0.6, 1, 3, 6, 9, 20, 30, 60, 90, 120}),
),
)
reg := prometheus.NewRegistry()
reg.MustRegister(
grpcMetrics,
promservermetrics.InFlightRequests,
promservermetrics.UptimeCounter,
promservermetrics.ResponseSizeSummary,
collectors.NewGoCollector(),
collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}),
)

go promservermetrics.TrackUptime()

// create gRPC server with middleware
server := grpc.NewServer(
grpc.ChainUnaryInterceptor(
reqTimeLogger,
promservermetrics.InFlightRequestInterceptor,
promservermetrics.ResponseSizeUnaryInterceptor,
grpcMetrics.UnaryServerInterceptor(),
),
)

grpcMetrics.InitializeMetrics(server)
grpc_health_v1.RegisterHealthServer(server, health.NewServer())

// create storage backend
@@ -81,6 +115,22 @@ func main() {
http.ListenAndServe("0.0.0.0:6060", nil)
}()

// serve go metrics for monitoring
go func() {
httpSrv := &http.Server{Addr: "0.0.0.0:8081"}
m := http.NewServeMux()
// Create HTTP handler for Prometheus metrics.
m.Handle("/metrics", promhttp.HandlerFor(
reg,
promhttp.HandlerOpts{
EnableOpenMetrics: true,
},
))
httpSrv.Handler = m
log.Println("Starting HTTP server for Prometheus metrics on :8081")
log.Fatal(httpSrv.ListenAndServe())
}()

// serve incoming requests
log.Printf("starting server\n")
if err := server.Serve(listener); err != nil {
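The metrics endpoint above is served from its own `http.ServeMux` rather than from `http.DefaultServeMux`. That matters here because the blank imports of `expvar` and `net/http/pprof` register their handlers on the default mux, which is the one served on port 6060. A minimal, stdlib-only sketch of the same routing idea follows. The names `newMetricsMux`, `probe` and `stubMetrics` are hypothetical, and `stubMetrics` merely stands in for the real `promhttp.HandlerFor(reg, ...)` handler.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
)

// newMetricsMux returns a dedicated mux that routes only /metrics,
// so nothing registered on http.DefaultServeMux is exposed through it.
func newMetricsMux(metrics http.Handler) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/metrics", metrics)
	return mux
}

// stubMetrics is a hypothetical stand-in for the Prometheus handler.
var stubMetrics = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, "grpc_in_flight_requests 0\n")
})

// probe sends an in-memory GET request to h and returns the status
// code and response body, without opening a network port.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}
```

Probing `/metrics` on this mux returns the stub body, while a path such as `/debug/pprof/` is not routed and yields a 404, which is the separation between the profiling port and the metrics port that the diff appears to aim for.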
67 changes: 67 additions & 0 deletions datastore/datastore/metrics/promservermetrics.go
@@ -0,0 +1,67 @@
package promservermetrics

import (
	"context"
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"google.golang.org/grpc"
)

var (
	// UptimeCounter counts the seconds the gRPC server has been running.
	UptimeCounter = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "grpc_server_uptime_seconds",
		Help: "Total uptime of the gRPC server in seconds",
	})

	// InFlightRequests tracks the number of unary requests currently being handled.
	InFlightRequests = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "grpc_in_flight_requests",
		Help: "Current number of in-flight gRPC requests",
	})

	// ResponseSizeSummary records response sizes per gRPC method.
	ResponseSizeSummary = prometheus.NewSummaryVec(
		prometheus.SummaryOpts{
			Name:       "grpc_response_size_summary_bytes",
			Help:       "Summary of response sizes in bytes for each gRPC method, with mean, min, and max",
			Objectives: map[float64]float64{0.0: 0.001, 1.0: 0.001}, // Track min (0.0 quantile) and max (1.0 quantile)
		},
		[]string{"method"},
	)

	// responseSizeMu guards the two maps below.
	responseSizeMu    sync.Mutex
	responseSizeSum   = make(map[string]float64)
	responseSizeCount = make(map[string]float64)
)

// TrackUptime increments the uptime counter once per second. It never returns,
// so it is meant to be started in its own goroutine.
func TrackUptime() {
	for {
		UptimeCounter.Inc()
		time.Sleep(1 * time.Second)
	}
}

// InFlightRequestInterceptor counts a request as in flight for the duration of its handler.
func InFlightRequestInterceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	InFlightRequests.Inc()       // Increment at the start of the request
	defer InFlightRequests.Dec() // Decrement at the end of the request
	return handler(ctx, req)
}

// ResponseSizeUnaryInterceptor records the size of each response per gRPC method.
func ResponseSizeUnaryInterceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (resp interface{}, err error) {
	resp, err = handler(ctx, req)

	// Use a checked type assertion so that a nil response, or one without a
	// String method, is skipped instead of causing a panic. Note that this
	// measures the length of the text representation, not the encoded size.
	if s, ok := resp.(interface{ String() string }); ok {
		responseSize := float64(len(s.String()))

		ResponseSizeSummary.WithLabelValues(info.FullMethod).Observe(responseSize)

		// The mutex synchronises access to responseSizeSum and responseSizeCount,
		// which prevents data races when several goroutines handle requests at once.
		responseSizeMu.Lock()
		responseSizeSum[info.FullMethod] += responseSize
		responseSizeCount[info.FullMethod]++
		responseSizeMu.Unlock()
	}

	return resp, err
}
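In the lines shown here, `responseSizeSum` and `responseSizeCount` are written under the mutex but never read back. The following stdlib-only sketch illustrates one way such per-method bookkeeping could be wrapped so that mean, minimum and maximum can be read under the same lock. The type `sizeStats` and its methods are hypothetical names and are not part of this PR.

```go
package main

import "sync"

// methodStats holds the running aggregates for one gRPC method.
type methodStats struct {
	count, sum, lo, hi float64
}

// sizeStats is a hypothetical, mutex-guarded collection of per-method
// response size aggregates.
type sizeStats struct {
	mu       sync.Mutex
	byMethod map[string]*methodStats
}

func newSizeStats() *sizeStats {
	return &sizeStats{byMethod: make(map[string]*methodStats)}
}

// Observe records one response size for the given method.
func (s *sizeStats) Observe(method string, size float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	m, ok := s.byMethod[method]
	if !ok {
		// The first observation initialises min and max to the same value.
		s.byMethod[method] = &methodStats{count: 1, sum: size, lo: size, hi: size}
		return
	}
	m.count++
	m.sum += size
	if size < m.lo {
		m.lo = size
	}
	if size > m.hi {
		m.hi = size
	}
}

// Snapshot returns mean, min and max for a method. The final result is
// false when nothing has been observed for that method yet.
func (s *sizeStats) Snapshot(method string) (mean, lo, hi float64, ok bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	m, found := s.byMethod[method]
	if !found {
		return 0, 0, 0, false
	}
	return m.sum / m.count, m.lo, m.hi, true
}
```

Keeping the reads behind the same mutex as the writes avoids a data race between interceptors and whatever consumes the aggregates. A Prometheus summary already exports a `_sum` and `_count` series per label set, so a separate structure like this is only needed if the values are used inside the process.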
@@ -0,0 +1 @@
DROP EXTENSION pg_stat_statements;
@@ -0,0 +1 @@
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
@@ -0,0 +1 @@
DROP EXTENSION IF EXISTS pg_stat_statements;
@@ -0,0 +1 @@
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
52 changes: 48 additions & 4 deletions docker-compose.yml
@@ -1,4 +1,3 @@
version: "3.8"
name: datastore

services:
@@ -10,8 +9,10 @@ services:
volumes:
# - ts-data:/home/postgres/pgdata/data # for timescale image
- ts-data:/var/lib/postgresql # for postgres image
- ./datastore/database/healthcheck_postgis_uptime.sh:/healthcheck_postgis_uptime.sh # for the healthcheck
- ./datastore/database/extra.conf:/etc/conf_settings/extra.conf:ro # Extra Postgres configuration
- ./datastore/database/healthcheck_postgis_uptime.sh:/healthcheck_postgis_uptime.sh:ro # for the healthcheck
environment:
- EXTRA_CONF_DIR=/etc/conf_settings
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=mysecretpassword
- POSTGRES_DB=data
@@ -26,7 +27,7 @@ services:
]
interval: 5s
timeout: 1s
retries: 3
retries: 30
start_period: 30s # Failures in 30 seconds do not mark container as unhealthy

migrate:
@@ -46,6 +47,7 @@ services:
ports:
- "50050:50050"
- "6060:6060" # for flame graphing
- "8081:8081"
environment:
- PGHOST=db
- PGPORT=5432
@@ -156,10 +158,52 @@ services:
- DSHOST=store
- DSPORT=50050
volumes:
- ./datastore/load-test/output:/load-test/output
- ./datastore/load-test/output:/load-test/output:rw
depends_on:
store:
condition: service_healthy

  prometheus:
    profiles: ["monitoring"]
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus

  prometheus-postgres-exporter:
    profiles: ["monitoring"]
    image: quay.io/prometheuscommunity/postgres-exporter
    environment:
      - DATA_SOURCE_URI=db:5432/data
      - DATA_SOURCE_USER=postgres
      - DATA_SOURCE_PASS=mysecretpassword
    ports:
      - "9187:9187"
    volumes:
      - ./prometheus/postgres_exporter.yml:/postgres_exporter.yml:ro
    depends_on:
      db:
        condition: service_healthy
    command: ["--collector.stat_statements", "--collector.stat_user_tables", "--collector.stat_activity_autovacuum"]

  grafana:
    profiles: ["monitoring"]
    image: grafana/grafana-oss:11.2.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=mysecretpassword
    volumes:
      - grafana-storage:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:rw
      - ./grafana/dashboards:/var/lib/grafana/dashboards:rw
    depends_on:
      - prometheus

volumes:
  ts-data:
  prometheus-data:
  grafana-storage: