hive: metrics support with prometheus and grafana
protolambda committed Dec 12, 2022
1 parent f0f6472 commit 7c3acf5
Showing 23 changed files with 4,931 additions and 18 deletions.
4 changes: 4 additions & 0 deletions clients/lighthouse-bn/hive.yaml
@@ -3,3 +3,7 @@ roles:
build_targets:
- mainnet
- minimal
metrics:
  port: 5054
  labels:
    vendor: lighthouse
6 changes: 3 additions & 3 deletions clients/lighthouse-bn/lighthouse_bn.sh
@@ -64,7 +64,6 @@ echo "bootnodes: ${HIVE_ETH2_BOOTNODE_ENRS}"

CONTAINER_IP=`hostname -i | awk '{print $1;}'`
eth1_option=$([[ "$HIVE_ETH2_ETH1_RPC_ADDRS" == "" ]] && echo "--dummy-eth1" || echo "--eth1-endpoints=$HIVE_ETH2_ETH1_RPC_ADDRS")
metrics_option=$([[ "$HIVE_ETH2_METRICS_PORT" == "" ]] && echo "" || echo "--metrics --metrics-address=0.0.0.0 --metrics-port=$HIVE_ETH2_METRICS_PORT --metrics-allow-origin=*")
if [ "$HIVE_ETH2_MERGE_ENABLED" != "" ]; then
echo -n "0x7365637265747365637265747365637265747365637265747365637265747365" > /jwtsecret
merge_option="--execution-endpoints=$HIVE_ETH2_ETH1_ENGINE_RPC_ADDRS --jwt-secrets=/jwtsecret"
@@ -77,7 +76,7 @@ lighthouse \
--testnet-dir=/data/testnet_setup \
bn \
--network-dir=/data/network \
$metrics_option $eth1_option $merge_option $opt_sync_option \
$eth1_option $merge_option $opt_sync_option \
--enr-tcp-port="${HIVE_ETH2_P2P_TCP_PORT:-9000}" \
--enr-udp-port="${HIVE_ETH2_P2P_UDP_PORT:-9000}" \
--enr-address="${CONTAINER_IP}" \
@@ -89,4 +88,5 @@ lighthouse \
--subscribe-all-subnets \
--boot-nodes="${HIVE_ETH2_BOOTNODE_ENRS:-""}" \
--max-skip-slots="${HIVE_ETH2_MAX_SKIP_SLOTS:-1000}" \
--http --http-address=0.0.0.0 --http-port="${HIVE_ETH2_BN_API_PORT:-4000}" --http-allow-origin="*"
--http --http-address=0.0.0.0 --http-port="${HIVE_ETH2_BN_API_PORT:-4000}" --http-allow-origin="*" \
--metrics --metrics-address=0.0.0.0 --metrics-port=5054
42 changes: 37 additions & 5 deletions docs/clients.md
@@ -26,18 +26,50 @@ Dockerfile.
### hive.yaml

Hive reads additional metadata from the `hive.yaml` file in the client directory (next to
the Dockerfile). Currently, the only purpose of this file is specifying the client's role
list:
the Dockerfile).

roles:
- "eth1"
- "eth1_light_client"
#### Role definitions

In this YAML, the client's role list can be defined:

```yaml
roles:
- "eth1"
- "eth1_light_client"
```
The role list is available to simulators and can be used to differentiate between clients
based on features. Declaring a client role also signals that the client supports certain
role-specific environment variables and files. If `hive.yaml` is missing or doesn't declare
roles, the `eth1` role is assumed.

#### Metrics definition

Additionally, a metrics scrape target can be defined:
```yaml
metrics:
  port: 6060 # port serving the /metrics endpoint for prometheus to scrape
  labels:
    my_special_label: "foobar"
```

This is optional; no metrics will be collected if it is not specified.

Hive automatically inserts the following labels by default:
- `suite`: test suite the client was started in.
- `test`: test case the client was started in.
- `client`: name of the client definition.
- `version`: version of the client.
- `roles`: comma-separated list of roles assigned to the client.
- For each of the client's roles: a label named `role_` followed by the role name, set to `true`.

These labels can be used by dashboards to specialize for certain client types,
display client differences, or enable filtering per test-suite and/or test-case.

Client instances started by hive can also be given a `name` label by setting the
`HIVE_METRICS_NAME` environment variable when the simulator starts the client.
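
As an illustration, a simulator could set this variable when starting a client. The
snippet below is a minimal sketch, assuming the `hivesim` Go API; the client type, test
names, and the `node-1` value are placeholders:

```go
package main

import "github.com/ethereum/hive/hivesim"

func main() {
    suite := hivesim.Suite{
        Name:        "metrics-demo",
        Description: "Starts one client with a custom metrics `name` label.",
    }
    suite.Add(hivesim.TestSpec{
        Name: "named-client",
        Run: func(t *hivesim.T) {
            // HIVE_METRICS_NAME becomes the `name` label on this client's
            // scraped metrics, in addition to the labels listed above.
            t.StartClient("go-ethereum", hivesim.Params{"HIVE_METRICS_NAME": "node-1"})
        },
    })
    hivesim.MustRunSuite(hivesim.New(), suite)
}
```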


### /version.txt

Client Dockerfiles are expected to generate a `/version.txt` file during build. Hive reads
18 changes: 18 additions & 0 deletions docs/commandline.md
@@ -86,6 +86,24 @@ directory (note the first `/`, matching any suite name):

./hive --sim ethereum/consensus --sim.limit /stBugs/

## Metrics

Hive integrates Prometheus and Grafana to provide client metrics during testing.
Metrics are disabled by default, but can be enabled with the `--metrics` flag.

When enabled, Hive starts a Prometheus container and automatically scrapes metrics from
the clients that have a configured metrics endpoint.

Clients, test cases and test suites show up in Grafana as annotations, and all client
metrics are labeled with client and test metadata as well.
See the [Clients] docs for more details.

The host port used for the Grafana frontend can be set with an optional flag: `--metrics.grafana=3000`.
Grafana can be disabled by setting the port to 0, in which case only Prometheus runs.

Prometheus is not exposed to the host by default, but a host port can be set with an
optional flag: `--metrics.prometheus=9090`.
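
For example, to run a suite with metrics enabled and both UIs reachable from the host
(the simulator and client choices here are only illustrative):

./hive --sim ethereum/engine --client go-ethereum --metrics --metrics.grafana=3000 --metrics.prometheus=9090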

## Viewing simulation results (hiveview)

The results of hive simulation runs are stored in JSON files containing test results, and
11 changes: 10 additions & 1 deletion hive.go
@@ -11,9 +11,10 @@ import (
"strings"
"time"

"gopkg.in/inconshreveable/log15.v2"

"github.com/ethereum/hive/internal/libdocker"
"github.com/ethereum/hive/internal/libhive"
"gopkg.in/inconshreveable/log15.v2"
)

func main() {
@@ -32,6 +33,9 @@ func main() {
simLogLevel = flag.Int("sim.loglevel", 3, "Selects log `level` of client instances. Supports values 0-5.")
simDevMode = flag.Bool("dev", false, "Only starts the simulator API endpoint (listening at 127.0.0.1:3000 by default) without starting any simulators.")
simDevModeAPIEndpoint = flag.String("dev.addr", "127.0.0.1:3000", "Endpoint that the simulator API listens on")
metrics = flag.Bool("metrics", false, "Flag to enable metrics collection with prometheus")
metricsGrafanaPort = flag.Uint("metrics.grafana", 8080, "Host port to bind grafana frontend to, grafana will not run if this is 0.")
metricsPrometheusPort = flag.Uint("metrics.prometheus", 0, "Host port to bind prometheus to, prometheus will run but not be exposed to the host if this is 0 (host port is not required for plugging into grafana).")

clients = flag.String("client", "go-ethereum", "Comma separated `list` of clients to use. Client names in the list may be given as\n"+
"just the client name, or a client_branch specifier. If a branch name is supplied,\n"+
@@ -107,6 +111,11 @@ func main() {
SimParallelism: *simParallelism,
SimDurationLimit: *simTimeLimit,
ClientStartTimeout: *clientTimeout,
Metrics: libhive.MetricsEnvOptions{
Enabled: *metrics,
GrafanaPort: *metricsGrafanaPort,
PrometheusPort: *metricsPrometheusPort,
},
}
runner := libhive.NewRunner(inv, builder, cb)
clientList := splitAndTrim(*clients, ",")
12 changes: 12 additions & 0 deletions internal/fakes/container.go
@@ -8,6 +8,7 @@ import (
"net/http"
"sync"
"sync/atomic"
"time"

"github.com/ethereum/hive/internal/libhive"
)
@@ -65,6 +66,17 @@ func (b *fakeBackend) Build(context.Context, libhive.Builder) error {
return nil
}

func (b *fakeBackend) InitMetrics(ctx context.Context, grafanaPort uint, prometheusPort uint) error {
return nil
}

func (b *fakeBackend) CloseMetrics() {
}

func (b *fakeBackend) AnnotateMetrics(ctx context.Context, startTime, endTime time.Time, text string) error {
return nil
}

func (b *fakeBackend) ServeAPI(ctx context.Context, h http.Handler) (libhive.APIServer, error) {
l, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
48 changes: 45 additions & 3 deletions internal/libdocker/container.go
@@ -14,10 +14,11 @@ import (
"sync"
"time"

"github.com/ethereum/hive/hiveproxy"
"github.com/ethereum/hive/internal/libhive"
docker "github.com/fsouza/go-dockerclient"
"gopkg.in/inconshreveable/log15.v2"

"github.com/ethereum/hive/hiveproxy"
"github.com/ethereum/hive/internal/libhive"
)

type ContainerBackend struct {
@@ -26,6 +27,8 @@ type ContainerBackend struct {
logger log15.Logger

proxy *hiveproxy.Proxy

metrics *Metrics
}

func NewContainerBackend(c *docker.Client, cfg *Config) *ContainerBackend {
@@ -84,6 +87,19 @@ func (b *ContainerBackend) CreateContainer(ctx context.Context, imageName string
Image: imageName,
Env: vars,
},
HostConfig: &docker.HostConfig{},
}
if len(opt.HostPorts) > 0 {
createOpts.Config.ExposedPorts = make(map[docker.Port]struct{})
createOpts.HostConfig.PortBindings = make(map[docker.Port][]docker.PortBinding)
}
for k, vs := range opt.HostPorts {
createOpts.Config.ExposedPorts[docker.Port(k)] = struct{}{}
var bindings []docker.PortBinding
for _, v := range vs {
bindings = append(bindings, docker.PortBinding{HostIP: "127.0.0.1", HostPort: v})
}
createOpts.HostConfig.PortBindings[docker.Port(k)] = bindings
}

if opt.Input != nil {
@@ -189,17 +205,43 @@ func (b *ContainerBackend) StartContainer(ctx context.Context, containerID strin
info.Wait()
info.Wait = nil
}

if checkErr == nil && opt.Metrics != nil {
if err := b.AnnotateMetrics(ctx, startTime, time.Now(), fmt.Sprintf("creating container %s (IP %s)", containerID, info.IP)); err != nil {
logger.Error("failed to annotate client creation", "err", err)
}
// If a metrics endpoint is specified, register the container to be scraped
if err := b.metrics.ScrapeMetrics(ctx, info, opt.Metrics); err != nil {
b.DeleteContainer(containerID)
info.Wait()
info.Wait = nil
return info, fmt.Errorf("failed to start scraping metrics of container: %v", err)
}
}

return info, checkErr
}

// DeleteContainer removes the given container. If the container is running, it is stopped.
func (b *ContainerBackend) DeleteContainer(containerID string) error {
b.logger.Debug("removing container", "container", containerID[:8])
startTime := time.Now()
err := b.client.RemoveContainer(docker.RemoveContainerOptions{ID: containerID, Force: true})
if err != nil {
b.logger.Error("can't remove container", "container", containerID[:8], "err", err)
return err
}
return err
if b.metrics != nil {
ctx, cancel := context.WithTimeout(context.Background(), time.Second*10)
defer cancel()
if err := b.AnnotateMetrics(ctx, startTime, time.Now(), fmt.Sprintf("closing container %s", containerID)); err != nil {
b.logger.Error("failed to annotate client deletion", "container", containerID, "err", err)
}
if err := b.metrics.StopScrapingMetrics(ctx, containerID); err != nil {
b.logger.Error("failed to remove scrape target", "container", containerID, "err", err)
}
}
return nil
}

// CreateNetwork creates a docker network.
14 changes: 14 additions & 0 deletions internal/libdocker/graf/Dockerfile
@@ -0,0 +1,14 @@
FROM grafana/grafana

ADD annotate_metrics.sh /hive/annotate_metrics.sh

# see https://grafana.com/tutorials/provision-dashboards-and-data-sources/
# for more information about grafana provisioning configuration

ADD grafana.ini /etc/grafana/config.ini

ADD dashboards_provider.yaml /etc/grafana/provisioning/dashboards/dashboards_provider.yaml
ADD dashboards /hive/grafana-dashboards
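
The provider file added here follows Grafana's standard dashboard-provisioning schema.
An illustrative version (not necessarily the exact file in this commit) that loads the
bundled dashboards directory could look like:

```yaml
apiVersion: 1
providers:
  - name: hive
    type: file
    options:
      # dashboards copied into the image by the ADD instruction above
      path: /hive/grafana-dashboards
```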



8 changes: 8 additions & 0 deletions internal/libdocker/graf/annotate_metrics.sh
@@ -0,0 +1,8 @@
#!/bin/bash

set -e

# The annotation is passed as a JSON-encoded payload in the first shell argument,
# and then forwarded as the body of the API POST request.
# See https://grafana.com/docs/grafana/latest/developers/http_api/annotations/
wget --post-data="$1" --header "Content-Type: application/json" "http://127.0.0.1:3000/api/annotations"
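
For reference, an invocation with a minimal, hypothetical annotation payload (the Grafana
annotations API expects epoch-millisecond `time`/`timeEnd` values) would look roughly like:

```sh
/hive/annotate_metrics.sh '{"time": 1670851200000, "timeEnd": 1670851260000, "text": "creating container abc123 (IP 172.17.0.3)"}'
```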