update README.md
Richard Patel committed Apr 30, 2022
1 parent 6f0c94c commit 747acfd
Showing 4 changed files with 55 additions and 3 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
.idea/
.DS_Store
51 changes: 51 additions & 0 deletions README.md
@@ -0,0 +1,51 @@
# Solana Cluster Manager

Tooling to manage private clusters of Solana nodes.

## Architecture

### Snapshot management

**[Twitter 🧵](https://twitter.com/terorie_dev)**

Snapshot management tooling enables efficient peer-to-peer transfers of accounts database archives.

![Snapshot Fetch](./docs/snapshots.png)

**Scraping** (Flow A)

Snapshot metadata collection runs periodically, similar to Prometheus scraping.

Each cluster-aware node runs a lightweight `solana-cluster sidecar` agent providing telemetry about its snapshots.

The `solana-cluster tracker` then connects to all sidecars to assemble a complete list of snapshot metadata.
The tracker is stateless, so it can be replicated.
Service discovery is available through HTTP and JSON files. Consul SD support is planned.
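
The scraping flow above can be sketched roughly as follows. This is an illustration only, not the tracker's actual implementation: the function `scrape_all`, the injected `fetch` callable, and the record fields `slot`, `file`, and `source` are all hypothetical names chosen for this example.

```python
from typing import Callable, Iterable


def scrape_all(
    targets: Iterable[str],
    fetch: Callable[[str], list],
) -> list:
    """Collect snapshot metadata from every sidecar target.

    `fetch` is a hypothetical callable that returns a list of dicts
    (each with at least a "slot" key) for one sidecar address.
    """
    collected = []
    for target in targets:
        try:
            snapshots = fetch(target)
        except OSError:
            # One unreachable sidecar should not fail the whole scrape round.
            continue
        for snap in snapshots:
            # Remember which sidecar advertised this snapshot.
            collected.append({**snap, "source": target})
    # Newest snapshots (highest slot) first.
    collected.sort(key=lambda s: s["slot"], reverse=True)
    return collected


if __name__ == "__main__":
    # Stand-in for real HTTP requests to sidecars (hypothetical data).
    inventory = {
        "node-a:8080": [{"slot": 100, "file": "snapshot-100.tar.zst"}],
        "node-b:8080": [{"slot": 120, "file": "snapshot-120.tar.zst"}],
    }
    for entry in scrape_all(inventory, lambda target: inventory[target]):
        print(entry["source"], entry["slot"], entry["file"])
```

Because the function keeps no state between rounds, any number of tracker replicas could run it independently against the same target list.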

Side note: Snapshot sources are configurable in stock Solana software but only via static lists.
This does not scale well with large fleets because each cluster change requires updating the lists of all nodes.

**Downloading** (Flow B)

When a Solana node needs to fetch a snapshot remotely, the tracker helps it find the best snapshot source.
Nodes will download snapshots directly from the sidecars of other nodes.
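
The README does not say how the "best" source is chosen. As a minimal sketch, assume it means the snapshot with the highest slot that is at least as recent as some minimum. The function `pick_source` and the fields `slot` and `source` are hypothetical and only illustrate the idea.

```python
from typing import Optional


def pick_source(snapshots: list, min_slot: int = 0) -> Optional[dict]:
    """Return the snapshot entry with the highest slot, or None.

    Assumption: "best" means "most recent". Entries older than
    `min_slot` are ignored.
    """
    candidates = [s for s in snapshots if s["slot"] >= min_slot]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s["slot"])


if __name__ == "__main__":
    # Hypothetical tracker response.
    listing = [
        {"slot": 100, "source": "node-a:8080"},
        {"slot": 120, "source": "node-b:8080"},
    ]
    best = pick_source(listing, min_slot=110)
    if best is not None:
        # The node would then download directly from this sidecar.
        print("download from", best["source"])
```

A real selection policy would likely weigh other factors as well, such as network locality or the load on each sidecar.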

### TPU & TVU

Not yet ready for release. 🚜

## Motivation

When Solana validators first start, they have to retrieve and validate hundreds of gigabytes of state data from a remote node.
During normal operation, validators stream at least 500 Mbps of traffic in either direction.

For Solana infra operators that manage more than one node (not to mention hundreds), this cost currently scales linearly with the number of nodes.
It shouldn't have to, though.

Co-located Solana validators that are controlled by the same entity should also behave as one.
Bandwidth cost is especially asymmetric:
10 Gbps connectivity is cheap and abundant locally within data centers,
but persistent 1 Gbps streams between globally dispersed validators get expensive quickly.

Blockdaemon manages one of the largest validator and RPC infrastructure deployments to date, backed by a custom peer-to-peer backbone.
This repository shares our tools for performance and sustainability (network & SSD wear) improvements.
3 changes: 0 additions & 3 deletions docs/cluster.png

This file was deleted.

3 changes: 3 additions & 0 deletions docs/snapshots.png