diff --git a/cloud-architecture.md b/cloud-architecture.md
index 64b6524..c2bdf9b 100644
--- a/cloud-architecture.md
+++ b/cloud-architecture.md
@@ -1,6 +1,6 @@
 # Cloud Architecture
 
-The best known baking tool, Kiln, is a complete solution for home baking, to be installed on physical hardware. MIDL.dev, in contrast, outsources most of the computing to the cloud, increasing reliability and decreasing the operating burden of the baking operations.
+The best known baking tool, [Kiln](https://gitlab.com/tezos-kiln/kiln/), is a complete solution for home baking, to be installed on physical hardware. MIDL.dev, in contrast, outsources most of the computing to the cloud, increasing reliability and decreasing the operating burden of the baking operations.
 
 ## Managed Kubernetes
 
@@ -18,7 +18,7 @@ Tezos mainnet is fully supported as well as Carthagenet and future test networks
 
 Network security policies are applied in every pod to ensure all traffic is legitimate.
 
-## Fast bringup time
+## Fast bootstrapping
 
 The nodes are brought up from a snapshot for faster turnaround time and disaster recovery.
 
diff --git a/deploy-remote-signer.md b/deploy-remote-signer.md
index 5c06474..d24dce1 100644
--- a/deploy-remote-signer.md
+++ b/deploy-remote-signer.md
@@ -1,6 +1,6 @@
 # Deploy a baker with remote signer
 
-The [quickstart]() indicates how to set up a cloud baker with the private keys hosted in a Kubernetes secrets. In this section, we configure a remote signer on-premises. This configuration is more secure and every mainnet deployment should use it.
+The [quickstart guide](quickstart) indicates how to set up a cloud baker with the private keys hosted in [Kubernetes Secrets](https://kubernetes.io/docs/concepts/configuration/secret/). In this section, we configure a remote signer on-premises. This configuration is more secure and every mainnet deployment should use it.
 
 ## Prerequisites
 
@@ -12,7 +12,7 @@ First, deploy the Tezos-on-GKE baking setup in the cluster.
 
 You may deploy it from the tezos-on-gke repo itself using the [quickstart](https://github.com/midl-dev/tezos-on-gke) instructions, however it is better to follow [production best practices](production-readiness).
 
-In these instructions, we will be declaring a `tezos-baker` terraform module, keeping all sensitive data in a `terraform.tfvars` file.
+In these instructions, we will be declaring a `tezos-baker` Terraform module, keeping all sensitive data in a `terraform.tfvars` file.
 
 First, on an empty project dir, create a new `main.tf` file:
 
@@ -37,7 +37,7 @@ Note that `authorized_signers` is empty for now.
 
 ### ssh endpoint host key
 
-A ssh server normally generates its host keys during installation. Here, we are generating them externally and injecting them into the setup as a terraform and kubernetes secrets.
+An ssh server normally generates its host keys during installation. Here, we are generating them externally and injecting them into the setup as Terraform and Kubernetes Secrets.
 
 This way, in case of complete destruction of the cluster, the operator is able to restore the ssh endpoint with the same host key.
 
@@ -51,7 +51,7 @@ To generate a RSA host key, issue the following command on any computer:
 ssh-keygen -q -N "" -t rsa -b 4096 -f tezos_tunnel_endpoint_host_rsa_key
 ```
 
-This value is sensitive, so we will configure it as a terraform variable. In the terraform module, we are passing it as a variable:
+This value is sensitive, so we will configure it as a Terraform variable. In the Terraform module, we are passing it as a variable:
 
 ```
 signer_target_host_key=var.signer_target_host_key
@@ -103,7 +103,7 @@ cat /home/tezos/.ssh/id_rsa.pub
 
 ### Configure the endpoint
 
-You may now declare this remote signer in the terraform parameter `baking_nodes`.
+You may now declare this remote signer in the Terraform parameter `baking_nodes`.
 
 The `authorized_signers` list takes signer maps consisting of the following key/value pairs:
 
@@ -131,11 +131,11 @@ module "tezos-baker" {
 }
 ```
 
-Then `taint` and `apply` the terraform module.
+Then `taint` and `apply` the Terraform module.
 
 ### Test the endpoint
 
-Once terraform has deployed, you should be able to ssh from the signer to the endpoint.
+Once Terraform has deployed, you should be able to ssh from the signer to the endpoint.
 
 Test it by sshing to the signer as `tezos` user, then:
 
diff --git a/production-readiness.md b/production-readiness.md
index 4dae692..6fb47b6 100644
--- a/production-readiness.md
+++ b/production-readiness.md
@@ -8,30 +8,30 @@ Usage of [Google Default Application Credentials](https://cloud.google.com/docs/
 
 Instead:
 
-* ensure that you have set up an Organization - that can be done by registering a domain name and adding it to gcloud
-* create a Terraform Admin Project, Terraform Service Account and Service Account Credentials following [this Google guide](https://cloud.google.com/community/tutorials/managing-gcp-projects-with-terraform)
-* do not pass `project` as a variable when deploying the resources. Instead, pass `organization_id` and `billing_account` as variables
-* pass the service account credentials json file `serviceAccount:terraform@${TF_ADMIN}.iam.gserviceaccount.com` as `terraform_service_account_credentials` terraform variable
+* ensure that you have set up an Organization - that can be done by registering a domain name and adding it to Google Cloud;
+* create a Terraform Admin Project, Terraform Service Account and Service Account Credentials following [this Google guide](https://cloud.google.com/community/tutorials/managing-gcp-projects-with-terraform);
+* do not pass `project` as a variable when deploying the resources. Instead, pass `organization_id` and `billing_account` as variables;
+* pass the service account credentials JSON file `serviceAccount:terraform@${TF_ADMIN}.iam.gserviceaccount.com` as `terraform_service_account_credentials` Terraform variable.
 
-That will create the cluster in a new project, created by the terraform service account.
+That will create the cluster in a new project, created by the Terraform service account.
 
-You may then grant people in your organization access to the project. It is recommended to write more terraform manifests to do so.
+You may then grant people in your organization access to the project. It is recommended to write more Terraform manifests to do so.
 
 ## Separate cluster definition from baker definition
 
 While you can create a baker in one-shot, it is best suited for demos and testnets. A production baker is best advised to be defined declaratively.
 
-You would normally write all the parameters defining your baker in a `terraform.tfvars` file in your laptop.
+You would normally write all the parameters defining your baker in a `terraform.tfvars` file on your machine.
 
-Instead, it is recommended to create a maintain a private terraform manifest, declaring a cluster, and every deployment that lives within. This way, the paramters defining your cluster can also be committed to git (except secrets which should be handled separately, more on this below).
+Instead, it is recommended to create and maintain a private Terraform manifest, declaring a cluster, and every deployment that lives within. This way, the parameters defining your cluster can also be committed to git (except secrets which should be handled separately, more on this below).
 
-This keeps the setup maintainable by letting you define several deployments within the same cluster. For example, you may deploy the Tezos baker setup, and the Tezos monitoring setup, within the same cluster.
+This keeps the setup maintainable by letting you define several deployments within the same cluster. For example, you may deploy the Tezos baker setup, and the Tezos monitoring setup, within the same cluster.
 
 ### Define the cluster
 
-The [terraform-gke-blockchain](https://github.com/midl-dev/terraform-gke-blockchain) repository contains boilerplate terraform code to deploy a kubernetes cluster.
+The [terraform-gke-blockchain](https://github.com/midl-dev/terraform-gke-blockchain) repository contains boilerplate Terraform code to deploy a Kubernetes cluster.
 
-Start by declaring one empty cluster and one terraform provider:
+Start by declaring one empty cluster and one Terraform provider:
 
 ```
 module "terraform-gke-blockchain" {
@@ -53,21 +53,21 @@ provider "kubernetes" {
 }
 ```
 
-Notice that we created two node pools. These are distinct virtual machines that run your kubernetes cluster. You can map your pods to either. We will be using these to separate the baker setup from the payout/monitoring setup.
+Notice that we created two node pools. These are distinct virtual machines that run your Kubernetes cluster. You can map your pods to either. We will be using these to separate the baker setup from the payout/monitoring setup.
 
-### Define the tezos baker
+### Define the Tezos baker
 
 Within the `tezos-on-gke` repository, the `terraform-no-cluster-create` folder will deploy the baker on a pre-existing cluster.
 
 The output parameters of the `terraform-gke-blockchain` module become the input parameters of the Tezos baker module.
 
-All variables will appear in the terraform manifest itself, except secrets. Secrets should be kept as variables, and handled appropriately.
+All variables will appear in the Terraform manifest itself, except secrets. Secrets should be kept as variables, and handled appropriately.
 
 It looks like:
 
 ```
 module "tezos-baker" {
-  source = "github.com/midl-dev/tezos-on-gke?ref=v2.0//terraform-no-cluster-create"
+  source = "github.com/midl-dev/tezos-on-gke?ref=v3.0//terraform-no-cluster-create"
   region = module.terraform-gke-blockchain.location
   node_locations = module.terraform-gke-blockchain.node_locations
   kubernetes_endpoint = module.terraform-gke-blockchain.kubernetes_endpoint
@@ -98,11 +98,11 @@ module "tezos-baker" {
 
 It is recommended to use a remote signer for secure operations.
 
-Below is an example of baker with remote signer configured:
+Below is an example of a baker with remote signer configured:
 
 ```
 module "tezos-baker" {
-  source = "github.com/midl-dev/tezos-on-gke?ref=v2.0//terraform-no-cluster-create"
+  source = "github.com/midl-dev/tezos-on-gke?ref=v3.0//terraform-no-cluster-create"
   region = module.terraform-gke-blockchain.location
   node_locations = module.terraform-gke-blockchain.node_locations
   kubernetes_endpoint = module.terraform-gke-blockchain.kubernetes_endpoint
@@ -135,9 +135,9 @@ module "tezos-baker" {
 }
 ```
 
-### With payout config
+### Payout configuration
 
-The `baking_nodes` section also accepts a config for TRD payouts. See the [TRD payouts](trd-payouts) section for details.
+The `baking_nodes` section also accepts a config for TRD payouts (see the [TRD payouts](trd-payouts) section for details).
 
 ## Terraform remote state
 
@@ -177,7 +177,7 @@ terraform {
 }
 
 module "terraform-gke-blockchain" {
-  source = "github.com/midl-dev/terraform-gke-blockchain?ref=v1.0"
+  source = "github.com/midl-dev/terraform-gke-blockchain?ref=v2.0"
   org_id = ""
   billing_account = ""
   project_prefix = "mybakingop"
@@ -188,7 +188,7 @@ module "terraform-gke-blockchain" {
 }
 
 module "tezos-baker" {
-  source = "github.com/midl-dev/tezos-on-gke?ref=v2.0//terraform-no-cluster-create"
+  source = "github.com/midl-dev/tezos-on-gke?ref=v3.0//terraform-no-cluster-create"
   region = module.terraform-gke-blockchain.location
   node_locations = module.terraform-gke-blockchain.node_locations
   kubernetes_endpoint = module.terraform-gke-blockchain.kubernetes_endpoint
@@ -201,7 +201,6 @@ module "tezos-baker" {
   kubernetes_name_prefix = "xtz"
   full_snapshot_url = "https://mainnet.xtz-shots.io/full"
   rolling_snapshot_url = "https://mainnet.xtz-shots.io/rolling"
-  kubernetes_namespace = "tezos"
   tezos_version = "v9.2"
   tezos_network = "mainnet"
   signer_target_host_key=var.signer_target_host_key
@@ -261,8 +260,8 @@ A production validator should be operated with an on-call rotation, meaning seve
 
 Specifically:
 
-* secrets should be moved from a file in the operator workspace to a production secret store such as [Hashicorp Vault](vaultproject.io)
-* terraform deploys should be done by a CI system
-* any manual change in the kubernetes environment should be recorded in an audit log and committed in the code:
-  * the terraform private file above can be applied with continuous integration
-  * the intermediate kubernetes code generated with kustomize could be stored in a CI pipeline and deployed in an auditable way as well (see [Gitops](https://www.weave.works/technologies/gitops/)).
+* secrets should be moved from a file in the operator workspace to a production secret store such as [HashiCorp Vault](https://www.vaultproject.io);
+* Terraform deploys should be done by a CI system;
+* any manual change in the Kubernetes environment should be recorded in an audit log and committed in the code:
+  * the Terraform private file above can be applied with continuous integration;
+  * the intermediate Kubernetes code generated with kustomize could be stored in a CI pipeline and deployed in an auditable way as well (see [Gitops](https://www.weave.works/technologies/gitops/)).
diff --git a/quickstart.md b/quickstart.md
new file mode 100644
index 0000000..2f18f90
--- /dev/null
+++ b/quickstart.md
@@ -0,0 +1,267 @@
+Tezos-on-GKE
+============
+
+[Tezos](http://tezos.gitlab.io) is a [delegated proof of stake](https://bitshares.org/technology/delegated-proof-of-stake-consensus/) blockchain protocol.
+
+This quickstart guide helps you deploy:
+
+* a fully featured, [best practices](https://medium.com/tezos/its-a-baker-s-life-for-me-c214971201e1) Tezos baking service on Google Kubernetes Engine, or
+* a set of public nodes with a public RPC endpoint ([see documentation](deploy-public-node)).
+
+The private baking key can be managed two ways:
+
+* a hot private key stored as a Kubernetes secret for testing purposes;
+* support for an SSH-tunneled remote signing setup, for production mainnet bakers.
+
+Features:
+
+* high availability baking, endorsing and accusing;
+* SSH endpoint for remote signing;
+* compatible with Tezos mainnet and testnets such as Edonet;
+* blockchain snapshot download and import for faster synchronization of the nodes;
+* support for two highly available signers;
+* deploy everything in just one command;
+* metric-based monitoring and alerting with Prometheus.
+
+Brought to you by MIDL.dev
+--------------------------
+
+MIDL.dev
+
+We maintain [Tezos Suite](https://tezos-docs.midl.dev/), a complete baking suite, free for anyone to use.
+
+We help you deploy and manage a complete Tezos baking operation. [Hire us](https://midl.dev/tezos).
+
+Architecture
+------------
+
+This is a Kubernetes private cluster with Tezos nodes located in two Google Cloud zones, in the same region.
+
+The setup is production hardened:
+
+* usage of Kubernetes secrets to store sensitive values such as node keys. They are created securely from Terraform variables,
+* network policies to restrict communication between pods. For example, only sentries can peer with the validator node.
+
+
+Cost
+----
+
+Deploying will incur Google Compute Engine charges, specifically:
+
+* virtual machines
+* network ingress
+* NAT forwarding
+
+# How to deploy
+
+*WARNING: Use judgement and care in your network interactions, otherwise loss of funds may occur.*
+
+## Prerequisites
+
+1. Download and install [Terraform](https://terraform.io);
+2. Download, install, and configure the [Google Cloud SDK](https://cloud.google.com/sdk/);
+3. Install the [Kubernetes CLI](https://kubernetes.io/docs/tasks/tools/install-kubectl/) (aka `kubectl`).
+
+
+## Authentication
+
+NOTE: for production deployments, the method below is not recommended. Instead, you should use a Terraform service account following [these instructions](production-readiness).
+
+1. Using your Google account, activate your Google Cloud access;
+2. Log in to Google Cloud using `gcloud auth login`;
+3. Set up [Google Default Application Credentials](https://cloud.google.com/docs/authentication/production) by issuing the command:
+
+```
+gcloud auth application-default login
+```
+
+
+## Populate Terraform variables
+
+All custom values unique to your deployment are set as Terraform variables. You must populate these variables manually before deploying the setup.
+
+A simple way is to populate a file called `terraform.tfvars`.
+
+NOTE: `terraform.tfvars` is not recommended for a production deployment. See [production hardening](production-readiness).
+
+1. Clone the [tezos-on-gke repository](https://github.com/midl-dev/tezos-on-gke);
+2. Go to the `terraform` folder in the cloned repository.
+
+
+Below is a list of variables you can set.
+
+| Name | Description | Type | Default | Required |
+|------|-------------|------|---------|:--------:|
+| baking\_nodes | Structured data related to baking, including public key and signer configuration. | `map` | `{}` | no |
+| billing\_account | Google Cloud billing account ID. | `string` | `""` | no |
+| cluster\_ca\_certificate | Kubernetes cluster certificate. | `string` | `""` | no |
+| cluster\_name | Name of the Kubernetes cluster. | `string` | `""` | no |
+| experimental\_active\_standby\_mode | Enable experimental active-standby mode (https://tezos-docs.midl.dev/active-standby.html). | `bool` | `false` | no |
+| history\_mode | History mode of the Tezos nodes (rolling, full or archive). | `string` | `"rolling"` | no |
+| kubernetes\_access\_token | Access token for the Kubernetes endpoint. | `string` | `""` | no |
+| kubernetes\_endpoint | Name of the Kubernetes endpoint. | `string` | `""` | no |
+| kubernetes\_name\_prefix | Kubernetes name prefix to prepend to all resources (should be short, like xtz). | `string` | `"xtz"` | no |
+| kubernetes\_namespace | Kubernetes namespace to deploy the resource into. | `string` | `"tezos"` | no |
+| kubernetes\_pool\_name | When Kubernetes cluster has several node pools, specify which ones to deploy the baking setup into. Only effective when deploying on an external cluster with terraform\_no\_cluster\_create | `string` | `"blockchain-pool"` | no |
+| monitoring\_slack\_url | Slack API URL to send Prometheus alerts to. | `string` | `""` | no |
+| node\_locations | Zones in which to create the nodes. | `list` | `["us-central1-b", "us-central1-f"]` | no |
+| node\_storage\_size | Storage size for the nodes, in gibibytes (GiB). | `string` | `"15"` | no |
+| org\_id | Google Cloud organization ID. | `string` | `""` | no |
+| project | Project ID where Terraform is authenticated to run to create additional projects. If provided, Terraform will create the GKE and Tezos cluster inside this project. If not given, Terraform will generate a new project. | `string` | `""` | no |
+| protocols | The list of Tezos protocols currently in use, following the naming convention used in the baker/endorser binary names, for example 007-PsDELPH1. Baking and endorsing daemons will be spun up for every protocol provided in the list, which helps for seamless protocol updates. | `list` | `["007-PsDELPH1", "008-PtEdoTez"]` | no |
+| region | Region in which to create the cluster, or region where the cluster exists. | `string` | `"us-central1"` | no |
+| rpc\_public\_hostname | If set, expose the RPC of the public node through a load balancer and create a certificate for the given hostname. | `string` | `""` | no |
+| rpc\_subnet\_whitelist | IP address whitelisting for the public RPC. Open to everyone by default. | `list` | `["0.0.0.0/0"]` | no |
+| signer\_target\_host\_key | SSH host key for the SSH endpoint the remote signer connects to. If left empty, sshd will generate it but it may change, cutting your access to the remote signers. | `string` | `""` | no |
+| snapshot\_url | URL of the snapshot of type rolling to download. | `string` | `"https://mainnet.xtz-shots.io/rolling"` | no |
+| terraform\_service\_account\_credentials | Path to Terraform service account file, created following the instructions in https://cloud.google.com/community/tutorials/managing-gcp-projects-with-terraform | `string` | `"~/.config/gcloud/application_default_credentials.json"` | no |
+| tezos\_network | The Tezos network such as mainnet, edonet, etc. | `string` | `"mainnet"` | no |
+| tezos\_version | The Tezos container version for node. Should be hard-coded to a version from https://hub.docker.com/r/tezos/tezos/tags. Not recommended to set to a rolling tag like 'mainnet', because it may break unexpectedly. Example: `v9.2`. | `string` | `"latest-release"` | no |
+
+
+### Baking nodes
+
+The `baking_nodes` parameter lets you deploy one or several bakers declaratively by passing structured data describing the bakers.
+
+You may specify:
+* a map with one or several baking nodes, and
+* for every baking node, one or several baking and endorsing processes.
+
+The variables needed to spin up the baking or endorsing processes are:
+
+* `public_baking_key`: the public baking key starting with `edpk`
+* `public_baking_key_hash`: the public baking key hash starting with `tz`
+* for testnets or test deployments only: set the `insecure_private_baking_key` to the unencrypted private key to be used.
+
+**Attention!** Leaving a private baking key on a cloud platform is not recommended when funds are present. For production bakers, leave this variable empty and use a remote signer. [See documentation](https://tezos-docs.midl.dev/).
+
+To generate a public/private keypair, you can use the Tezos client:
+
+```
+tezos-client gen keys insecure-baker
+# if you do not have a node running locally, there will be an error, but the key was created anyway
+tezos-client show address insecure-baker -S
+```
+
+Set `public_baking_key_hash` to the value displayed after `Hash:`, `public_baking_key` to the value displayed after `Public key:` and `insecure_private_baking_key` to the value displayed after `Secret key: unencrypted:`.
+
+If you do not have the Tezos client installed locally, you can use the Tezos Docker container:
+
+```
+docker run --name=my-tezos-client tezos/tezos:latest-release tezos-client gen keys insecure-baker
+# again, if you do not have a node running locally, there will be an error, but the key was created anyway
+docker commit my-tezos-client my-tezos-client
+docker run my-tezos-client tezos-client show address insecure-baker -S
+```
+
+Full example of `baking_nodes` parameter:
+
+```
+mybaker = {
+  public_baking_key="edpkup8PaxJYrUcXUEBEufekgqMaodyKLKwHqbtkQVAudiJ7nmrS2o"
+  public_baking_key_hash="tz1YmsrYxQFJo5nGj4MEaXMPdLrcRf2a5mAU"
+  insecure_private_baking_key="edsk3cftTNcJnxb7ehCxYeCaKPT7mjycdMxgFisLixrQ9bZuTG2yZK"
+}
+```
+
+If you do not want to bake (for example, if you want to deploy an RPC node only), configure just one node with no baker:
+
+```
+baking_nodes = { "mynode": {} }
+```
+
+### Payouts
+
+Tezos-on-GKE supports the [Tezos Rewards Distributor (TRD)](https://github.com/tezos-reward-distributor-organization/tezos-reward-distributor) running as a cronjob alongside the baker node, sharing the same remote signing infrastructure.
+
+All details are in the [tezos-suite documentation](trd-payouts).
+
+### Full example
+
+Here is a full example `terraform.tfvars` configuration. This private key is provided only as an example; generate your own instead.
+
+```
+project=""
+tezos_network="florencenet"
+snapshot_url="https://florencenet.xtz-shots.io/rolling"
+baking_nodes = {
+  mynode = {
+    mybaker = {
+      public_baking_key="edpkup8PaxJYrUcXUEBEufekgqMaodyKLKwHqbtkQVAudiJ7nmrS2o"
+      public_baking_key_hash="tz1YmsrYxQFJo5nGj4MEaXMPdLrcRf2a5mAU"
+      insecure_private_baking_key="edsk3cftTNcJnxb7ehCxYeCaKPT7mjycdMxgFisLixrQ9bZuTG2yZK"
+    }
+  }
+}
+```
+
+
+## Deploy!
+
+1. Run the following:
+
+```
+terraform init
+terraform plan -out plan.out
+terraform apply plan.out
+```
+
+This will take time as it will:
+* create a Google Cloud project
+* create a Kubernetes cluster
+* build the necessary containers
+* spin up the baker nodes
+
+In case of error, run the `plan` and `apply` steps again:
+
+```
+terraform plan -out plan.out
+terraform apply plan.out
+```
+
+### Connect to the cluster
+
+Once the command returns, you can verify that the pods are up by running:
+
+```
+kubectl get pods
+```
+
+You should see the Tezos node.
+
+Display the log of a public node and observe it sync:
+
+```
+kubectl logs -f tezos-public-node-0 --tail=10
+```
+
+## Use with a remote signer
+
+It is not recommended to run a production baker with cloud-hosted private keys.
+
+Follow [our guide](deploy-remote-signer) to configure a hardware remote signer connected to a Ledger.
+
+When using this mode, you must pass a `baking_nodes` map with the following parameters:
+
+* `ledger_authorized_path`: the Ledger path associated with the key stored in the Ledger device on the remote signer;
+* `public_baking_key`: the public key for the key stored in the Ledger device;
+* `public_baking_key_hash`: the public key hash for the key stored in the Ledger device;
+* `monitoring_slack_url` and `monitoring_slack_channel`: optional, the Slack channel where to send the signer-specific alerts;
+* `authorized_signers`: a list of signer specification maps, containing:
+  * `ssh_pubkey`: the public key of the signer, used for ssh port forwarding;
+  * `signer_port`: the port for the signer http endpoint that is being tunneled;
+  * `tunnel_endpoint_port`: the port the ssh daemon connects to on the load balancer for tunneling traffic.
+
+## Day 2 operations
+
+[See documentation](day-2-operations)
+
+## Wrapping up
+
+To delete everything and terminate all the charges, issue the command:
+
+```
+terraform destroy
+```
+
+Alternatively, go to the Google Cloud console and delete the project.
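
Review note: the remote signer parameters listed under "Use with a remote signer" may be easier to follow with a concrete layout. The fragment below is only a sketch, assuming the same `mynode`/`mybaker` nesting as the full `terraform.tfvars` example in the quickstart. Every value in angle brackets is a placeholder, and the two port numbers are hypothetical, not taken from the repository.

```
# Sketch only: nesting assumed from the quickstart's full example.
baking_nodes = {
  mynode = {
    mybaker = {
      # Public key (edpk...) and hash (tz...) of the key held on the Ledger.
      public_baking_key      = "<public key starting with edpk>"
      public_baking_key_hash = "<public key hash starting with tz>"

      # Ledger path associated with that key on the remote signer.
      ledger_authorized_path = "<Ledger path>"

      # One map per remote signer allowed to tunnel in over ssh.
      authorized_signers = [
        {
          ssh_pubkey           = "<contents of the signer's id_rsa.pub>"
          signer_port          = 8443  # hypothetical port
          tunnel_endpoint_port = 58255 # hypothetical port
        }
      ]
    }
  }
}
```

No `insecure_private_baking_key` appears here, since the point of this mode is to keep the private key off the cloud platform. Whether the ports are expected as numbers or strings is not stated in the text above, so check the module's variable definitions before relying on this shape.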