Skip to content
This repository has been archived by the owner on Jun 3, 2020. It is now read-only.

Add full integration test: kms <-> tendermint #49

Open
liamsi opened this issue Sep 12, 2018 · 14 comments
Open

Add full integration test: kms <-> tendermint #49

liamsi opened this issue Sep 12, 2018 · 14 comments
Assignees

Comments

@liamsi
Copy link
Contributor

liamsi commented Sep 12, 2018

As suggested by @jleni:
Instead of manually testing compatibility between the KMS and tendermint we could add some kind of integration test which spins up everything needed to have a tendermint instance talk with a kms instance.

cc @xla @greg-szabo

@xla
Copy link

xla commented Sep 12, 2018

Would this issue be more at home in the tendermint repo? Also do we have a good overview what's left to do from all involved parts to realise this grand integration vision?

@liamsi
Copy link
Contributor Author

liamsi commented Sep 13, 2018

I guess either way (where ever the integration test is spun up), there might be some changes necessary in this repo too. But you are probably right that the tendermint repo should be the central place! I do not have a good overview on what's left to be done. I'd assume one could already spin up tendermint and a kms, establish a secret connection and send a message (like it is done here). Currently, only the "poison pill" message used in the linked test (still unknown to tendermint) should work though.

@thanethomson
Copy link
Contributor

Would it not be best to perhaps consider using an approach like what Pact provides, but for the Tendermint <-> KMS wire protocol? That way we can keep the repos separate and still provide full integration testing without needing a full Tendermint instance running when testing the KMS, and vice-versa when testing from Tendermint's side?

@tarcieri
Copy link
Contributor

I'm personally of the opinion that there is no substitute for a good full-system integration test.

Specifically I'm interested in interoperability testing between KMS and gaiad. They contain (quite deliberately) two different implementations of Amino, serialized messages thereof, and consensus logic.

The failure modes are subtle, for example, I just experienced a signature verification failure error in gaiad arising from a chain ID mismatch with KMS. Something I'd like to accomplish in a non-production environment is observing these sorts of errors in practice and improving the associated error messages to assist with debugging.

Lastly I'd like a way to smoke test all components before they hit staging/production.

@thanethomson
Copy link
Contributor

Makes sense. How would one go about integrating KMS into Tendermint (and/or vice-versa?) during automated testing?

For example, for KMS integration testing, would we download a Tendermint private validator binary from somewhere during testing? (And vice-versa from the Tendermint side)

Or would we go with some kind of FFI-like approach?

@tarcieri
Copy link
Contributor

tarcieri commented Jan 14, 2019

I'm not really familiar with what already exists on the tendermint/gaiad side of things. Is there something that exists which KMS could be added to?

There are all sorts of ways to go on this, from a tool that runs locally for smoke testing changes to a (potentially deterministic) continuous network simulation which is hosted somewhere and used as part of the CI process (e.g. a canary environment which is deployed when changes land on any project).

Personally for now I'd be fine with a "happy path" smoke test which runs locally, and not particularly concerned how it's implemented (e.g. I'd be fine with a shell script or small Go tool)

@ebuchman
Copy link
Contributor

Can we do this via docker containers? Tendermint already uses docker extensively in its integration tests and in what runs on circle. Shouldn't be too hard to include a docker version of the KMS as part of that, but of course it means if we make breaking changes on the tendermint side, tests will fail until they are fixed in the KMS side (which is probably actually what we want, just means more cross-repo co-ordination). As long as we can specify which branch/version of the KMS should be built in the container, this shouldn't be a problem

@jleni
Copy link
Contributor

jleni commented Jan 14, 2019

I was just typing exactly the same thing (docker/vagrant/whatever).
For now, it does not matter if we have to run it locally, but we need to way to smoke test this.

@jleni
Copy link
Contributor

jleni commented Jan 14, 2019

The only reason I say we should explore vagrant/etc and not just docker, is because USB connections are bit painful in docker and we should be testing YubiHSM2/Ledger devices.

@tarcieri
Copy link
Contributor

I'd be fine with something that spun up a bunch of Docker containers locally and ran some tests. That'd also be nice if you ever wanted to turn that into a continuous network simulator.

Regarding fancier tools, Microsoft was supposedly going to open source a Docker-based network simulator (Open Network Emulator), although I can't find anything more recent than this:

https://www.datacenterknowledge.com/microsoft/microsoft-open-source-its-secret-weapon-against-cloud-network-outages

Sounds cool in theory!

That said:

The only reason I say we should explore vagrant/etc and not just docker, is because USB connections are bit painful in docker and we should be testing YubiHSM2/Ledger devices.

Yeah, +1 to that, Docker + USB is a pain in the ass.

@ebuchman
Copy link
Contributor

We don't need to test all the KMS backends when integration testing with Tendermint - just a native default should be fine, so we shouldn't need to worry about USB-over-docker issues. Of course the different backends should be tested in KMS repo (or probably more specific to Signatory repo?).

@thanethomson
Copy link
Contributor

Can we start off for now just doing automated testing with the softsign (ed25519-dalek) provider? Automated testing with a physical HSM USB device would require CI integration with a server with one of those attached (unless someone has one of those? 😁). I also don't have one personally right now, so it's difficult for me to test at the moment.

@liamsi and spoke earlier and we both think there should be integration tests from both sides. Just to be explicit:

From KMS: have a CI testing step from within a Docker image with a prebuilt Tendermint binary. Instead of the tendermint/cmd/tendermint full binary, the binary built into this Docker image should be a kind of "test harness" that effectively runs some integration tests in the Golang equivalent of tests/integration.rs. We could perhaps create a separate application in the tendermint/cmd/ folder, alongside priv_val_server and tendermint. Maybe we could call it priv_val_harness? Any ideas on a good name here?

From Tendermint: have a CI testing step from within a Docker image with a prebuilt KMS binary. This binary should run some kind of test harness similar to tests/integration.rs.

Does this approach sound reasonable for now?

Using Docker images will allow explicit version control for CI from both sides. Coming up with Vagrant scripts based on the Dockerfiles shouldn't be too hard, which I'd imagine would allow for local testing with physical USB HSM devices.

@tarcieri
Copy link
Contributor

Can we start off for now just doing automated testing with the softsign (ed25519-dalek) provider?

That sounds fine to me (and one of the reasons the softsign backend exists).

@thanethomson thanethomson self-assigned this Jan 17, 2019
@thanethomson
Copy link
Contributor

So, I'm coming close to finishing this off now. To try out this solution in the meantime:

# grab the latest test harness code from https://github.com/tendermint/tendermint/pull/3149
> cd $GOPATH/src/github.com/tendermint/tendermint

# "upstream" = https://github.com/tendermint/tendermint
> git fetch upstream pull/3149/head:issues/3149
> git checkout issues/3149

# build the remote_val_harness Docker image locally on your machine
# (by default it builds an image tagged "tendermint/remote_val_harness:latest",
# along with tags for the current Tendermint version)
> make build_remote_val_harness_docker_image

# now go to your KMS source and grab my updates
> cd /path/to/source/for/kms
> git remote add tt [email protected]:thanethomson/kms.git
> git fetch tt issues/154
> git checkout issues/154

# build the KMS build image
> docker build -t tendermint/kms:build-2019-01-21-v0 .

# run the KMS build image
> docker run --rm -it -v /path/to/source/for/kms:/kms \
    tendermint/kms:build-2019-01-21-v0 /bin/bash

########################################################
# Inside tendermint/kms Docker container...
########################################################

# build the source inside the container
> cargo clean && cargo build

# run the test harness script - this executes KMS in the background
# and remote_val_harness in the foreground
# NOTE: This command will be executed in CircleCI here: https://github.com/thanethomson/kms/blob/issues/154/.circleci/config.yml#L53
> TMKMS_BIN=./target/debug/tmkms sh tests/support/run-harness-tests.sh

# Output should look something like the following:

13:38:26 [INFO] tmkms 0.2.3 starting up...
13:38:26 [INFO] [keyring:dalek:test-chain-fgLzft] added validator key cosmosvalconspub1zcjduepq2nvqngqfwhq30jsuv305rq2h35uzzhdt7rqn23959rkwts6f3pys0xsxpr
13:38:26 [INFO] KMS node ID: 48F6E1D553DF65B05C57D2E5FB42B9C23B395D73
13:38:26 [ERROR] [test-chain-fgLzft@tcp://127.0.0.1:61278] I/O error: Connection refused (os error 111)
I[2019-01-21|13:38:26.490] Failed to expand path - using original       keyFile=/remote_val_harness/config/priv_validator_key.json err="user: Current not implemented on linux/amd64"
I[2019-01-21|13:38:26.490] Failed to expand path - using original       stateFile=/remote_val_harness/data/priv_validator_state.json err="user: Current not implemented on linux/amd64"
I[2019-01-21|13:38:26.491] Loading private validator configuration      keyFile=/remote_val_harness/config/priv_validator_key.json stateFile=/remote_val_harness/data/priv_validator_state.json
I[2019-01-21|13:38:26.492] Failed to expand path - using original       genesisFile=/remote_val_harness/config/genesis.json err="user: Current not implemented on linux/amd64"
I[2019-01-21|13:38:26.492] Loading chain ID from genesis file           genesisFile=/remote_val_harness/config/genesis.json
I[2019-01-21|13:38:26.493] Loaded genesis file                          chainID=test-chain-fgLzft
I[2019-01-21|13:38:26.493] Listening at                                 proto=tcp addr=127.0.0.1:61278
I[2019-01-21|13:38:26.493] Resolved TCP address for listener            addr=127.0.0.1:61278
I[2019-01-21|13:38:26.493] Starting test harness
I[2019-01-21|13:38:26.494] Attempting to accept incoming connection     acceptRetries=100
I[2019-01-21|13:38:26.494] Starting SocketVal                           impl=SocketVal
13:38:27 [INFO] KMS node ID: 48F6E1D553DF65B05C57D2E5FB42B9C23B395D73
13:38:27 [INFO] [test-chain-fgLzft@tcp://127.0.0.1:61278] connected to validator successfully
I[2019-01-21|13:38:27.498] TEST: Public key of remote signer
I[2019-01-21|13:38:27.498] Local                                        pubKey=PubKeyEd25519{54D809A00975C117CA1C645F4181578D38215DABF0C13544B428ECE5C3498849}
I[2019-01-21|13:38:27.498] Remote                                       pubKey=PubKeyEd25519{54D809A00975C117CA1C645F4181578D38215DABF0C13544B428ECE5C3498849}
I[2019-01-21|13:38:27.498] TEST: Signing of proposals
I[2019-01-21|13:38:27.501] Successfully validated proposal signature
I[2019-01-21|13:38:27.501] TEST: Signing of votes
I[2019-01-21|13:38:27.501] Testing vote type                            type=1
I[2019-01-21|13:38:27.503] Successfully validated vote signature        type=1
I[2019-01-21|13:38:27.504] Testing vote type                            type=2
I[2019-01-21|13:38:27.511] Successfully validated vote signature        type=2
I[2019-01-21|13:38:27.511] SUCCESS! All tests passed.
I[2019-01-21|13:38:27.511] Attempting to stop remote signer
I[2019-01-21|13:38:27.512] Stopping SocketVal                           impl=SocketVal
13:38:27 [INFO] [test-chain-fgLzft@tcp://127.0.0.1:61278] session closed gracefully
KMS (pid 2434) already stopped, not killing
Harness tests exiting with code 0

Note that this depends on the changes from #152, which was reverted (#154) and the #152 code will only be compatible with Tendermint v0.29.0 by the looks of it. This means we can't merge this to master until Tendermint v0.29.0 comes out.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants