diff --git a/.gitignore b/.gitignore index 1032631810c..798f9330144 100644 --- a/.gitignore +++ b/.gitignore @@ -55,3 +55,5 @@ proto/spec/**/*.pb.go *.pdf *.gz *.dvi +# Python virtual environments +.venv
diff --git a/docs/qa/README.md b/docs/qa/README.md new file mode 100644 index 00000000000..d322ccb6ca1 --- /dev/null +++ b/docs/qa/README.md @@ -0,0 +1,23 @@
---
order: 1
parent:
  title: Tendermint Quality Assurance
  description: This is a report on the process followed and results obtained when running v0.34.x on testnets
  order: 2
---

# Tendermint Quality Assurance

This directory keeps track of the process followed by the Tendermint Core team
for Quality Assurance before cutting a release.
This directory lives in multiple branches. On each release branch,
the contents of this directory reflect the status of the process
at the time the Quality Assurance process was applied for that release.

The file [method](./method.md) describes the process followed to obtain the results
used to decide whether a release passes the Quality Assurance process.
The results obtained for each release are stored in their own directory.
The following releases have undergone the Quality Assurance process:

* [v0.34.x](./v034/), which was tested just before releasing v0.34.22
* [v0.37.x](./v037/), with v0.34.x acting as a baseline

diff --git a/docs/qa/method.md b/docs/qa/method.md new file mode 100644 index 00000000000..cc4f82dfa43 --- /dev/null +++ b/docs/qa/method.md @@ -0,0 +1,214 @@
---
order: 1
title: Method
---

# Method

This document provides a detailed description of the QA process.
It is intended to be used by engineers reproducing the experimental setup for future tests of Tendermint.

The (first iteration of the) QA process as described [in the RELEASES.md document][releases]
was applied to version v0.34.x in order to have a set of results acting as a benchmarking baseline.
This baseline is then compared with results obtained in later versions.

Out of the testnet-based test cases described in [the releases document][releases] we focused on two of them:
_200 Node Test_ and _Rotating Nodes Test_.

[releases]: https://github.com/tendermint/tendermint/blob/v0.37.x/RELEASES.md#large-scale-testnets

## Software Dependencies

### Infrastructure Requirements to Run the Tests

* An account at Digital Ocean (DO), with a high droplet limit (>202)
* The machine to orchestrate the tests should have the following installed:
    * A clone of the [testnet repository][testnet-repo]
        * This repository contains all the scripts mentioned in the remainder of this section
    * [Digital Ocean CLI][doctl]
    * [Terraform CLI][Terraform]
    * [Ansible CLI][Ansible]

[testnet-repo]: https://github.com/interchainio/tendermint-testnet
[Ansible]: https://docs.ansible.com/ansible/latest/index.html
[Terraform]: https://www.terraform.io/docs
[doctl]: https://docs.digitalocean.com/reference/doctl/how-to/install/

### Requirements for Result Extraction

* Matlab or Octave
* [Prometheus][prometheus] server installed
* blockstore DB of one of the full nodes in the testnet
* Prometheus DB

[prometheus]: https://prometheus.io/

## 200 Node Testnet

### Running the test

This section explains how the tests were carried out, for reproducibility purposes.

1. [If you haven't done it before]
   Follow steps 1-4 of the `README.md` at the top of the testnet repository to configure Terraform and `doctl`.
2. 
Copy file `testnets/testnet200.toml` onto `testnet.toml` (do NOT commit this change) +3. Set the variable `VERSION_TAG` in the `Makefile` to the git hash that is to be tested. +4. Follow steps 5-10 of the `README.md` to configure and start the 200 node testnet + * WARNING: Do NOT forget to run `make terraform-destroy` as soon as you are done with the tests (see step 9) +5. As a sanity check, connect to the Prometheus node's web interface and check the graph for the `tendermint_consensus_height` metric. + All nodes should be increasing their heights. +6. `ssh` into the `testnet-load-runner`, then copy script `script/200-node-loadscript.sh` and run it from the load runner node. + * Before running it, you need to edit the script to provide the IP address of a full node. + This node will receive all transactions from the load runner node. + * This script will take about 40 mins to run + * It is running 90-seconds-long experiments in a loop with different loads +7. Run `make retrieve-data` to gather all relevant data from the testnet into the orchestrating machine +8. Verify that the data was collected without errors + * at least one blockstore DB for a Tendermint validator + * the Prometheus database from the Prometheus node + * for extra care, you can run `zip -T` on the `prometheus.zip` file and (one of) the `blockstore.db.zip` file(s) +9. **Run `make terraform-destroy`** + * Don't forget to type `yes`! Otherwise you're in trouble. + +### Result Extraction + +The method for extracting the results described here is highly manual (and exploratory) at this stage. +The Core team should improve it at every iteration to increase the amount of automation. + +#### Steps + +1. Unzip the blockstore into a directory +2. Extract the latency report and the raw latencies for all the experiments. Run these commands from the directory containing the blockstore + * `go run github.com/tendermint/tendermint/test/loadtime/cmd/report@3ec6e424d --database-type goleveldb --data-dir ./ > results/report.txt` + * `go run github.com/tendermint/tendermint/test/loadtime/cmd/report@3ec6e424d --database-type goleveldb --data-dir ./ --csv results/raw.csv` +3. File `report.txt` contains an unordered list of experiments with varying concurrent connections and transaction rate + * Create files `report01.txt`, `report02.txt`, `report04.txt` and, for each experiment in file `report.txt`, + copy its related lines to the filename that matches the number of connections. + * Sort the experiments in `report01.txt` in ascending tx rate order. Likewise for `report02.txt` and `report04.txt`. +4. Generate file `report_tabbed.txt` by showing the contents `report01.txt`, `report02.txt`, `report04.txt` side by side + * This effectively creates a table where rows are a particular tx rate and columns are a particular number of websocket connections. +5. Extract the raw latencies from file `raw.csv` using the following bash loop. This creates a `.csv` file and a `.dat` file per experiment. + The format of the `.dat` files is amenable to loading them as matrices in Octave + + ```bash + uuids=($(cat report01.txt report02.txt report04.txt | grep '^Experiment ID: ' | awk '{ print $3 }')) + c=1 + for i in 01 02 04; do + for j in 0025 0050 0100 0200; do + echo $i $j $c "${uuids[$c]}" + filename=c${i}_r${j} + grep ${uuids[$c]} raw.csv > ${filename}.csv + cat ${filename}.csv | tr , ' ' | awk '{ print $2, $3 }' > ${filename}.dat + c=$(expr $c + 1) + done + done + ``` + +6. Enter Octave +7. 
Load all `.dat` files generated in step 5 into matrices using this Octave code snippet:

    ```octave
    conns = { "01"; "02"; "04" };
    rates = { "0025"; "0050"; "0100"; "0200" };
    for i = 1:length(conns)
      for j = 1:length(rates)
        filename = strcat("c", conns{i}, "_r", rates{j}, ".dat");
        load("-ascii", filename);
      endfor
    endfor
    ```

8. Set the variable `release` to the current release undergoing QA

    ```octave
    release = "v0.34.x";
    ```

9. Generate a plot with all (or some) experiments, where the X axis is the experiment time,
   and the Y axis is the latency of transactions.
   The following snippet plots all experiments.

    ```octave
    legends = {};
    hold off;
    for i = 1:length(conns)
      for j = 1:length(rates)
        data_name = strcat("c", conns{i}, "_r", rates{j});
        l = strcat("c=", conns{i}, " r=", rates{j});
        m = eval(data_name); plot((m(:,1) - min(m(:,1))) / 1e+9, m(:,2) / 1e+9, ".");
        hold on;
        legends(1, end+1) = l;
      endfor
    endfor
    legend(legends, "location", "northeastoutside");
    xlabel("experiment time (s)");
    ylabel("latency (s)");
    t = sprintf("200-node testnet - %s", release);
    title(t);
    ```

10. Consider adjusting the axes, for instance if you want to compare your results to the baseline:

    ```octave
    axis([0, 100, 0, 30], "tic");
    ```

11. Use Octave's GUI menu to save the plot (e.g., as `.png`)

12. Repeat steps 9 and 10 to obtain as many plots as deemed necessary.

13. To generate a latency vs. throughput plot from the raw CSV file generated
    in step 2, follow the instructions for the [`latency_throughput.py`] script.

[`latency_throughput.py`]: ../../scripts/qa/reporting/README.md

#### Extracting Prometheus Metrics

1. Stop the Prometheus server if it is running as a service (e.g., a `systemd` unit).
2. Unzip the Prometheus database retrieved from the testnet, and move it to replace the
   local Prometheus database.
3. Start the Prometheus server and make sure no error logs appear at startup.
4. Introduce the metrics you want to gather or plot.

## Rotating Node Testnet

### Running the test

This section explains how the tests were carried out, for reproducibility purposes.

1. [If you haven't done it before]
   Follow steps 1-4 of the `README.md` at the top of the testnet repository to configure Terraform and `doctl`.
2. Copy file `testnet_rotating.toml` onto `testnet.toml` (do NOT commit this change)
3. Set the variable `VERSION_TAG` to the git hash that is to be tested.
4. Run `make terraform-apply EPHEMERAL_SIZE=25`
   * WARNING: Do NOT forget to run `make terraform-destroy` as soon as you are done with the tests
5. Follow steps 6-10 of the `README.md` to configure and start the "stable" part of the rotating node testnet
6. As a sanity check, connect to the Prometheus node's web interface and check the graph for the `tendermint_consensus_height` metric.
   All nodes should be increasing their heights.
7. On a different shell,
   * run `make runload ROTATE_CONNECTIONS=X ROTATE_TX_RATE=Y`
   * `X` and `Y` should reflect a load below the saturation point (see, e.g.,
     [this paragraph](./v034/README.md#finding-the-saturation-point) for further info)
8. Run `make rotate` to start the script that creates the ephemeral nodes and kills them once they have caught up.
   * WARNING: If you run this command from your laptop, the laptop needs to be up and connected for the full length
     of the experiment.
9. When the height of the chain reaches 3000, stop the `make runload` script started in step 7
10. 
When the rotate script has made two iterations (i.e., all ephemeral nodes have caught up twice) + after height 3000 was reached, stop `make rotate` +11. Run `make retrieve-data` to gather all relevant data from the testnet into the orchestrating machine +12. Verify that the data was collected without errors + * at least one blockstore DB for a Tendermint validator + * the Prometheus database from the Prometheus node + * for extra care, you can run `zip -T` on the `prometheus.zip` file and (one of) the `blockstore.db.zip` file(s) +13. **Run `make terraform-destroy`** + +Steps 8 to 10 are highly manual at the moment and will be improved in next iterations. + +### Result Extraction + +In order to obtain a latency plot, follow the instructions above for the 200 node experiment, but: + +* The `results.txt` file contains only one experiment +* Therefore, no need for any `for` loops + +As for prometheus, the same method as for the 200 node experiment can be applied. diff --git a/docs/qa/v034/README.md b/docs/qa/v034/README.md new file mode 100644 index 00000000000..b07b1029124 --- /dev/null +++ b/docs/qa/v034/README.md @@ -0,0 +1,278 @@ +--- +order: 1 +parent: + title: Tendermint Quality Assurance Results for v0.34.x + description: This is a report on the results obtained when running v0.34.x on testnets + order: 2 +--- + +# v0.34.x + +## 200 Node Testnet + +### Finding the Saturation Point + +The first goal when examining the results of the tests is identifying the saturation point. +The saturation point is a setup with a transaction load big enough to prevent the testnet +from being stable: the load runner tries to produce slightly more transactions than can +be processed by the testnet. + +The following table summarizes the results for v0.34.x, for the different experiments +(extracted from file [`v034_report_tabbed.txt`](./img/v034_report_tabbed.txt)). + +The X axis of this table is `c`, the number of connections created by the load runner process to the target node. +The Y axis of this table is `r`, the rate or number of transactions issued per second. + +| | c=1 | c=2 | c=4 | +| :--- | ----: | ----: | ----: | +| r=25 | 2225 | 4450 | 8900 | +| r=50 | 4450 | 8900 | 17800 | +| r=100 | 8900 | 17800 | 35600 | +| r=200 | 17800 | 35600 | 38660 | + +The table shows the number of 1024-byte-long transactions that were produced by the load runner, +and processed by Tendermint, during the 90 seconds of the experiment's duration. +Each cell in the table refers to an experiment with a particular number of websocket connections (`c`) +to a chosen validator, and the number of transactions per second that the load runner +tries to produce (`r`). Note that the overall load that the tool attempts to generate is $c \cdot r$. + +We can see that the saturation point is beyond the diagonal that spans cells + +* `r=200,c=2` +* `r=100,c=4` + +given that the total transactions should be close to the product of the rate, the number of connections, +and the experiment time (89 seconds, since the last batch never gets sent). + +All experiments below the saturation diagonal (`r=200,c=4`) have in common that the total +number of transactions processed is noticeably less than the product $c \cdot r \cdot 89$, +which is the expected number of transactions when the system is able to deal well with the +load. +With `r=200,c=4`, we obtained 38660 whereas the theoretical number of transactions should +have been $200 \cdot 4 \cdot 89 = 71200$. 
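As an illustration of this check, the following short Python sketch (not part of the QA tooling; the figures are copied by hand from the table above) compares each cell of the table against the theoretical maximum $c \cdot r \cdot 89$ and flags the cells that fall noticeably short of it:

```python
# Illustrative sketch only: compare the observed transaction totals from the
# table above with the theoretical maximum c * r * 89 for each experiment.
# The numbers are copied by hand from v034_report_tabbed.txt.
observed = {
    (1, 25): 2225,   (2, 25): 4450,   (4, 25): 8900,
    (1, 50): 4450,   (2, 50): 8900,   (4, 50): 17800,
    (1, 100): 8900,  (2, 100): 17800, (4, 100): 35600,
    (1, 200): 17800, (2, 200): 35600, (4, 200): 38660,
}

DURATION = 89  # effective seconds of load (the last batch never gets sent)

for (c, r), total in sorted(observed.items()):
    expected = c * r * DURATION
    # the 5% tolerance is an arbitrary choice for this sketch
    status = "saturated" if total < 0.95 * expected else "ok"
    print(f"c={c} r={r}: observed={total:6d} expected={expected:6d} {status}")
```

Only the `r=200,c=4` cell is flagged by this check, which matches the discussion above.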
At this point, we chose an experiment at the limit of the saturation diagonal,
in order to further study the performance of this release.
**The chosen experiment is `r=200,c=2`**.

This is a plot of the CPU load (average over 1 minute, as output by `top`) of the load runner for `r=200,c=2`,
where we can see that the load stays close to 0 most of the time.

![load-load-runner](./img/v034_r200c2_load-runner.png)

### Examining latencies

The method described [here](../method.md) allows us to plot the latencies of transactions
for all experiments.

![all-latencies](./img/v034_200node_latencies.png)

As we can see, even the experiments beyond the saturation diagonal managed to keep
transaction latency stable (i.e., not constantly increasing).
Our interpretation for this is that contention within Tendermint was propagated,
via the websockets, to the load runner,
hence the load runner could not produce the target load, but only a fraction of it.

Further examination of the Prometheus data (see below) showed that the mempool contained many transactions
at steady state, and any temporary growth was quickly followed by a return to that steady state. This demonstrates
that the transactions were processed by the Tendermint network at least as quickly as they
were submitted to the mempool. Finally, the test script made sure that, at the end of an experiment, the
mempool was empty, so that all transactions submitted to the chain were processed.

Note also that the number of points present in the plot appears to be much smaller than expected given the
number of transactions in each experiment, particularly close to or above the saturation diagonal.
This is a visual effect of the plot; what appear to be single points are actually potentially huge
clusters of points. To corroborate this, we have zoomed into the plot above by setting (carefully chosen)
tiny axis intervals. The cluster shown below looks like a single point in the plot above.

![all-latencies-zoomed](./img/v034_200node_latencies_zoomed.png)

The plot of latencies can be used as a baseline to compare with other releases.

The following plot summarizes average latencies versus overall throughputs
across different numbers of WebSocket connections to the node into which
transactions are being loaded.

![latency-vs-throughput](./img/v034_latency_throughput.png)

### Prometheus Metrics on the Chosen Experiment

As mentioned [above](#finding-the-saturation-point), the chosen experiment is `r=200,c=2`.
This section further examines key metrics for this experiment, extracted from Prometheus data.

#### Mempool Size

The mempool size, a count of the number of transactions in the mempool, was shown to be stable and homogeneous
at all full nodes. It did not exhibit any unconstrained growth.
The plot below shows the evolution over time of the cumulative number of transactions inside all full nodes' mempools
at a given time.
The two spikes that can be observed correspond to a period where consensus instances proceeded beyond the initial round
at some nodes.

![mempool-cumulative](./img/v034_r200c2_mempool_size.png)

The plot below shows the evolution of the average over all full nodes, which oscillates between 1500 and 2000
outstanding transactions.

![mempool-avg](./img/v034_r200c2_mempool_size_avg.png)

The peaks observed coincide with the moments when some nodes proceeded beyond the initial round of consensus (see below).

#### Peers

The number of peers was stable at all nodes.
+It was higher for the seed nodes (around 140) than for the rest (between 21 and 74). +The fact that non-seed nodes reach more than 50 peers is due to #9548. + +![peers](./img/v034_r200c2_peers.png) + +#### Consensus Rounds per Height + +Most heights took just one round, but some nodes needed to advance to round 1 at some point. + +![rounds](./img/v034_r200c2_rounds.png) + +#### Blocks Produced per Minute, Transactions Processed per Minute + +The blocks produced per minute are the slope of this plot. + +![heights](./img/v034_r200c2_heights.png) + +Over a period of 2 minutes, the height goes from 530 to 569. +This results in an average of 19.5 blocks produced per minute. + +The transactions processed per minute are the slope of this plot. + +![total-txs](./img/v034_r200c2_total-txs.png) + +Over a period of 2 minutes, the total goes from 64525 to 100125 transactions, +resulting in 17800 transactions per minute. However, we can see in the plot that +all transactions in the load are processed long before the two minutes. +If we adjust the time window when transactions are processed (approx. 105 seconds), +we obtain 20343 transactions per minute. + +#### Memory Resident Set Size + +Resident Set Size of all monitored processes is plotted below. + +![rss](./img/v034_r200c2_rss.png) + +The average over all processes oscillates around 1.2 GiB and does not demonstrate unconstrained growth. + +![rss-avg](./img/v034_r200c2_rss_avg.png) + +#### CPU utilization + +The best metric from Prometheus to gauge CPU utilization in a Unix machine is `load1`, +as it usually appears in the +[output of `top`](https://www.digitalocean.com/community/tutorials/load-average-in-linux). + +![load1](./img/v034_r200c2_load1.png) + +It is contained in most cases below 5, which is generally considered acceptable load. + +### Test Result + +**Result: N/A** (v0.34.x is the baseline) + +Date: 2022-10-14 + +Version: 3ec6e424d6ae4c96867c2dcf8310572156068bb6 + +## Rotating Node Testnet + +For this testnet, we will use a load that can safely be considered below the saturation +point for the size of this testnet (between 13 and 38 full nodes): `c=4,r=800`. + +N.B.: The version of Tendermint used for these tests is affected by #9539. +However, the reduced load that reaches the mempools is orthogonal to functionality +we are focusing on here. + +### Latencies + +The plot of all latencies can be seen in the following plot. + +![rotating-all-latencies](./img/v034_rotating_latencies.png) + +We can observe there are some very high latencies, towards the end of the test. +Upon suspicion that they are duplicate transactions, we examined the latencies +raw file and discovered there are more than 100K duplicate transactions. + +The following plot shows the latencies file where all duplicate transactions have +been removed, i.e., only the first occurrence of a duplicate transaction is kept. + +![rotating-all-latencies-uniq](./img/v034_rotating_latencies_uniq.png) + +This problem, existing in `v0.34.x`, will need to be addressed, perhaps in the same way +we addressed it when running the 200 node test with high loads: increasing the `cache_size` +configuration parameter. + +### Prometheus Metrics + +The set of metrics shown here are less than for the 200 node experiment. +We are only interested in those for which the catch-up process (blocksync) may have an impact. + +#### Blocks and Transactions per minute + +Just as shown for the 200 node test, the blocks produced per minute are the gradient of this plot. 
![rotating-heights](./img/v034_rotating_heights.png)

Over a period of 5229 seconds, the height goes from 2 to 3638.
This results in an average of 41 blocks produced per minute.

The following plot shows only the heights reported by ephemeral nodes
(which are also included in the plot above). Note that the _height_ metric
is only shown _once the node has switched to consensus_, hence the gaps
when nodes are killed, wiped out, started from scratch, and catching up.

![rotating-heights-ephe](./img/v034_rotating_heights_ephe.png)

The transactions processed per minute are the gradient of this plot.

![rotating-total-txs](./img/v034_rotating_total-txs.png)

The small lines we see periodically close to `y=0` are the transactions that
ephemeral nodes start processing when they are caught up.

Over a period of 5229 seconds, the total goes from 0 to 387697 transactions,
resulting in 4449 transactions per minute. We can see some abrupt changes in
the plot's gradient. This will need to be investigated.

#### Peers

The plot below shows the evolution of the number of peers throughout the experiment.
The periodic changes observed are due to the ephemeral nodes being stopped,
wiped out, and recreated.

![rotating-peers](./img/v034_rotating_peers.png)

The validators' plots are concentrated at the higher part of the graph, whereas the ephemeral nodes
are mostly at the lower part.

#### Memory Resident Set Size

The average Resident Set Size (RSS) over all processes seems stable, and slightly growing toward the end.
This might be related to the increase in transaction load observed above.

![rotating-rss-avg](./img/v034_rotating_rss_avg.png)

The memory taken by the validators and the ephemeral nodes (when they are up) is comparable.

#### CPU utilization

The plot shows metric `load1` for all nodes.

![rotating-load1](./img/v034_rotating_load1.png)

It is contained under 5 most of the time, which is considered normal load.
The purple line, which follows a different pattern, is the validator receiving all
transactions, via RPC, from the load runner process.
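The plots in this section are obtained by querying the Prometheus server that was restored from the testnet's database (see the method document). As a pointer for reproducing them, here is a minimal sketch using Prometheus's HTTP range-query API; the server address (default port 9090), the time window, and the exact metric name are assumptions to be adapted to the actual setup (the `load1` panels typically come from node-exporter's `node_load1`):

```python
# Illustrative sketch only: pull one metric from the local Prometheus server
# and plot one line per node. Server address, time window, and metric name
# are assumptions; adjust them to whatever your Prometheus setup exposes.
import requests
import matplotlib.pyplot as plt

PROM = "http://localhost:9090/api/v1/query_range"
params = {
    "query": "node_load1",                 # assumed metric name for the load1 panels
    "start": "2022-10-10T00:00:00Z",       # replace with the experiment's window
    "end": "2022-10-10T02:00:00Z",
    "step": "30s",
}

result = requests.get(PROM, params=params).json()["data"]["result"]
for series in result:
    times = [float(t) for t, _ in series["values"]]
    values = [float(v) for _, v in series["values"]]
    plt.plot(times, values, label=series["metric"].get("instance", "node"))

plt.xlabel("unix time (s)")
plt.ylabel("load1")
plt.legend(fontsize="small")
plt.savefig("rotating_load1.png")
```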
+ +### Test Result + +**Result: N/A** + +Date: 2022-10-10 + +Version: a28c987f5a604ff66b515dd415270063e6fb069d diff --git a/docs/qa/v034/img/v034_200node_latencies.png b/docs/qa/v034/img/v034_200node_latencies.png new file mode 100644 index 00000000000..afd1060cafe Binary files /dev/null and b/docs/qa/v034/img/v034_200node_latencies.png differ diff --git a/docs/qa/v034/img/v034_200node_latencies_zoomed.png b/docs/qa/v034/img/v034_200node_latencies_zoomed.png new file mode 100644 index 00000000000..1ff93644220 Binary files /dev/null and b/docs/qa/v034/img/v034_200node_latencies_zoomed.png differ diff --git a/docs/qa/v034/img/v034_latency_throughput.png b/docs/qa/v034/img/v034_latency_throughput.png new file mode 100644 index 00000000000..3674fe47b40 Binary files /dev/null and b/docs/qa/v034/img/v034_latency_throughput.png differ diff --git a/docs/qa/v034/img/v034_r200c2_heights.png b/docs/qa/v034/img/v034_r200c2_heights.png new file mode 100644 index 00000000000..11f3bba432a Binary files /dev/null and b/docs/qa/v034/img/v034_r200c2_heights.png differ diff --git a/docs/qa/v034/img/v034_r200c2_load-runner.png b/docs/qa/v034/img/v034_r200c2_load-runner.png new file mode 100644 index 00000000000..70211b0d214 Binary files /dev/null and b/docs/qa/v034/img/v034_r200c2_load-runner.png differ diff --git a/docs/qa/v034/img/v034_r200c2_load1.png b/docs/qa/v034/img/v034_r200c2_load1.png new file mode 100644 index 00000000000..11012844dc6 Binary files /dev/null and b/docs/qa/v034/img/v034_r200c2_load1.png differ diff --git a/docs/qa/v034/img/v034_r200c2_mempool_size.png b/docs/qa/v034/img/v034_r200c2_mempool_size.png new file mode 100644 index 00000000000..c5d690200a0 Binary files /dev/null and b/docs/qa/v034/img/v034_r200c2_mempool_size.png differ diff --git a/docs/qa/v034/img/v034_r200c2_mempool_size_avg.png b/docs/qa/v034/img/v034_r200c2_mempool_size_avg.png new file mode 100644 index 00000000000..bda399fe5dc Binary files /dev/null and b/docs/qa/v034/img/v034_r200c2_mempool_size_avg.png differ diff --git a/docs/qa/v034/img/v034_r200c2_peers.png b/docs/qa/v034/img/v034_r200c2_peers.png new file mode 100644 index 00000000000..a0aea7ada39 Binary files /dev/null and b/docs/qa/v034/img/v034_r200c2_peers.png differ diff --git a/docs/qa/v034/img/v034_r200c2_rounds.png b/docs/qa/v034/img/v034_r200c2_rounds.png new file mode 100644 index 00000000000..215be100de1 Binary files /dev/null and b/docs/qa/v034/img/v034_r200c2_rounds.png differ diff --git a/docs/qa/v034/img/v034_r200c2_rss.png b/docs/qa/v034/img/v034_r200c2_rss.png new file mode 100644 index 00000000000..6d14dced0b4 Binary files /dev/null and b/docs/qa/v034/img/v034_r200c2_rss.png differ diff --git a/docs/qa/v034/img/v034_r200c2_rss_avg.png b/docs/qa/v034/img/v034_r200c2_rss_avg.png new file mode 100644 index 00000000000..8dec67da29e Binary files /dev/null and b/docs/qa/v034/img/v034_r200c2_rss_avg.png differ diff --git a/docs/qa/v034/img/v034_r200c2_total-txs.png b/docs/qa/v034/img/v034_r200c2_total-txs.png new file mode 100644 index 00000000000..177d5f1c318 Binary files /dev/null and b/docs/qa/v034/img/v034_r200c2_total-txs.png differ diff --git a/docs/qa/v034/img/v034_report_tabbed.txt b/docs/qa/v034/img/v034_report_tabbed.txt new file mode 100644 index 00000000000..25149547437 --- /dev/null +++ b/docs/qa/v034/img/v034_report_tabbed.txt @@ -0,0 +1,52 @@ +Experiment ID: 3d5cf4ef-1a1a-4b46-aa2d-da5643d2e81e │Experiment ID: 80e472ec-13a1-4772-a827-3b0c907fb51d │Experiment ID: 07aca6cf-c5a4-4696-988f-e3270fc6333b + │ │ + Connections: 1 │ Connections: 
2 │ Connections: 4 + Rate: 25 │ Rate: 25 │ Rate: 25 + Size: 1024 │ Size: 1024 │ Size: 1024 + │ │ + Total Valid Tx: 2225 │ Total Valid Tx: 4450 │ Total Valid Tx: 8900 + Total Negative Latencies: 0 │ Total Negative Latencies: 0 │ Total Negative Latencies: 0 + Minimum Latency: 599.404362ms │ Minimum Latency: 448.145181ms │ Minimum Latency: 412.485729ms + Maximum Latency: 3.539686885s │ Maximum Latency: 3.237392049s │ Maximum Latency: 12.026665368s + Average Latency: 1.441485349s │ Average Latency: 1.441267946s │ Average Latency: 2.150192457s + Standard Deviation: 541.049869ms │ Standard Deviation: 525.040007ms │ Standard Deviation: 2.233852478s + │ │ +Experiment ID: 953dc544-dd40-40e8-8712-20c34c3ce45e │Experiment ID: d31fc258-16e7-45cd-9dc8-13ab87bc0b0a │Experiment ID: 15d90a7e-b941-42f4-b411-2f15f857739e + │ │ + Connections: 1 │ Connections: 2 │ Connections: 4 + Rate: 50 │ Rate: 50 │ Rate: 50 + Size: 1024 │ Size: 1024 │ Size: 1024 + │ │ + Total Valid Tx: 4450 │ Total Valid Tx: 8900 │ Total Valid Tx: 17800 + Total Negative Latencies: 0 │ Total Negative Latencies: 0 │ Total Negative Latencies: 0 + Minimum Latency: 482.046942ms │ Minimum Latency: 435.458913ms │ Minimum Latency: 510.746448ms + Maximum Latency: 3.761483455s │ Maximum Latency: 7.175583584s │ Maximum Latency: 6.551497882s + Average Latency: 1.450408183s │ Average Latency: 1.681673116s │ Average Latency: 1.738083875s + Standard Deviation: 587.560056ms │ Standard Deviation: 1.147902047s │ Standard Deviation: 943.46522ms + │ │ +Experiment ID: 9a0b9980-9ce6-4db5-a80a-65ca70294b87 │Experiment ID: df8fa4f4-80af-4ded-8a28-356d15018b43 │Experiment ID: d0e41c2c-89c0-4f38-8e34-ca07adae593a + │ │ + Connections: 1 │ Connections: 2 │ Connections: 4 + Rate: 100 │ Rate: 100 │ Rate: 100 + Size: 1024 │ Size: 1024 │ Size: 1024 + │ │ + Total Valid Tx: 8900 │ Total Valid Tx: 17800 │ Total Valid Tx: 35600 + Total Negative Latencies: 0 │ Total Negative Latencies: 0 │ Total Negative Latencies: 0 + Minimum Latency: 477.417219ms │ Minimum Latency: 564.29247ms │ Minimum Latency: 840.71089ms + Maximum Latency: 6.63744785s │ Maximum Latency: 6.988553219s │ Maximum Latency: 9.555312398s + Average Latency: 1.561216103s │ Average Latency: 1.76419063s │ Average Latency: 3.200941683s + Standard Deviation: 1.011333552s │ Standard Deviation: 1.068459423s │ Standard Deviation: 1.732346601s + │ │ +Experiment ID: 493df3ee-4a36-4bce-80f8-6d65da66beda │Experiment ID: 13060525-f04f-46f6-8ade-286684b2fe50 │Experiment ID: 1777cbd2-8c96-42e4-9ec7-9b21f2225e4d + │ │ + Connections: 1 │ Connections: 2 │ Connections: 4 + Rate: 200 │ Rate: 200 │ Rate: 200 + Size: 1024 │ Size: 1024 │ Size: 1024 + │ │ + Total Valid Tx: 17800 │ Total Valid Tx: 35600 │ Total Valid Tx: 38660 + Total Negative Latencies: 0 │ Total Negative Latencies: 0 │ Total Negative Latencies: 0 + Minimum Latency: 493.705261ms │ Minimum Latency: 955.090573ms │ Minimum Latency: 1.9485821s + Maximum Latency: 7.440921872s │ Maximum Latency: 10.086673491s │ Maximum Latency: 17.73103976s + Average Latency: 1.875510582s │ Average Latency: 3.438130099s │ Average Latency: 8.143862237s + Standard Deviation: 1.304336995s │ Standard Deviation: 1.966391574s │ Standard Deviation: 3.943140002s + diff --git a/docs/qa/v034/img/v034_rotating_heights.png b/docs/qa/v034/img/v034_rotating_heights.png new file mode 100644 index 00000000000..47913c282f8 Binary files /dev/null and b/docs/qa/v034/img/v034_rotating_heights.png differ diff --git a/docs/qa/v034/img/v034_rotating_heights_ephe.png 
b/docs/qa/v034/img/v034_rotating_heights_ephe.png new file mode 100644 index 00000000000..981b93d6c45 Binary files /dev/null and b/docs/qa/v034/img/v034_rotating_heights_ephe.png differ
diff --git a/docs/qa/v034/img/v034_rotating_latencies.png b/docs/qa/v034/img/v034_rotating_latencies.png new file mode 100644 index 00000000000..f0a54ed5b60 Binary files /dev/null and b/docs/qa/v034/img/v034_rotating_latencies.png differ
diff --git a/docs/qa/v034/img/v034_rotating_latencies_uniq.png b/docs/qa/v034/img/v034_rotating_latencies_uniq.png new file mode 100644 index 00000000000..e5d694a16e4 Binary files /dev/null and b/docs/qa/v034/img/v034_rotating_latencies_uniq.png differ
diff --git a/docs/qa/v034/img/v034_rotating_load1.png b/docs/qa/v034/img/v034_rotating_load1.png new file mode 100644 index 00000000000..e9c385b85eb Binary files /dev/null and b/docs/qa/v034/img/v034_rotating_load1.png differ
diff --git a/docs/qa/v034/img/v034_rotating_peers.png b/docs/qa/v034/img/v034_rotating_peers.png new file mode 100644 index 00000000000..ab5c8732d3d Binary files /dev/null and b/docs/qa/v034/img/v034_rotating_peers.png differ
diff --git a/docs/qa/v034/img/v034_rotating_rss_avg.png b/docs/qa/v034/img/v034_rotating_rss_avg.png new file mode 100644 index 00000000000..9a4167320cd Binary files /dev/null and b/docs/qa/v034/img/v034_rotating_rss_avg.png differ
diff --git a/docs/qa/v034/img/v034_rotating_total-txs.png b/docs/qa/v034/img/v034_rotating_total-txs.png new file mode 100644 index 00000000000..1ce5f47e9b2 Binary files /dev/null and b/docs/qa/v034/img/v034_rotating_total-txs.png differ
diff --git a/docs/qa/v037/README.md b/docs/qa/v037/README.md new file mode 100644 index 00000000000..198f41f132e --- /dev/null +++ b/docs/qa/v037/README.md @@ -0,0 +1,323 @@
---
order: 1
parent:
  title: Tendermint Quality Assurance Results for v0.37.x
  description: This is a report on the results obtained when running v0.37.x on testnets
  order: 2
---

# v0.37.x

## Issues discovered

During this iteration of the QA process, the following issues were found:

* (critical, fixed) [\#9533] - This bug caused full nodes to sometimes get stuck
  when blocksyncing, requiring a manual restart to unblock them. Importantly,
  this bug was also present in v0.34.x and the fix was also backported in
  [\#9534].
* (critical, fixed) [\#9539] - `loadtime` is very likely to include more than
  one "=" character in transactions, which is rejected by the e2e application.
* (non-critical, not fixed) [\#9548] - Full nodes can go over 50 connected
  peers, which is not intended by the default configuration.
* (non-critical, not fixed) [\#9537] - With the default mempool cache setting,
  duplicated transactions are not rejected when gossiped and eventually flood
  all mempools. The 200 node testnets were thus run with a value of 200000 (as
  opposed to the default 10000).

## 200 Node Testnet

### Finding the Saturation Point

The first goal is to identify the saturation point and compare it with the baseline (v0.34.x).
For further details, see [this paragraph](../v034/README.md#finding-the-saturation-point)
in the baseline version.

The following table summarizes the results for v0.37.x, for the different experiments
(extracted from file [`v037_report_tabbed.txt`](./img/v037_report_tabbed.txt)).

The X axis of this table is `c`, the number of connections created by the load runner process to the target node.
The Y axis of this table is `r`, the rate or number of transactions issued per second.
| | c=1 | c=2 | c=4 |
| :--- | ----: | ----: | ----: |
| r=25 | 2225 | 4450 | 8900 |
| r=50 | 4450 | 8900 | 17800 |
| r=100 | 8900 | 17800 | 35400 |
| r=200 | 17800 | 35600 | 37358 |

For comparison, this is the table with the baseline version.

| | c=1 | c=2 | c=4 |
| :--- | ----: | ----: | ----: |
| r=25 | 2225 | 4450 | 8900 |
| r=50 | 4450 | 8900 | 17800 |
| r=100 | 8900 | 17800 | 35600 |
| r=200 | 17800 | 35600 | 38660 |

The saturation point is beyond the diagonal:

* `r=200,c=2`
* `r=100,c=4`

which is at the same place as the baseline. For more details on the saturation point, see
[this paragraph](../v034/README.md#finding-the-saturation-point) in the baseline version.

The experiment chosen to examine Prometheus metrics is the same as in the baseline:
**`r=200,c=2`**.

The load runner's CPU load was negligible (near 0) when running `r=200,c=2`.

### Examining latencies

The method described [here](../method.md) allows us to plot the latencies of transactions
for all experiments.

![all-latencies](./img/v037_200node_latencies.png)

The data seen in the plot is similar to that of the baseline.

![all-latencies-bl](../v034/img/v034_200node_latencies.png)

Therefore, for further details on these plots,
see [this paragraph](../v034/README.md#examining-latencies) in the baseline version.

The following plot summarizes average latencies versus overall throughputs
across different numbers of WebSocket connections to the node into which
transactions are being loaded.

![latency-vs-throughput](./img/v037_latency_throughput.png)

This is similar to the baseline plot:

![latency-vs-throughput-bl](../v034/img/v034_latency_throughput.png)

### Prometheus Metrics on the Chosen Experiment

As mentioned [above](#finding-the-saturation-point), the chosen experiment is `r=200,c=2`.
This section further examines key metrics for this experiment, extracted from Prometheus data.

#### Mempool Size

The mempool size, a count of the number of transactions in the mempool, was shown to be stable and homogeneous
at all full nodes. It did not exhibit any unconstrained growth.
The plot below shows the evolution over time of the cumulative number of transactions inside all full nodes' mempools
at a given time.

![mempool-cumulative](./img/v037_r200c2_mempool_size.png)

The plot below shows the evolution of the average over all full nodes, which oscillates between 1500 and 2000 outstanding transactions.

![mempool-avg](./img/v037_r200c2_mempool_size_avg.png)

The peaks observed coincide with the moments when some nodes reached round 1 of consensus (see below).

**These plots yield similar results to the baseline**:

![mempool-cumulative-bl](../v034/img/v034_r200c2_mempool_size.png)

![mempool-avg-bl](../v034/img/v034_r200c2_mempool_size_avg.png)

#### Peers

The number of peers was stable at all nodes.
It was higher for the seed nodes (around 140) than for the rest (between 16 and 78).

![peers](./img/v037_r200c2_peers.png)

Just as in the baseline, the fact that non-seed nodes reach more than 50 peers is due to #9548.

**This plot yields similar results to the baseline**:

![peers-bl](../v034/img/v034_r200c2_peers.png)

#### Consensus Rounds per Height

Most heights took just one round, but some nodes needed to advance to round 1 at some point.
![rounds](./img/v037_r200c2_rounds.png)

**This plot yields slightly better results than the baseline**:

![rounds-bl](../v034/img/v034_r200c2_rounds.png)

#### Blocks Produced per Minute, Transactions Processed per Minute

The blocks produced per minute are the gradient of this plot.

![heights](./img/v037_r200c2_heights.png)

Over a period of 2 minutes, the height goes from 477 to 524.
This results in an average of 23.5 blocks produced per minute.

The transactions processed per minute are the gradient of this plot.

![total-txs](./img/v037_r200c2_total-txs.png)

Over a period of 2 minutes, the total goes from 64525 to 100125 transactions,
resulting in 17800 transactions per minute. However, we can see in the plot that
all transactions in the load are processed long before the two minutes.
If we adjust the time window when transactions are processed (approx. 90 seconds),
we obtain 23733 transactions per minute.

**These plots yield similar results to the baseline**:

![heights-bl](../v034/img/v034_r200c2_heights.png)

![total-txs-bl](../v034/img/v034_r200c2_total-txs.png)

#### Memory Resident Set Size

Resident Set Size of all monitored processes is plotted below.

![rss](./img/v037_r200c2_rss.png)

The average over all processes oscillates around 380 MiB and does not demonstrate unconstrained growth.

![rss-avg](./img/v037_r200c2_rss_avg.png)

**These plots yield similar results to the baseline**:

![rss-bl](../v034/img/v034_r200c2_rss.png)

![rss-avg-bl](../v034/img/v034_r200c2_rss_avg.png)

#### CPU utilization

The best metric from Prometheus to gauge CPU utilization in a Unix machine is `load1`,
as it usually appears in the
[output of `top`](https://www.digitalocean.com/community/tutorials/load-average-in-linux).

![load1](./img/v037_r200c2_load1.png)

It is contained below 5 on most nodes.

**This plot yields similar results to the baseline**:

![load1-bl](../v034/img/v034_r200c2_load1.png)

### Test Result

**Result: PASS**

Date: 2022-10-14

Version: b9480d0ec79c53b06344148afc6589f895d0abbf

## Rotating Node Testnet

We use the same load as in the baseline: `c=4,r=800`.

Just as in the baseline tests, the version of Tendermint used for these tests is affected by #9539.
See this paragraph in the [baseline report](../v034/README.md#rotating-node-testnet) for further details.
Finally, note that this setup allows for a fairer comparison between this version and the baseline.

### Latencies

The plot of all latencies can be seen here.

![rotating-all-latencies](./img/v037_rotating_latencies.png)

This is similar to the baseline.

![rotating-all-latencies-bl](../v034/img/v034_rotating_latencies_uniq.png)

Note that we are comparing against the baseline plot with _unique_
transactions. This is because the problem with duplicate transactions
detected during the baseline experiment did not show up for `v0.37`,
which is _not_ proof that the problem is absent in `v0.37`.

### Prometheus Metrics

The set of metrics shown here matches those shown for the baseline (`v0.34`) for the same experiment.
We also show the baseline results for comparison.

#### Blocks and Transactions per minute

The blocks produced per minute are the gradient of this plot.

![rotating-heights](./img/v037_rotating_heights.png)

Over a period of 4446 seconds, the height goes from 5 to 3323.
This results in an average of 45 blocks produced per minute,
which is similar to the baseline, shown below.
+ +![rotating-heights-bl](../v034/img/v034_rotating_heights.png) + +The following two plots show only the heights reported by ephemeral nodes. +The second plot is the baseline plot for comparison. + +![rotating-heights-ephe](./img/v037_rotating_heights_ephe.png) + +![rotating-heights-ephe-bl](../v034/img/v034_rotating_heights_ephe.png) + +By the length of the segments, we can see that ephemeral nodes in `v0.37` +catch up slightly faster. + +The transactions processed per minute are the gradient of this plot. + +![rotating-total-txs](./img/v037_rotating_total-txs.png) + +Over a period of 3852 seconds, the total goes from 597 to 267298 transactions in one of the validators, +resulting in 4154 transactions per minute, which is slightly lower than the baseline, +although the baseline had to deal with duplicate transactions. + +For comparison, this is the baseline plot. + +![rotating-total-txs-bl](../v034/img/v034_rotating_total-txs.png) + +#### Peers + +The plot below shows the evolution of the number of peers throughout the experiment. + +![rotating-peers](./img/v037_rotating_peers.png) + +This is the baseline plot, for comparison. + +![rotating-peers-bl](../v034/img/v034_rotating_peers.png) + +The plotted values and their evolution are comparable in both plots. + +For further details on these plots, see the baseline report. + +#### Memory Resident Set Size + +The average Resident Set Size (RSS) over all processes looks slightly more stable +on `v0.37` (first plot) than on the baseline (second plot). + +![rotating-rss-avg](./img/v037_rotating_rss_avg.png) + +![rotating-rss-avg-bl](../v034/img/v034_rotating_rss_avg.png) + +The memory taken by the validators and the ephemeral nodes when they are up is comparable (not shown in the plots), +just as observed in the baseline. + +#### CPU utilization + +The plot shows metric `load1` for all nodes. + +![rotating-load1](./img/v037_rotating_load1.png) + +This is the baseline plot. + +![rotating-load1-bl](../v034/img/v034_rotating_load1.png) + +In both cases, it is contained under 5 most of the time, which is considered normal load. +The green line in the `v0.37` plot and the purple line in the baseline plot (`v0.34`) +correspond to the validators receiving all transactions, via RPC, from the load runner process. +In both cases, they oscillate around 5 (normal load). The main difference is that other +nodes are generally less loaded in `v0.37`. 
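For reference, the blocks-per-minute and transactions-per-minute figures quoted throughout these reports are simple slopes between two cumulative samples read off the corresponding plots. A minimal Python sketch of that arithmetic, using the v0.37.x values quoted above, is:

```python
# Illustrative sketch only: the per-minute figures in these reports are the
# slope between two cumulative samples taken from the Prometheus plots.
def per_minute(first: float, last: float, seconds: float) -> float:
    """Average increase per minute between two cumulative samples."""
    return (last - first) * 60 / seconds

# Rotating testnet, v0.37.x (values quoted in this section):
print(round(per_minute(5, 3323, 4446)))      # ~45 blocks per minute
print(round(per_minute(597, 267298, 3852)))  # ~4154 transactions per minute
```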
+ +### Test Result + +**Result: PASS** + +Date: 2022-10-10 + +Version: 155110007b9d8b83997a799016c1d0844c8efbaf + +[\#9533]: https://github.com/tendermint/tendermint/pull/9533 +[\#9534]: https://github.com/tendermint/tendermint/pull/9534 +[\#9539]: https://github.com/tendermint/tendermint/issues/9539 +[\#9548]: https://github.com/tendermint/tendermint/issues/9548 +[\#9537]: https://github.com/tendermint/tendermint/issues/9537 diff --git a/docs/qa/v037/img/v037_200node_latencies.png b/docs/qa/v037/img/v037_200node_latencies.png new file mode 100644 index 00000000000..ad469bb29c3 Binary files /dev/null and b/docs/qa/v037/img/v037_200node_latencies.png differ diff --git a/docs/qa/v037/img/v037_latency_throughput.png b/docs/qa/v037/img/v037_latency_throughput.png new file mode 100644 index 00000000000..baf34b2c75d Binary files /dev/null and b/docs/qa/v037/img/v037_latency_throughput.png differ diff --git a/docs/qa/v037/img/v037_r200c2_heights.png b/docs/qa/v037/img/v037_r200c2_heights.png new file mode 100644 index 00000000000..360283f14c8 Binary files /dev/null and b/docs/qa/v037/img/v037_r200c2_heights.png differ diff --git a/docs/qa/v037/img/v037_r200c2_load1.png b/docs/qa/v037/img/v037_r200c2_load1.png new file mode 100644 index 00000000000..11d6dfcf7d8 Binary files /dev/null and b/docs/qa/v037/img/v037_r200c2_load1.png differ diff --git a/docs/qa/v037/img/v037_r200c2_mempool_size.png b/docs/qa/v037/img/v037_r200c2_mempool_size.png new file mode 100644 index 00000000000..a2f3bd40131 Binary files /dev/null and b/docs/qa/v037/img/v037_r200c2_mempool_size.png differ diff --git a/docs/qa/v037/img/v037_r200c2_mempool_size_avg.png b/docs/qa/v037/img/v037_r200c2_mempool_size_avg.png new file mode 100644 index 00000000000..480d4aebcbe Binary files /dev/null and b/docs/qa/v037/img/v037_r200c2_mempool_size_avg.png differ diff --git a/docs/qa/v037/img/v037_r200c2_peers.png b/docs/qa/v037/img/v037_r200c2_peers.png new file mode 100644 index 00000000000..222da73f65d Binary files /dev/null and b/docs/qa/v037/img/v037_r200c2_peers.png differ diff --git a/docs/qa/v037/img/v037_r200c2_rounds.png b/docs/qa/v037/img/v037_r200c2_rounds.png new file mode 100644 index 00000000000..7afaaac571a Binary files /dev/null and b/docs/qa/v037/img/v037_r200c2_rounds.png differ diff --git a/docs/qa/v037/img/v037_r200c2_rss.png b/docs/qa/v037/img/v037_r200c2_rss.png new file mode 100644 index 00000000000..730a1bc4904 Binary files /dev/null and b/docs/qa/v037/img/v037_r200c2_rss.png differ diff --git a/docs/qa/v037/img/v037_r200c2_rss_avg.png b/docs/qa/v037/img/v037_r200c2_rss_avg.png new file mode 100644 index 00000000000..3f6cf9f6dd2 Binary files /dev/null and b/docs/qa/v037/img/v037_r200c2_rss_avg.png differ diff --git a/docs/qa/v037/img/v037_r200c2_total-txs.png b/docs/qa/v037/img/v037_r200c2_total-txs.png new file mode 100644 index 00000000000..62dced2c871 Binary files /dev/null and b/docs/qa/v037/img/v037_r200c2_total-txs.png differ diff --git a/docs/qa/v037/img/v037_report_tabbed.txt b/docs/qa/v037/img/v037_report_tabbed.txt new file mode 100644 index 00000000000..aa4aa4e60cd --- /dev/null +++ b/docs/qa/v037/img/v037_report_tabbed.txt @@ -0,0 +1,52 @@ +Experiment ID: af129eae-7039-4c76-8c37-cff9ac636a84 │Experiment ID: 0f88bd33-9bf0-4197-8d1d-9a737c301ec6 │Experiment ID: 88227cad-2ba8-4eb6-b493-041d8120b46f + │ │ + Connections: 1 │ Connections: 2 │ Connections: 4 + Rate: 25 │ Rate: 25 │ Rate: 25 + Size: 1024 │ Size: 1024 │ Size: 1024 + │ │ + Total Valid Tx: 2225 │ Total Valid Tx: 4450 │ Total Valid Tx: 8900 + Total 
Negative Latencies: 0 │ Total Negative Latencies: 0 │ Total Negative Latencies: 0 + Minimum Latency: 506.248587ms │ Minimum Latency: 469.53452ms │ Minimum Latency: 588.900721ms + Maximum Latency: 3.032125789s │ Maximum Latency: 6.548830955s │ Maximum Latency: 6.533739843s + Average Latency: 1.427767726s │ Average Latency: 1.448582257s │ Average Latency: 1.717432341s + Standard Deviation: 524.11782ms │ Standard Deviation: 768.684133ms │ Standard Deviation: 1.000015768s + │ │ +Experiment ID: f03d39bd-0233-4b3c-b461-543445ae1d4b │Experiment ID: 46674f1c-e591-4e36-bb9b-f375c19fc475 │Experiment ID: 5385c159-8d4d-455b-bced-dcd4a3209988 + │ │ + Connections: 1 │ Connections: 2 │ Connections: 4 + Rate: 50 │ Rate: 50 │ Rate: 50 + Size: 1024 │ Size: 1024 │ Size: 1024 + │ │ + Total Valid Tx: 4450 │ Total Valid Tx: 8900 │ Total Valid Tx: 17800 + Total Negative Latencies: 0 │ Total Negative Latencies: 0 │ Total Negative Latencies: 0 + Minimum Latency: 477.46027ms │ Minimum Latency: 455.757111ms │ Minimum Latency: 594.749081ms + Maximum Latency: 2.483895394s │ Maximum Latency: 2.904715695s │ Maximum Latency: 9.294950389s + Average Latency: 1.407374662s │ Average Latency: 1.397385779s │ Average Latency: 2.621122536s + Standard Deviation: 505.150067ms │ Standard Deviation: 551.67603ms │ Standard Deviation: 1.772725794s + │ │ +Experiment ID: 9161b4a7-d75c-455f-b82d-2b5235d533cf │Experiment ID: 993a13a8-9db1-4b2b-9c20-71a5b85e4bbf │Experiment ID: ad1eb9e1-f4d6-41fd-9ba7-0f1f7dde1e3e + │ │ + Connections: 1 │ Connections: 2 │ Connections: 4 + Rate: 100 │ Rate: 100 │ Rate: 100 + Size: 1024 │ Size: 1024 │ Size: 1024 + │ │ + Total Valid Tx: 8900 │ Total Valid Tx: 17800 │ Total Valid Tx: 35400 + Total Negative Latencies: 0 │ Total Negative Latencies: 0 │ Total Negative Latencies: 0 + Minimum Latency: 448.050467ms │ Minimum Latency: 605.436195ms │ Minimum Latency: 1.16816912s + Maximum Latency: 3.789711139s │ Maximum Latency: 7.292770222s │ Maximum Latency: 11.378681842s + Average Latency: 1.451342158s │ Average Latency: 2.07457999s │ Average Latency: 3.918384209s + Standard Deviation: 644.075973ms │ Standard Deviation: 1.230204022s │ Standard Deviation: 2.172400458s + │ │ +Experiment ID: 3cbe9c3d-9c43-4c9f-b5ca-b567d20bbd57 │Experiment ID: af836c5e-d9b6-4d5d-971c-2fc7f07aa2a0 │Experiment ID: 77606397-4989-41d4-b13b-f1f4d1af063f + │ │ + Connections: 1 │ Connections: 2 │ Connections: 4 + Rate: 200 │ Rate: 200 │ Rate: 200 + Size: 1024 │ Size: 1024 │ Size: 1024 + │ │ + Total Valid Tx: 17800 │ Total Valid Tx: 35600 │ Total Valid Tx: 37358 + Total Negative Latencies: 0 │ Total Negative Latencies: 0 │ Total Negative Latencies: 0 + Minimum Latency: 519.984701ms │ Minimum Latency: 820.755087ms │ Minimum Latency: 1.712574804s + Maximum Latency: 12.609056712s │ Maximum Latency: 9.260798095s │ Maximum Latency: 25.739223696s + Average Latency: 2.717853101s │ Average Latency: 3.477731881s │ Average Latency: 8.547725264s + Standard Deviation: 2.390778155s │ Standard Deviation: 1.675000913s │ Standard Deviation: 4.76961569s + diff --git a/docs/qa/v037/img/v037_rotating_heights.png b/docs/qa/v037/img/v037_rotating_heights.png new file mode 100644 index 00000000000..882de51e41e Binary files /dev/null and b/docs/qa/v037/img/v037_rotating_heights.png differ diff --git a/docs/qa/v037/img/v037_rotating_heights_ephe.png b/docs/qa/v037/img/v037_rotating_heights_ephe.png new file mode 100644 index 00000000000..1ab2521e86f Binary files /dev/null and b/docs/qa/v037/img/v037_rotating_heights_ephe.png differ diff --git 
a/docs/qa/v037/img/v037_rotating_latencies.png b/docs/qa/v037/img/v037_rotating_latencies.png new file mode 100644 index 00000000000..94548c8b986 Binary files /dev/null and b/docs/qa/v037/img/v037_rotating_latencies.png differ diff --git a/docs/qa/v037/img/v037_rotating_load1.png b/docs/qa/v037/img/v037_rotating_load1.png new file mode 100644 index 00000000000..03b7412dae5 Binary files /dev/null and b/docs/qa/v037/img/v037_rotating_load1.png differ diff --git a/docs/qa/v037/img/v037_rotating_peers.png b/docs/qa/v037/img/v037_rotating_peers.png new file mode 100644 index 00000000000..86304760b25 Binary files /dev/null and b/docs/qa/v037/img/v037_rotating_peers.png differ diff --git a/docs/qa/v037/img/v037_rotating_rss_avg.png b/docs/qa/v037/img/v037_rotating_rss_avg.png new file mode 100644 index 00000000000..d45c045b706 Binary files /dev/null and b/docs/qa/v037/img/v037_rotating_rss_avg.png differ diff --git a/docs/qa/v037/img/v037_rotating_total-txs.png b/docs/qa/v037/img/v037_rotating_total-txs.png new file mode 100644 index 00000000000..50b4c2e3fff Binary files /dev/null and b/docs/qa/v037/img/v037_rotating_total-txs.png differ diff --git a/scripts/qa/reporting/README.md b/scripts/qa/reporting/README.md new file mode 100644 index 00000000000..088332837a9 --- /dev/null +++ b/scripts/qa/reporting/README.md @@ -0,0 +1,48 @@ +# Reporting Scripts + +This directory contains just one utility script at present that is used in +reporting/QA. + +## Latency vs Throughput Plotting + +[`latency_throughput.py`](./latency_throughput.py) is a Python script that uses +[matplotlib] to plot a graph of transaction latency vs throughput rate based on +the CSV output generated by the [loadtime reporting +tool](../../../test/loadtime/cmd/report/). + +### Setup + +Execute the following within this directory (the same directory as the +`latency_throughput.py` file). + +```bash +# Create a virtual environment into which to install your dependencies +python3 -m venv .venv + +# Activate the virtual environment +source .venv/bin/activate + +# Install dependencies listed in requirements.txt +pip install -r requirements.txt + +# Show usage instructions and parameters +./latency_throughput.py --help +``` + +### Running + +```bash +# Do the following while ensuring that the virtual environment is activated (see +# the Setup steps). +# +# This will generate a plot in a PNG file called 'tm034.png' in the current +# directory based on the reporting tool CSV output in the "raw.csv" file. The +# '-t' flag overrides the default title at the top of the plot. + +./latency_throughput.py \ + -t 'Tendermint v0.34.x Latency vs Throughput' \ + ./tm034.png \ + /path/to/csv/files/raw.csv +``` + +[matplotlib]: https://matplotlib.org/ diff --git a/scripts/qa/reporting/latency_throughput.py b/scripts/qa/reporting/latency_throughput.py new file mode 100755 index 00000000000..2cdab72ac77 --- /dev/null +++ b/scripts/qa/reporting/latency_throughput.py @@ -0,0 +1,170 @@ +#!/usr/bin/env python3 +""" +A simple script to parse the CSV output from the loadtime reporting tool (see +https://github.com/tendermint/tendermint/tree/main/test/loadtime/cmd/report). + +Produces a plot of average transaction latency vs total transaction throughput +according to the number of load testing tool WebSocket connections to the +Tendermint node. 
+""" + +import argparse +import csv +import logging +import sys +import matplotlib.pyplot as plt +import numpy as np + +DEFAULT_TITLE = "Tendermint latency vs throughput" + + +def main(): + parser = argparse.ArgumentParser( + description="Renders a latency vs throughput diagram " + "for a set of transactions provided by the loadtime reporting tool", + formatter_class=argparse.ArgumentDefaultsHelpFormatter) + parser.add_argument('-t', + '--title', + default=DEFAULT_TITLE, + help='Plot title') + parser.add_argument('output_image', + help='Output image file (in PNG format)') + parser.add_argument( + 'input_csv_file', + nargs='+', + help="CSV input file from which to read transaction data " + "- must have been generated by the loadtime reporting tool") + args = parser.parse_args() + + logging.basicConfig(format='%(levelname)s\t%(message)s', + stream=sys.stdout, + level=logging.INFO) + plot_latency_vs_throughput(args.input_csv_file, + args.output_image, + title=args.title) + + +def plot_latency_vs_throughput(input_files, output_image, title=DEFAULT_TITLE): + avg_latencies, throughput_rates = process_input_files(input_files, ) + + fig, ax = plt.subplots() + + connections = sorted(avg_latencies.keys()) + for c in connections: + tr = np.array(throughput_rates[c]) + al = np.array(avg_latencies[c]) + label = '%d connection%s' % (c, '' if c == 1 else 's') + ax.plot(tr, al, 'o-', label=label) + + ax.set_title(title) + ax.set_xlabel('Throughput rate (tx/s)') + ax.set_ylabel('Average transaction latency (s)') + + plt.legend(loc='upper left') + plt.savefig(output_image) + + +def process_input_files(input_files): + # Experimental data from which we will derive the latency vs throughput + # statistics + experiments = {} + + for input_file in input_files: + logging.info('Reading %s...' 
% input_file) + + with open(input_file, 'rt') as inf: + reader = csv.DictReader(inf) + for tx in reader: + experiments = process_tx(experiments, tx) + + return compute_experiments_stats(experiments) + + +def process_tx(experiments, tx): + exp_id = tx['experiment_id'] + # Block time is nanoseconds from the epoch - convert to seconds + block_time = float(tx['block_time']) / (10**9) + # Duration is also in nanoseconds - convert to seconds + duration = float(tx['duration_ns']) / (10**9) + connections = int(tx['connections']) + rate = int(tx['rate']) + + if exp_id not in experiments: + experiments[exp_id] = { + 'connections': connections, + 'rate': rate, + 'block_time_min': block_time, + # We keep track of the latency associated with the minimum block + # time to estimate the start time of the experiment + 'block_time_min_duration': duration, + 'block_time_max': block_time, + 'total_latencies': duration, + 'tx_count': 1, + } + logging.info('Found experiment %s with rate=%d, connections=%d' % + (exp_id, rate, connections)) + else: + # Validation + for field in ['connections', 'rate']: + val = int(tx[field]) + if val != experiments[exp_id][field]: + raise Exception( + 'Found multiple distinct values for field ' + '"%s" for the same experiment (%s): %d and %d' % + (field, exp_id, val, experiments[exp_id][field])) + + if block_time < experiments[exp_id]['block_time_min']: + experiments[exp_id]['block_time_min'] = block_time + experiments[exp_id]['block_time_min_duration'] = duration + if block_time > experiments[exp_id]['block_time_max']: + experiments[exp_id]['block_time_max'] = block_time + + experiments[exp_id]['total_latencies'] += duration + experiments[exp_id]['tx_count'] += 1 + + return experiments + + +def compute_experiments_stats(experiments): + """Compute average latency vs throughput rate statistics from the given + experiments""" + stats = {} + + # Compute average latency and throughput rate for each experiment + for exp_id, exp in experiments.items(): + conns = exp['connections'] + avg_latency = exp['total_latencies'] / exp['tx_count'] + exp_start_time = exp['block_time_min'] - exp['block_time_min_duration'] + exp_duration = exp['block_time_max'] - exp_start_time + throughput_rate = exp['tx_count'] / exp_duration + if conns not in stats: + stats[conns] = [] + + stats[conns].append({ + 'avg_latency': avg_latency, + 'throughput_rate': throughput_rate, + }) + + # Sort stats for each number of connections in order of increasing + # throughput rate, and then extract average latencies and throughput rates + # as separate data series. + conns = sorted(stats.keys()) + avg_latencies = {} + throughput_rates = {} + for c in conns: + stats[c] = sorted(stats[c], key=lambda s: s['throughput_rate']) + avg_latencies[c] = [] + throughput_rates[c] = [] + for s in stats[c]: + avg_latencies[c].append(s['avg_latency']) + throughput_rates[c].append(s['throughput_rate']) + logging.info('For %d connection(s): ' + 'throughput rate = %.6f tx/s\t' + 'average latency = %.6fs' % + (c, s['throughput_rate'], s['avg_latency'])) + + return (avg_latencies, throughput_rates) + + +if __name__ == "__main__": + main() diff --git a/scripts/qa/reporting/requirements.txt b/scripts/qa/reporting/requirements.txt new file mode 100644 index 00000000000..4486cd522e5 --- /dev/null +++ b/scripts/qa/reporting/requirements.txt @@ -0,0 +1,11 @@ +contourpy==1.0.5 +cycler==0.11.0 +fonttools==4.37.4 +kiwisolver==1.4.4 +matplotlib==3.6.1 +numpy==1.23.4 +packaging==21.3 +Pillow==9.2.0 +pyparsing==3.0.9 +python-dateutil==2.8.2 +six==1.16.0