[Mithril] script updates #1785

TrevorBenson · 2024-07-26T15:04:03Z

Description

Changes to the nginx mithril relay LB and squid mithril relay
- Nginx mithril relay LB:
  - Uses stream protocol
  - proxy_connect_timeout of 10 seconds
  - max_fails set to 1
  - fail_timeout set to 10 seconds per relay (20 seconds for 2 relay configurations, 30 for 3, etc.)
- Squid mithril relay:
  - Change to how ACL's and allow rulesare written for multiple block producers
  - Includes rules for non block producers (other relays, local relay IP, simplifies testing from other than block producer) etc.
Enhances the verify_signer_registered function
- Continues to check if mithril-signer has registered in the current epoch.
- If the signer has registered for the current epoch it will also check if the signer registered two epochs earlier and reports if the signer is valid to sign certificates in the current epoch or not.
Adds a new MITHRIL_HOME variable to the scripts/cnode-helper-scripts/env file.
- Defaults to the current path of ${CNODE_HOME}/mithril.
- Provides the SPO a way to set a unique path for where mithril environment and mithril data-stores will be located.
Changes to a status code for checks on mithril minimum versions. Sourced from past conversation w/ @Scitz0
Adds functions semantic_version_check and check_mithril_upgrade_safe
- GitHub Actions workflow now uses check_mithril_upgrade_safe against the repo node-latest.txt to determine if changing mithril-latest.xt is acceptable.

Not yet implemented in guild-deploy.sh, but the check_mithril_upgrade_safe should support both CI Actions and production node/signer environments. Open for implementation discussions.

Where to start?

I'd suggest review of each commit vs. reviewing the files changed. The commit for Refactor mithril functions into mithril library. will be quite large. There is a general section of functions used by multiple scripts, and then functions are sorted into blocks by the script that uses them, client, relay or signer. Each section is alphabetically sorted.

Motivation and context

Nginx relay/lb

Issues with current nginx config reported in Koios support channel.
Simplified testing of squid proxy when additional IPs are provided.
Offline relays were adding 60 second delays until nginx timeouts occured and LB moved onto the next relay.
Tuned the max fails to 1 and fail_timeout 10 seconds * # of relays.
Temporarily prevents re-using an offline relay if many requests get sent through the same LB

Reduce confusion for new signer users who were not aware signing had a 2 epoch delay.
Flexibility for SPO's to choose their own directories.
Past discussions with @Scitz0
Past discussions with @rdlrt

How has this been tested?

Nginx relay/squid
- Direct curl requests on the block producer:
  curl -4 --proxy http://127.0.0.1:3132 https://aggregator.release-mainnet.api.mithril.network/aggregator
- Enabling tcpdump on port 3132 for all (squid) mithril relays, disabling the first relay in the mithril_relays upstream definition of the (nginx) mithril relay lb. Then timing the request while watching traffic from each mithril relay:
  time curl -4 --proxy http://127.0.0.1:3132 https://aggregator.release-mainnet.api.mithril.network/aggregator.
- The first request takes 10.0x seconds, to succeed, subsequent requests take sub 1 second and traffic shows round robin over the remaining online mithril relays. If round robin reaches the offline mithril relay before the fail_timout has expired it gets skipped and the traffic is again sent to the first online (second in the list) upstream mithril relay.
Manually
Test virtual machines altering path to /opt/cardano/mithril
Testing locally and in forked repository github actions
Testing locally and in forked repository github actions

TrevorBenson · 2024-07-26T16:01:46Z

@gmoratorio This PR's for you.

TrevorBenson · 2024-07-26T16:58:58Z

@gmoratorio I found stream to work in the container at docker hub, which appears to be running nginx v1.25. However I also ran a different test where I found nginx v1.20.1 does not support stream nginx: [emerg] unknown directive "stream" in /etc/nginx/nginx.conf:7.

I am investigating various ubuntu versions now to see if/when stream support shows up.

TrevorBenson · 2024-07-26T17:31:23Z

hub.docker.com - nginx/1.25.4 - includes stream support
Ubuntu 20.04 - nginx/1.18.0 - no stream support
Ubuntu 24.04 - nginx/1.24.0 - no stream support

I did have success however adding the nginx stable repository to ubuntu 20.04 and updating to nginx 1.26. This guide describes the process.

gmoratorio · 2024-07-26T22:35:27Z

Thanks @TrevorBenson , updating to nginx 1.26 with your instructions solved the stream issue in the nginx.conf. One strange thing I did notice after upgrading using those instructions, though, is when I tried to see the mithril service logs with sudo journalctl -u cnode-mithril-signer.service -f I started getting this error in the logs

Jul 26 21:24:54 bp.us-central1-a.c.stake-pool-428321.internal systemd[1]: /etc/systemd/system/cnode-mithril-signer.service:19: Standard output type syslog is obsolete, automatically updating to journal. Please update your unit file, and consider removing the setting altogether.

sure enough, removing these two entries in the cnode-mithril-signer.service script

StandardOutput=syslog
StandardError=syslog

Resolves the logging error. But definitely something to look out for.

TrevorBenson · 2024-07-26T22:50:30Z

Thanks @TrevorBenson , updating to nginx 1.26 with your instructions solved the stream issue in the nginx.conf. One strange thing I did notice after upgrading using those instructions, though, is when I tried to see the mithril service logs with sudo journalctl -u cnode-mithril-signer.service -f I started getting this error in the logs
Jul 26 21:24:54 bp.us-central1-a.c.stake-pool-428321.internal systemd[1]: /etc/systemd/system/cnode-mithril-signer.service:19: Standard output type syslog is obsolete, automatically updating to journal. Please update your unit file, and consider removing the setting altogether.
sure enough, removing these two entries in the cnode-mithril-signer.service script
StandardOutput=syslog
StandardError=syslog
Resolves the logging error. But definitely something to look out for.

For tracking purposes @gmoratorio and I spoke in DM. I suspect this might come from libsystemd0/noble-updates 255.4-1ubuntu8.2 amd64 [upgradable from: 255.4-1ubuntu8.1] which appears to have been released a couple of weeks ago.

TrevorBenson · 2024-07-30T05:13:23Z

Converted from ready to draft.

Now that I have confirmed that the nginx stable repository did not cause the syslog errors observed by @gmoratorio I feel it is likely acceptable to enable this repository to get a minimized nginx load balancer configuration using stream proxy module.

Also, now that #1789 has merged I'm going to review all mithril scripts and mithril.library for a small refactor on user input etc.

I'll move this PR back to ready for review later this week once complete.

…ream, proxy_connect_timeout 10, max_fails=1 and fail_timeout 10 seconds per relay

…ode binaries.

scripts/cnode-helper-scripts/guild-deploy.sh

@Scitz0

## Description 1. Changes to the nginx mithril relay LB and squid mithril relay * Nginx mithril relay LB: * Uses stream protocol * proxy_connect_timeout of 10 seconds * max_fails set to 1 * fail_timeout set to 10 seconds per relay (20 seconds for 2 relay configurations, 30 for 3, etc.) * Squid mithril relay: * Change to how ACL's and allow rulesare written for multiple block producers * Includes rules for non block producers (other relays, local relay IP, simplifies testing from other than block producer) etc. 2. Enhances the verify_signer_registered function * Continues to check if mithril-signer has registered in the current epoch. * If the signer has registered for the current epoch it will also check if the signer registered two epochs earlier and reports if the signer is valid to sign certificates in the current epoch or not. 3. Adds a new MITHRIL_HOME variable to the `scripts/cnode-helper-scripts/env` file. * Defaults to the current path of `${CNODE_HOME}/mithril`. * Provides the SPO a way to set a unique path for where mithril environment and mithril data-stores will be located. 4. Changes to a status code for checks on mithril minimum versions. Sourced from past conversation w/ @Scitz0 5. Adds functions **semantic_version_check** and **check_mithril_upgrade_safe** * GitHub Actions workflow now uses **check_mithril_upgrade_safe** against the repo `node-latest.txt` to determine if changing `mithril-latest.xt` is acceptable. * Not yet implemented in `guild-deploy.sh`, but the **check_mithril_upgrade_safe** should support both CI Actions and production node/signer environments. Open for implementation discussions. ## Where to start? I'd suggest review of each commit vs. reviewing the files changed. The commit for **Refactor mithril functions into mithril library.** will be quite large. There is a general section of functions used by multiple scripts, and then functions are sorted into blocks by the script that uses them, client, relay or signer. Each section is alphabetically sorted. ## Motivation and context 1. Nginx relay/lb * Issues with current nginx config reported in Koios support channel. * Simplified testing of squid proxy when additional IPs are provided. * Offline relays were adding 60 second delays until nginx timeouts occured and LB moved onto the next relay. * Tuned the max fails to 1 and fail_timeout 10 seconds * # of relays. * Temporarily prevents re-using an offline relay if many requests get sent through the same LB 2. Reduce confusion for new signer users who were not aware signing had a 2 epoch delay. 3. Flexibility for SPO's to choose their own directories. 4. Past discussions with @Scitz0 5. Past discussions with @rdlrt ## How has this been tested? 1. Nginx relay/squid * Direct curl requests on the block producer: `curl -4 --proxy http://127.0.0.1:3132 https://aggregator.release-mainnet.api.mithril.network/aggregator` * Enabling tcpdump on port 3132 for all (squid) mithril relays, disabling the first relay in the mithril_relays upstream definition of the (nginx) mithril relay lb. Then timing the request while watching traffic from each mithril relay: `time curl -4 --proxy http://127.0.0.1:3132 https://aggregator.release-mainnet.api.mithril.network/aggregator`. * The first request takes 10.0x seconds, to succeed, subsequent requests take sub 1 second and traffic shows round robin over the remaining online mithril relays. If round robin reaches the offline mithril relay before the fail_timout has expired it gets skipped and the traffic is again sent to the first online (second in the list) upstream mithril relay. 2. Manually 3. Test virtual machines altering path to `/opt/cardano/mithril` 4. Testing locally and in forked repository github actions 5. Testing locally and in forked repository github actions ![image](https://github.com/user-attachments/assets/3838f427-1f38-4808-8188-eb517def0153) --------- Co-authored-by: RdLrT <[email protected]>

TrevorBenson requested review from rdlrt and Scitz0 July 26, 2024 15:04

TrevorBenson mentioned this pull request Jul 29, 2024

Syslog deprecated log messages #1792

Closed

TrevorBenson marked this pull request as draft July 30, 2024 05:07

TrevorBenson added 3 commits August 7, 2024 09:14

squid config ask for additional allows, nginx mithril relay lb use st…

91f0f38

…ream, proxy_connect_timeout 10, max_fails=1 and fail_timeout 10 seconds per relay

Enhanced verify_signer_registered

39d6192

MITHRIL_HOME to allow custom paths outside of CNODE_HOME

e42b5f9

TrevorBenson force-pushed the revised-mithril-relay-lb-with-timeouts branch 3 times, most recently from bb54843 to ccf4e33 Compare August 7, 2024 22:15

Refactor mithril functions into mithril library.

d498e5a

TrevorBenson force-pushed the revised-mithril-relay-lb-with-timeouts branch from ccf4e33 to f951c7d Compare August 8, 2024 00:00

TrevorBenson added 2 commits August 7, 2024 17:09

Use status code greater than 200

0fd3dfc

Provide a way to support checking compatible version of mithril and n…

f75a2f9

…ode binaries.

TrevorBenson force-pushed the revised-mithril-relay-lb-with-timeouts branch from f951c7d to f75a2f9 Compare August 8, 2024 00:09

TrevorBenson marked this pull request as ready for review August 8, 2024 00:13

rdlrt reviewed Aug 8, 2024

View reviewed changes

scripts/cnode-helper-scripts/guild-deploy.sh Show resolved Hide resolved

TrevorBenson changed the title ~~Changes to the nginx mithril relay LB and squid mithril relay~~ [Mithril] script updates Aug 8, 2024

TrevorBenson added 2 commits August 8, 2024 08:30

Instantiate MITHRIL_HOME if not already set by environment.

537b666

Remove leftover debug satement

1ef3670

TrevorBenson force-pushed the revised-mithril-relay-lb-with-timeouts branch from ab34c90 to 1ef3670 Compare August 8, 2024 15:30

TrevorBenson requested a review from rdlrt August 8, 2024 15:30

rdlrt reviewed Aug 9, 2024

View reviewed changes

scripts/cnode-helper-scripts/guild-deploy.sh Outdated Show resolved Hide resolved

Fix position of MITHRIL_HOME

eca97a9

TrevorBenson requested a review from rdlrt August 9, 2024 05:26

rdlrt approved these changes Aug 9, 2024

View reviewed changes

Merge branch 'alpha' into revised-mithril-relay-lb-with-timeouts

befa5f0

rdlrt merged commit 1b0b6b0 into alpha Aug 10, 2024
1 of 3 checks passed

rdlrt deleted the revised-mithril-relay-lb-with-timeouts branch August 10, 2024 01:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Mithril] script updates #1785

[Mithril] script updates #1785

TrevorBenson commented Jul 26, 2024 •

edited

Loading

TrevorBenson commented Jul 26, 2024

TrevorBenson commented Jul 26, 2024

TrevorBenson commented Jul 26, 2024

gmoratorio commented Jul 26, 2024

TrevorBenson commented Jul 26, 2024 •

edited

Loading

TrevorBenson commented Jul 30, 2024

[Mithril] script updates #1785

[Mithril] script updates #1785

Conversation

TrevorBenson commented Jul 26, 2024 • edited Loading

Description

Where to start?

Motivation and context

How has this been tested?

TrevorBenson commented Jul 26, 2024

TrevorBenson commented Jul 26, 2024

TrevorBenson commented Jul 26, 2024

gmoratorio commented Jul 26, 2024

TrevorBenson commented Jul 26, 2024 • edited Loading

TrevorBenson commented Jul 30, 2024

TrevorBenson commented Jul 26, 2024 •

edited

Loading

TrevorBenson commented Jul 26, 2024 •

edited

Loading