-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Mithril] script updates #1785
[Mithril] script updates #1785
Conversation
@gmoratorio This PR's for you. |
@gmoratorio I found stream to work in the container at docker hub, which appears to be running nginx v1.25. However I also ran a different test where I found nginx v1.20.1 does not support stream I am investigating various ubuntu versions now to see if/when stream support shows up. |
I did have success however adding the nginx stable repository to ubuntu 20.04 and updating to nginx 1.26. This guide describes the process. |
Thanks @TrevorBenson , updating to
sure enough, removing these two entries in the
Resolves the logging error. But definitely something to look out for. |
For tracking purposes @gmoratorio and I spoke in DM. I suspect this might come from libsystemd0/noble-updates 255.4-1ubuntu8.2 amd64 [upgradable from: 255.4-1ubuntu8.1] which appears to have been released a couple of weeks ago. |
Converted from ready to draft. Now that I have confirmed that the nginx stable repository did not cause the syslog errors observed by @gmoratorio I feel it is likely acceptable to enable this repository to get a minimized nginx load balancer configuration using stream proxy module. Also, now that #1789 has merged I'm going to review all mithril scripts and mithril.library for a small refactor on user input etc. I'll move this PR back to ready for review later this week once complete. |
…ream, proxy_connect_timeout 10, max_fails=1 and fail_timeout 10 seconds per relay
bb54843
to
ccf4e33
Compare
ccf4e33
to
f951c7d
Compare
f951c7d
to
f75a2f9
Compare
ab34c90
to
1ef3670
Compare
## Description 1. Changes to the nginx mithril relay LB and squid mithril relay * Nginx mithril relay LB: * Uses stream protocol * proxy_connect_timeout of 10 seconds * max_fails set to 1 * fail_timeout set to 10 seconds per relay (20 seconds for 2 relay configurations, 30 for 3, etc.) * Squid mithril relay: * Change to how ACL's and allow rulesare written for multiple block producers * Includes rules for non block producers (other relays, local relay IP, simplifies testing from other than block producer) etc. 2. Enhances the verify_signer_registered function * Continues to check if mithril-signer has registered in the current epoch. * If the signer has registered for the current epoch it will also check if the signer registered two epochs earlier and reports if the signer is valid to sign certificates in the current epoch or not. 3. Adds a new MITHRIL_HOME variable to the `scripts/cnode-helper-scripts/env` file. * Defaults to the current path of `${CNODE_HOME}/mithril`. * Provides the SPO a way to set a unique path for where mithril environment and mithril data-stores will be located. 4. Changes to a status code for checks on mithril minimum versions. Sourced from past conversation w/ @Scitz0 5. Adds functions **semantic_version_check** and **check_mithril_upgrade_safe** * GitHub Actions workflow now uses **check_mithril_upgrade_safe** against the repo `node-latest.txt` to determine if changing `mithril-latest.xt` is acceptable. * Not yet implemented in `guild-deploy.sh`, but the **check_mithril_upgrade_safe** should support both CI Actions and production node/signer environments. Open for implementation discussions. ## Where to start? I'd suggest review of each commit vs. reviewing the files changed. The commit for **Refactor mithril functions into mithril library.** will be quite large. There is a general section of functions used by multiple scripts, and then functions are sorted into blocks by the script that uses them, client, relay or signer. Each section is alphabetically sorted. ## Motivation and context 1. Nginx relay/lb * Issues with current nginx config reported in Koios support channel. * Simplified testing of squid proxy when additional IPs are provided. * Offline relays were adding 60 second delays until nginx timeouts occured and LB moved onto the next relay. * Tuned the max fails to 1 and fail_timeout 10 seconds * # of relays. * Temporarily prevents re-using an offline relay if many requests get sent through the same LB 2. Reduce confusion for new signer users who were not aware signing had a 2 epoch delay. 3. Flexibility for SPO's to choose their own directories. 4. Past discussions with @Scitz0 5. Past discussions with @rdlrt ## How has this been tested? 1. Nginx relay/squid * Direct curl requests on the block producer: `curl -4 --proxy http://127.0.0.1:3132 https://aggregator.release-mainnet.api.mithril.network/aggregator` * Enabling tcpdump on port 3132 for all (squid) mithril relays, disabling the first relay in the mithril_relays upstream definition of the (nginx) mithril relay lb. Then timing the request while watching traffic from each mithril relay: `time curl -4 --proxy http://127.0.0.1:3132 https://aggregator.release-mainnet.api.mithril.network/aggregator`. * The first request takes 10.0x seconds, to succeed, subsequent requests take sub 1 second and traffic shows round robin over the remaining online mithril relays. If round robin reaches the offline mithril relay before the fail_timout has expired it gets skipped and the traffic is again sent to the first online (second in the list) upstream mithril relay. 2. Manually 3. Test virtual machines altering path to `/opt/cardano/mithril` 4. Testing locally and in forked repository github actions 5. Testing locally and in forked repository github actions ![image](https://github.com/user-attachments/assets/3838f427-1f38-4808-8188-eb517def0153) --------- Co-authored-by: RdLrT <[email protected]>
Description
scripts/cnode-helper-scripts/env
file.${CNODE_HOME}/mithril
.node-latest.txt
to determine if changingmithril-latest.xt
is acceptable.guild-deploy.sh
, but the check_mithril_upgrade_safe should support both CI Actions and production node/signer environments. Open for implementation discussions.Where to start?
I'd suggest review of each commit vs. reviewing the files changed. The commit for Refactor mithril functions into mithril library. will be quite large. There is a general section of functions used by multiple scripts, and then functions are sorted into blocks by the script that uses them, client, relay or signer. Each section is alphabetically sorted.
Motivation and context
How has this been tested?
Nginx relay/squid
Direct curl requests on the block producer:
curl -4 --proxy http://127.0.0.1:3132 https://aggregator.release-mainnet.api.mithril.network/aggregator
Enabling tcpdump on port 3132 for all (squid) mithril relays, disabling the first relay in the mithril_relays upstream definition of the (nginx) mithril relay lb. Then timing the request while watching traffic from each mithril relay:
time curl -4 --proxy http://127.0.0.1:3132 https://aggregator.release-mainnet.api.mithril.network/aggregator
.The first request takes 10.0x seconds, to succeed, subsequent requests take sub 1 second and traffic shows round robin over the remaining online mithril relays. If round robin reaches the offline mithril relay before the fail_timout has expired it gets skipped and the traffic is again sent to the first online (second in the list) upstream mithril relay.
Manually
Test virtual machines altering path to
/opt/cardano/mithril
Testing locally and in forked repository github actions
Testing locally and in forked repository github actions