feat(raft): Implement Raft-based Consistent Hash State Management #636
This commit introduces Raft consensus to maintain consistency of hash-to-proxy mappings across multiple GatewayD instances. Key changes include:
- Add new Raft package implementing consensus protocol using HashiCorp's Raft
- Integrate Raft with consistent hashing load balancer
- Store proxy mappings in distributed state machine
- Add configuration options for Raft cluster setup
- Implement leadership monitoring and peer management
- Add FSM snapshot and restore capabilities
The implementation ensures that hash-to-proxy mappings remain consistent across cluster nodes, improving reliability for consistent hash-based load balancing.
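To illustrate the "distributed state machine" and the snapshot/restore bullets, here is a minimal sketch in Go using only the standard library. It is not the code from this PR and it does not implement the interface of HashiCorp's Raft library; the `ConsistentHashCommand` fields, the `"add"` command type, and the byte-slice signatures are all assumptions chosen to keep the example self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// ConsistentHashCommand is a hypothetical log entry. The field names and
// JSON tags are assumptions, not taken from the PR.
type ConsistentHashCommand struct {
	Type  string `json:"type"`
	Hash  uint64 `json:"hash"`
	Proxy string `json:"proxy"`
}

// FSM holds the replicated hash-to-proxy mapping.
type FSM struct {
	mu      sync.RWMutex
	hashMap map[uint64]string
}

// NewFSM returns an FSM with an empty mapping.
func NewFSM() *FSM {
	return &FSM{hashMap: make(map[uint64]string)}
}

// Apply decodes one committed log entry and updates the mapping.
func (f *FSM) Apply(data []byte) error {
	var cmd ConsistentHashCommand
	if err := json.Unmarshal(data, &cmd); err != nil {
		return fmt.Errorf("decode command: %w", err)
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	switch cmd.Type {
	case "add":
		f.hashMap[cmd.Hash] = cmd.Proxy
		return nil
	default:
		return fmt.Errorf("unknown command type %q", cmd.Type)
	}
}

// Snapshot serializes the whole mapping so that a node can catch up
// without replaying every log entry.
func (f *FSM) Snapshot() ([]byte, error) {
	f.mu.RLock()
	defer f.mu.RUnlock()
	return json.Marshal(f.hashMap)
}

// Restore replaces the current mapping with the contents of a snapshot.
func (f *FSM) Restore(snapshot []byte) error {
	restored := make(map[uint64]string)
	if err := json.Unmarshal(snapshot, &restored); err != nil {
		return fmt.Errorf("decode snapshot: %w", err)
	}
	f.mu.Lock()
	f.hashMap = restored
	f.mu.Unlock()
	return nil
}

// Lookup returns the proxy stored for a hash, if any.
func (f *FSM) Lookup(hash uint64) (string, bool) {
	f.mu.RLock()
	defer f.mu.RUnlock()
	proxy, ok := f.hashMap[hash]
	return proxy, ok
}

func main() {
	source := NewFSM()
	if err := source.Apply([]byte(`{"type":"add","hash":42,"proxy":"default"}`)); err != nil {
		panic(err)
	}
	snapshot, err := source.Snapshot()
	if err != nil {
		panic(err)
	}
	// A second node restores from the snapshot and sees the same mapping.
	replica := NewFSM()
	if err := replica.Restore(snapshot); err != nil {
		panic(err)
	}
	fmt.Println(replica.Lookup(42))
}
```

In a real cluster the consensus layer calls `Apply` on every node in the same order, which is what keeps the mappings identical; the sketch only shows the state-machine side of that contract.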
…y mapping
- Replace proxy ID with block name for consistent hash mapping
- Remove direct raft node dependency from ConsistentHash struct
- Add ProxyByBlock map to Server for block-based proxy lookups
- Include group name in hash key generation for better distribution
- Add proxy initialization during server startup
- Update FSM to use consistent naming for hash map storage
This change improves the consistent hashing mechanism by using block names instead of proxy IDs, making it more aligned with the block-based architecture while maintaining backwards compatibility with the original load balancing strategy.
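One way to read "include group name in hash key generation" is sketched below. The function name `hashKey`, the `group:clientKey` layout, and the choice of FNV-1a are assumptions for illustration; the PR does not state which hash function or key format GatewayD uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashKey is a hypothetical helper: it prefixes the client key (for example
// a source IP) with the group name so that the same client maps to
// independent hash values in different groups.
func hashKey(group, clientKey string) uint64 {
	h := fnv.New64a()
	// Write on a hash.Hash never returns an error.
	h.Write([]byte(group + ":" + clientKey))
	return h.Sum64()
}

func main() {
	fmt.Println(hashKey("default", "10.0.0.1"))
	fmt.Println(hashKey("other", "10.0.0.1"))
}
```

The resulting value would then be the key under which a block name is stored in the replicated map.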
- Remove unused UUID-based ID field from Proxy struct
- Remove GetID() method from IProxy interface and Proxy implementation
- Remove GetProxyByID() method from Server struct
- Remove uuid package dependency
The proxy ID was not being used meaningfully in the codebase, so removing it simplifies the proxy implementation.

This commit introduces comprehensive Raft testing infrastructure and enhances the consistent hash implementation with distributed state management. Key changes:
- Add new test cases for Raft leadership, follower behavior, and FSM operations
- Integrate Raft with consistent hash load balancer for distributed state
- Add TestRaftHelper utility for simplified Raft testing setup
- Update consistent hash tests to use Raft for state persistence
- Add GetState method to RaftNode for state inspection
- Improve test coverage for concurrent operations
The changes ensure that proxy mappings are consistently maintained across the cluster using Raft consensus, making the load balancer more reliable in distributed environments.

- Add Directory field to Raft config to make raft storage location configurable
- Use t.TempDir() in tests to ensure proper cleanup of test directories
- Rename HashMapCommand to ConsistentHashCommand for better clarity
- Update command type constants and map names to be more descriptive
- Fix test flakiness by using unique node IDs and random available ports
- Remove manual directory cleanup in favor of t.TempDir() cleanup
- Update configuration files with raft directory settings
This change improves test stability and makes the raft storage location configurable while cleaning up naming conventions throughout the raft package.

Add default configuration values for Raft consensus implementation:
- RaftAddress: 127.0.0.1:2223
- RaftNodeID: node1
- RaftLeaderID: node1
- RaftDirectory: raft
This change initializes the default Raft configuration in the config loader.

- Enhance error handling with wrapped errors and detailed messages
- Add meaningful constants for timeouts and configuration values
- Rename RaftNode to Node for better clarity
- Fix JSON field names to match Raft convention (nodeId, leaderId)
- Add missing error checks in critical paths
- Improve documentation and code comments
- Update golangci linter settings to include raft package

- Introduced a temporary directory for Raft using t.TempDir() in the Test_pluginScaffoldCmd test case.
- Set the GATEWAYD_RAFT_DIRECTORY environment variable to the new temporary directory.
- This change ensures that Raft operations during testing are isolated and do not interfere with other tests or system directories.
Overview
Packages and Vulnerabilities (13 package changes and 0 vulnerability changes)
Changes for packages of type

| | Package | Version ghcr.io/gatewayd-io/gatewayd:f6aba9f | Version gatewaydio/gatewayd:latest |
|---|---|---|---|
| ➖ | ca-certificates | 20240705-r0 | |
| ➖ | openssl | 3.3.2-r0 | |
| ➖ | pax-utils | 1.3.7-r2 | |

Changes for packages of type golang (10 changes)

| | Package | Version ghcr.io/gatewayd-io/gatewayd:f6aba9f | Version gatewaydio/gatewayd:latest |
|---|---|---|---|
| ➖ | github.com/armon/go-metrics | 0.4.1 | |
| ➖ | github.com/boltdb/bolt | 1.3.1 | |
| ♾️ | github.com/gatewayd-io/gatewayd | (devel) | 0.0.0-20241109120212-7f47dca74c26 |
| ➖ | github.com/hashicorp/go-immutable-radix | 1.0.0 | |
| ➖ | github.com/hashicorp/go-msgpack/v2 | 2.1.2 | |
| ➖ | github.com/hashicorp/golang-lru | 0.5.1 | |
| ➖ | github.com/hashicorp/raft | 1.7.1 | |
| ➖ | github.com/hashicorp/raft-boltdb | 0.0.0-20231211162105-6c830fa4535e | |
| ♾️ | google.golang.org/protobuf | 1.35.2 | 1.35.1 |
| ♾️ | stdlib | go1.23.4 | 1.23.3 |
- Replace loadEnvVars with loadEnvVarsWithTransform to handle complex env values
- Add special handling for raft.peers to parse JSON array into RaftPeer structs
- Update GlobalKoanf and PluginKoanf to use new transformer function
This change allows proper parsing of list-type environment variables, specifically for raft peer configurations.
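The special handling for `raft.peers` can be sketched as a key/value transform that decodes a JSON array only for that one key. This is an illustration, not the PR's `loadEnvVarsWithTransform`: the function name `transformEnvVar`, the `GATEWAYD_` prefix handling, the underscore-to-dot key mapping, and the JSON tags on `RaftPeer` are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// RaftPeer mirrors the peer shape mentioned in the PR. The JSON tags are
// assumptions made for this example.
type RaftPeer struct {
	ID          string `json:"id"`
	Address     string `json:"address"`
	GRPCAddress string `json:"grpcAddress"`
}

// transformEnvVar is a hypothetical transformer: it maps an environment
// variable name to a dotted config key and, for raft.peers only, parses the
// value as a JSON array of peers. All other values pass through as strings.
func transformEnvVar(prefix, key, value string) (string, any, error) {
	name := strings.ToLower(strings.TrimPrefix(key, prefix))
	name = strings.ReplaceAll(name, "_", ".")
	if name == "raft.peers" {
		var peers []RaftPeer
		if err := json.Unmarshal([]byte(value), &peers); err != nil {
			return name, nil, fmt.Errorf("parse %s: %w", key, err)
		}
		return name, peers, nil
	}
	return name, value, nil
}

func main() {
	raw := `[{"id":"node2","address":"127.0.0.1:2223","grpcAddress":"127.0.0.1:50052"}]`
	name, value, err := transformEnvVar("GATEWAYD_", "GATEWAYD_RAFT_PEERS", raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, value)
}
```

Keeping the JSON decoding behind a single key check means scalar variables such as `GATEWAYD_RAFT_DIRECTORY` are unaffected by the transform.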
Add gRPC support to the Raft implementation to enable proper request forwarding between nodes. Changes include:
- Add protobuf definitions for Raft service with ForwardApply RPC
- Add gRPC server and client implementations for Raft nodes
- Update Raft configuration to include gRPC addresses
- Implement request forwarding logic for non-leader nodes
- Update node configuration to handle gRPC connections
- Add proper cleanup of gRPC resources during shutdown
The changes enable proper forwarding of apply requests from follower nodes to the leader, improving the distributed consensus mechanism.
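The forwarding rule for non-leader nodes can be sketched without gRPC by hiding the transport behind an interface. Everything here is a stand-in: `LeaderClient`, `Node`, `ErrNoLeader`, and `inProcessClient` are hypothetical names, and the real implementation in the PR sends a `ForwardApply` RPC over the network instead of calling the leader in-process.

```go
package main

import (
	"errors"
	"fmt"
)

// LeaderClient is a hypothetical stand-in for the generated gRPC client
// that exposes the ForwardApply RPC.
type LeaderClient interface {
	ForwardApply(data []byte) error
}

// ErrNoLeader is returned when a follower does not know any leader.
var ErrNoLeader = errors.New("no leader available")

// Node is a simplified cluster member. applied records the commands this
// node accepted as leader.
type Node struct {
	isLeader bool
	leader   LeaderClient // nil when no leader is known
	applied  [][]byte
}

// Apply accepts the command locally on the leader and forwards it to the
// leader on any other node.
func (n *Node) Apply(data []byte) error {
	if n.isLeader {
		n.applied = append(n.applied, data)
		return nil
	}
	if n.leader == nil {
		return ErrNoLeader
	}
	if err := n.leader.ForwardApply(data); err != nil {
		return fmt.Errorf("forward to leader: %w", err)
	}
	return nil
}

// inProcessClient replaces the network hop with a direct method call.
type inProcessClient struct{ target *Node }

func (c inProcessClient) ForwardApply(data []byte) error {
	return c.target.Apply(data)
}

func main() {
	leader := &Node{isLeader: true}
	follower := &Node{leader: inProcessClient{target: leader}}
	if err := follower.Apply([]byte("add-mapping")); err != nil {
		panic(err)
	}
	fmt.Println("commands accepted by leader:", len(leader.applied))
}
```

Routing every write through the leader matters because only the leader may append entries to the replicated log.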
Add docker-compose-raft.yaml that configures a 3-node GatewayD cluster using Raft consensus protocol. The setup includes:
- 3 GatewayD nodes with Raft configuration
- Separate read/write PostgreSQL instances
- Redis for caching
- Observability stack (Prometheus, Tempo, Grafana)
- Plugin installation service
This configuration enables high availability and leader election through Raft consensus.

- Improve variable naming in loadEnvVarsWithTransform for better readability
- Clean up error handling in forwardToLeader and ForwardApply
- Add proper error propagation in RPC responses
- Fix string type conversions for peer IDs and addresses
- Organize imports and add missing error package
- Remove unused convertPeers function
- Add clarifying comments for Apply methods
This commit focuses on code quality improvements and better error handling in the Raft implementation without changing core functionality.

- Implement `TestRPCServer_ForwardApply` to test the `ForwardApply` method of the RPC server, ensuring correct handling of apply requests with various configurations.
- Implement `TestRPCClient` to verify the creation and management of RPC clients, including client retrieval and connection closure.
- Utilize `setupGRPCServer` to create a gRPC server for testing purposes.
- Ensure proper setup and teardown of test nodes and gRPC connections to maintain test isolation and reliability.

- Change `nodeId` and `leaderId` from `node2` to `node1`.
- Add `grpcAddress` with value `127.0.0.1:50051`.
- Update `peers` to an empty list instead of an empty dictionary.
These changes adjust the Raft configuration to reflect the new node setup and include a gRPC address for communication.
The function `v1.NewStruct(args)` only accepts values that `NewValue` can handle, so certain types must first be converted to strings. This change adds support for converting a slice of `config.RaftPeer` to a comma-separated string. Each peer is formatted as "ID:Address:GRPCAddress". This conversion is necessary to overwrite the peers as an environment variable.
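The "ID:Address:GRPCAddress" format described above can be sketched as follows. `RaftPeer` here is a local stand-in for `config.RaftPeer`, and `formatPeers` is a hypothetical name; only the output format comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// RaftPeer is a local stand-in for config.RaftPeer.
type RaftPeer struct {
	ID          string
	Address     string
	GRPCAddress string
}

// formatPeers renders each peer as "ID:Address:GRPCAddress" and joins the
// results with commas. An empty slice yields an empty string.
func formatPeers(peers []RaftPeer) string {
	parts := make([]string, 0, len(peers))
	for _, p := range peers {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", p.ID, p.Address, p.GRPCAddress))
	}
	return strings.Join(parts, ",")
}

func main() {
	peers := []RaftPeer{
		{ID: "node1", Address: "127.0.0.1:2222", GRPCAddress: "127.0.0.1:50051"},
		{ID: "node2", Address: "127.0.0.1:2223", GRPCAddress: "127.0.0.1:50052"},
	}
	fmt.Println(formatPeers(peers))
	// prints "node1:127.0.0.1:2222:127.0.0.1:50051,node2:127.0.0.1:2223:127.0.0.1:50052"
}
```

Because each address already contains a colon, a consumer cannot recover the three fields by splitting on every colon; it has to rely on the fixed layout of ID, host:port, host:port.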
- Updated the checksum value for the plugin configuration to ensure integrity and consistency with the latest changes.
- Replaced `LeaderID` with `IsBootstrap` in Raft configuration across multiple files.
- Updated YAML configuration files (`gatewayd.yaml`, `docker-compose-raft.yaml`) to reflect the new `IsBootstrap` flag.
- Modified Go source files (`config.go`, `constants.go`, `types.go`, `raft.go`) to use `IsBootstrap` instead of `LeaderID`.
- Adjusted test cases in `raft_test.go`, `rpc_test.go`, and `raft_helpers.go` to accommodate the new `IsBootstrap` flag.
- Ensured that the `IsBootstrap` flag is correctly set for nodes intended to bootstrap the Raft cluster.

- Added `t.Helper()` to `setupGRPCServer` and `setupNodes` functions to improve test helper identification.
- Corrected variable naming in `TestRPCServer_ForwardApply` for clarity and consistency.
- Ensured comments end with a period for consistency.
- Updated assertions to use `GetSuccess()` method for better readability.

- Updated Docker image references in `docker-compose-raft.yaml` to use `gatewaydio/gatewayd:latest` and added `pull_policy: always` for consistent image updates.
- Changed server and API addresses in `gatewayd.yaml` for better port management.
- Enhanced logging in `raft.go` by switching from `Info` to `Debug` for certain messages to reduce verbosity.
- Added detailed comments in `raft.go` and `rpc.go` to explain the purpose and functionality of key methods, improving code readability and maintainability.
- Introduced new helper functions with comments to clarify their roles in the Raft and RPC processes.

- Updated `createTestRedis` in `act_helpers_test.go` to use `wait.ForAll` for better reliability by ensuring both log readiness and port listening.
- Enhanced `Test_Run_Async_Redis` in `registry_test.go` by adding a context with a timeout to the consumer subscription for improved test robustness.
- Simplified the sleep duration in `Test_Run_Async_Redis` to reduce unnecessary wait time.
Well done! ❤️ 🚀
@@ -62,7 +62,10 @@ func createTestRedis(t *testing.T) string {
 	req := testcontainers.ContainerRequest{
 		Image:        "redis:6",
 		ExposedPorts: []string{"6379/tcp"},
-		WaitingFor:   wait.ForLog("Ready to accept connections"),
+		WaitingFor: wait.ForAll(
Nice!
      interval: 5s
      timeout: 5s
      retries: 5
  gatewayd-1:
I ❤️ this.
- Added error handling to record and log errors when Raft node initialization fails.
- Ensured the application exits with a specific error code if the Raft node cannot be started.
- Updated tests to set environment variables for Raft node configuration.
- Added a new error code for Raft node startup failure in the error definitions.
This change ensures that if the Raft node cannot be configured and started, the application will terminate gracefully, preventing further execution with an invalid state.

- Changed the raft address from 127.0.0.1:2223 to 127.0.0.1:2222.
- Updated the nodeID from node2 to node1.
These updates are made to the test data configuration to align with the current test case requirements.

The comment above the constants was misleading, suggesting they were only command types. Updated the comment to reflect that these constants are related to Raft operations.

- Removed the unnecessary `isLeader` variable in the `monitorLeadership` function.
- Directly checked the node's state against `raft.Leader` in the if condition.
Updated the `Shutdown` method in `raft.go` to gracefully handle the `ErrRaftShutdown` error. This change ensures that if the Raft node is already shut down, the error is ignored, preventing unnecessary error handling.
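The pattern of ignoring an "already shut down" error can be sketched with `errors.Is`. This example does not import the Raft library: `errRaftShutdown` stands in for `ErrRaftShutdown`, and the `shutdowner` interface, `fakeRaft` type, and `errDiskFull` value are hypothetical. The library's own shutdown call may have a different shape than the plain `error` return assumed here.

```go
package main

import (
	"errors"
	"fmt"
)

// errRaftShutdown is a local stand-in for the library's ErrRaftShutdown.
var errRaftShutdown = errors.New("raft is already shutdown")

// errDiskFull is an arbitrary unrelated failure used for illustration.
var errDiskFull = errors.New("disk full")

// shutdowner is a hypothetical, simplified view of a Raft instance.
type shutdowner interface {
	Shutdown() error
}

// shutdownNode treats "already shut down" as success and wraps any other
// error so the caller can see what failed.
func shutdownNode(r shutdowner) error {
	if err := r.Shutdown(); err != nil && !errors.Is(err, errRaftShutdown) {
		return fmt.Errorf("failed to shut down raft node: %w", err)
	}
	return nil
}

// fakeRaft returns a preset error from Shutdown.
type fakeRaft struct{ err error }

func (f fakeRaft) Shutdown() error { return f.err }

func main() {
	fmt.Println(shutdownNode(fakeRaft{err: errRaftShutdown}))
	fmt.Println(shutdownNode(fakeRaft{err: errDiskFull}))
}
```

Treating the repeated shutdown as a no-op makes `Shutdown` safe to call from both normal teardown and deferred cleanup paths.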
Thanks for your awesome contribution! 🙏 🙌
LGTM! 🚀 🎉
Ticket(s)
Implements part of #628 - Raft-based State Synchronization for GatewayD Instances
Description
This PR introduces Raft consensus for managing consistent hash state across GatewayD instances. This is the first phase of implementing Raft-based state synchronization, focusing specifically on the consistent hash load balancer state.
Key changes include:
The implementation ensures that consistent hash mappings are synchronized across all GatewayD instances, providing better consistency in load balancing decisions across the cluster.
Related PRs
N/A - This is the first PR implementing Raft consensus