-
Notifications
You must be signed in to change notification settings - Fork 55
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat(networking): add bootstrap cache for peer discovery
Add persistent bootstrap cache to maintain a list of previously known peers, improving network bootstrapping efficiency and reducing cold-start times. Enhance the bootstrap cache implementation with robust corruption detection and recovery mechanisms. This change ensures system resilience when the cache file becomes corrupted or invalid. Key changes: * Add explicit cache corruption detection and error reporting * Implement cache rebuilding from in-memory peers or endpoints * Use atomic file operations to prevent corruption during writes * Improve error handling with specific error variants * Add comprehensive test suite for corruption scenarios The system now handles corruption by: 1. Detecting invalid/corrupted JSON data during cache reads 2. Attempting recovery using in-memory peers if available 3. Falling back to endpoint discovery if needed 4. Using atomic operations for safe cache updates Testing: * Add tests for various corruption scenarios * Add concurrent access tests * Add file operation tests * Verify endpoint fallback behavior - Add smarter JSON format detection by checking content structure - Improve error handling with specific InvalidResponse variant - Reduce unnecessary warnings by only logging invalid multiaddrs - Simplify parsing logic to handle both JSON and plain text formats - Add better error context for failed parsing attempts All tests passing, including JSON endpoint and plain text format tests. feat(bootstrap_cache): implement circuit breaker with exponential backoff - Add CircuitBreakerConfig with configurable parameters for failures and timeouts - Implement circuit breaker states (closed, open, half-open) with state transitions - Add exponential backoff for failed request retries - Update InitialPeerDiscovery to support custom circuit breaker configuration - Add comprehensive test suite with shorter timeouts for faster testing This change improves system resilience by preventing cascading failures and reducing load on failing endpoints through intelligent retry mechanisms.
- Loading branch information
Showing
19 changed files
with
4,006 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,8 +36,7 @@ sn_node_manager/.vagrant | |
.venv/ | ||
uv.lock | ||
*.so | ||
*.pyc | ||
|
||
*.pyc | ||
*.swp | ||
|
||
/vendor/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
[package] | ||
name = "bootstrap_cache" | ||
version = "0.1.0" | ||
edition = "2021" | ||
license = "GPL-3.0" | ||
authors = ["MaidSafe Developers <[email protected]>"] | ||
description = "Bootstrap cache functionality for the Safe Network" | ||
|
||
[dependencies] | ||
chrono = { version = "0.4", features = ["serde"] } | ||
dirs = "5.0" | ||
fs2 = "0.4.3" | ||
libp2p = { version = "0.53", features = ["serde"] } | ||
reqwest = { version = "0.11", features = ["json"] } | ||
serde = { version = "1.0", features = ["derive"] } | ||
serde_json = "1.0" | ||
tempfile = "3.8.1" | ||
thiserror = "1.0" | ||
tokio = { version = "1.0", features = ["full", "sync"] } | ||
tracing = "0.1" | ||
|
||
[dev-dependencies] | ||
wiremock = "0.5" | ||
tokio = { version = "1.0", features = ["full", "test-util"] } | ||
tracing-subscriber = { version = "0.3", features = ["env-filter"] } |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,216 @@ | ||
# Bootstrap Cache | ||
|
||
A decentralized peer discovery and caching system for the Safe Network. | ||
|
||
## Features | ||
|
||
- **Decentralized Design**: No dedicated bootstrap nodes required | ||
- **Cross-Platform Support**: Works on Linux, macOS, and Windows | ||
- **Shared Cache**: System-wide cache file accessible by both nodes and clients | ||
- **Concurrent Access**: File locking for safe multi-process access | ||
- **Atomic Operations**: Safe cache updates using atomic file operations | ||
- **Initial Peer Discovery**: Fallback web endpoints for new/stale cache scenarios | ||
- **Comprehensive Error Handling**: Detailed error types and logging | ||
- **Circuit Breaker Pattern**: Intelligent failure handling with: | ||
- Configurable failure thresholds and reset timeouts | ||
- Exponential backoff for failed requests | ||
- Automatic state transitions (closed → open → half-open) | ||
- Protection against cascading failures | ||
|
||
### Peer Management | ||
|
||
The bootstrap cache implements a robust peer management system: | ||
|
||
- **Peer Status Tracking**: Each peer's connection history is tracked, including: | ||
- Success count: Number of successful connections | ||
- Failure count: Number of failed connection attempts | ||
- Last seen timestamp: When the peer was last successfully contacted | ||
|
||
- **Automatic Cleanup**: The system automatically removes unreliable peers: | ||
- Peers that fail 3 consecutive connection attempts are marked for removal | ||
- Removal only occurs if there are at least 2 working peers available | ||
- This ensures network connectivity is maintained even during temporary connection issues | ||
|
||
- **Duplicate Prevention**: The cache automatically prevents duplicate peer entries: | ||
- Same IP and port combinations are only stored once | ||
- Different ports on the same IP are treated as separate peers | ||
|
||
## Installation | ||
|
||
Add this to your `Cargo.toml`: | ||
|
||
```toml | ||
[dependencies] | ||
bootstrap_cache = { version = "0.1.0" } | ||
``` | ||
|
||
## Usage | ||
|
||
### Basic Example | ||
|
||
```rust | ||
use bootstrap_cache::{BootstrapCache, CacheManager, InitialPeerDiscovery}; | ||
|
||
#[tokio::main] | ||
async fn main() -> Result<(), Box<dyn std::error::Error>> { | ||
// Initialize the cache manager | ||
let cache_manager = CacheManager::new()?; | ||
|
||
// Try to read from the cache | ||
let mut cache = match cache_manager.read_cache() { | ||
Ok(cache) if !cache.is_stale() => cache, | ||
_ => { | ||
// Cache is stale or unavailable, fetch initial peers | ||
let discovery = InitialPeerDiscovery::new(); | ||
let peers = discovery.fetch_peers().await?; | ||
let cache = BootstrapCache { | ||
last_updated: chrono::Utc::now(), | ||
peers, | ||
}; | ||
cache_manager.write_cache(&cache)?; | ||
cache | ||
} | ||
}; | ||
|
||
println!("Found {} peers in cache", cache.peers.len()); | ||
Ok(()) | ||
} | ||
``` | ||
|
||
### Custom Endpoints | ||
|
||
```rust | ||
use bootstrap_cache::InitialPeerDiscovery; | ||
|
||
let discovery = InitialPeerDiscovery::with_endpoints(vec![ | ||
"http://custom1.example.com/peers.json".to_string(), | ||
"http://custom2.example.com/peers.json".to_string(), | ||
]); | ||
``` | ||
|
||
### Circuit Breaker Configuration | ||
|
||
```rust | ||
use bootstrap_cache::{InitialPeerDiscovery, CircuitBreakerConfig}; | ||
use std::time::Duration; | ||
|
||
// Create a custom circuit breaker configuration | ||
let config = CircuitBreakerConfig { | ||
max_failures: 5, // Open after 5 failures | ||
reset_timeout: Duration::from_secs(300), // Wait 5 minutes before recovery | ||
min_backoff: Duration::from_secs(1), // Start with 1 second backoff | ||
max_backoff: Duration::from_secs(60), // Max backoff of 60 seconds | ||
}; | ||
|
||
// Initialize discovery with custom circuit breaker config | ||
let discovery = InitialPeerDiscovery::with_config(config); | ||
``` | ||
|
||
### Peer Management Example | ||
|
||
```rust | ||
use bootstrap_cache::BootstrapCache; | ||
|
||
let mut cache = BootstrapCache::new(); | ||
|
||
// Add a new peer | ||
cache.add_peer("192.168.1.1".to_string(), 8080); | ||
|
||
// Update peer status after connection attempts | ||
cache.update_peer_status("192.168.1.1", 8080, true); // successful connection | ||
cache.update_peer_status("192.168.1.1", 8080, false); // failed connection | ||
|
||
// Clean up failed peers (only if we have at least 2 working peers) | ||
cache.cleanup_failed_peers(); | ||
``` | ||
|
||
## Cache File Location | ||
|
||
The cache file is stored in a system-wide location accessible to all processes: | ||
|
||
- **Linux**: `/var/safe/bootstrap_cache.json` | ||
- **macOS**: `/Library/Application Support/Safe/bootstrap_cache.json` | ||
- **Windows**: `C:\ProgramData\Safe\bootstrap_cache.json` | ||
|
||
## Cache File Format | ||
|
||
```json | ||
{ | ||
"last_updated": "2024-02-20T15:30:00Z", | ||
"peers": [ | ||
{ | ||
"ip": "192.168.1.1", | ||
"port": 8080, | ||
"last_seen": "2024-02-20T15:30:00Z", | ||
"success_count": 10, | ||
"failure_count": 0 | ||
} | ||
] | ||
} | ||
``` | ||
|
||
## Error Handling | ||
|
||
The crate provides detailed error types through the `Error` enum: | ||
|
||
```rust | ||
use bootstrap_cache::Error; | ||
|
||
match cache_manager.read_cache() { | ||
Ok(cache) => println!("Cache loaded successfully"), | ||
Err(Error::CacheStale) => println!("Cache is stale"), | ||
Err(Error::CacheCorrupted) => println!("Cache file is corrupted"), | ||
Err(Error::Io(e)) => println!("IO error: {}", e), | ||
Err(e) => println!("Other error: {}", e), | ||
} | ||
``` | ||
|
||
## Thread Safety | ||
|
||
The cache system uses file locking to ensure safe concurrent access: | ||
|
||
- Shared locks for reading | ||
- Exclusive locks for writing | ||
- Atomic file updates using temporary files | ||
|
||
## Development | ||
|
||
### Building | ||
|
||
```bash | ||
cargo build | ||
``` | ||
|
||
### Running Tests | ||
|
||
```bash | ||
cargo test | ||
``` | ||
|
||
### Running with Logging | ||
|
||
```rust | ||
use tracing_subscriber::FmtSubscriber; | ||
|
||
// Initialize logging | ||
let subscriber = FmtSubscriber::builder() | ||
.with_max_level(tracing::Level::DEBUG) | ||
.init(); | ||
``` | ||
|
||
## Contributing | ||
|
||
1. Fork the repository | ||
2. Create your feature branch (`git checkout -b feature/amazing-feature`) | ||
3. Commit your changes (`git commit -am 'Add amazing feature'`) | ||
4. Push to the branch (`git push origin feature/amazing-feature`) | ||
5. Open a Pull Request | ||
|
||
## License | ||
|
||
This project is licensed under the GPL-3.0 License - see the LICENSE file for details. | ||
|
||
## Related Documentation | ||
|
||
- [Bootstrap Cache PRD](docs/bootstrap_cache_prd.md) | ||
- [Implementation Guide](docs/bootstrap_cache_implementation.md) |
Oops, something went wrong.