document ampc framework
mikkeldenker committed Dec 4, 2024
1 parent 558f58e commit fe10bc0
Showing 8 changed files with 48 additions and 1 deletion.
2 changes: 2 additions & 0 deletions crates/core/src/ampc/coordinator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@ use super::{DhtConn, Finisher, Job, JobScheduled, RemoteWorker, Setup, Worker, W
use crate::{distributed::retry_strategy::ExponentialBackoff, Result};
use anyhow::anyhow;

/// A coordinator is responsible for scheduling jobs on workers and coordinating
/// between rounds of computation.
pub struct Coordinator<J>
where
J: Job,
Expand Down
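The coordinator's scheduling loop has to tolerate transient worker failures, which is why `ExponentialBackoff` is imported above. The sketch below illustrates the idea only: `schedule_with_retry` is a hypothetical helper, not the crate's API, and the real coordinator would sleep between attempts.

```rust
use std::time::Duration;

// Hypothetical stand-in for the coordinator's retry behaviour when a worker
// is temporarily unreachable. Not the crate's `ExponentialBackoff`.
fn schedule_with_retry<F>(mut attempt: F, max_tries: u32) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut delay = Duration::from_millis(100);
    for try_no in 0..max_tries {
        match attempt() {
            Ok(()) => return Ok(()),
            Err(e) if try_no + 1 == max_tries => return Err(e),
            Err(_) => {
                // A real coordinator would sleep for `delay` here; skipped
                // so the sketch runs instantly.
                delay = delay.saturating_mul(2);
            }
        }
    }
    Err("no attempts were made".to_string())
}

fn main() {
    let mut calls = 0;
    let result = schedule_with_retry(
        || {
            calls += 1;
            if calls < 3 {
                Err("worker unavailable".to_string())
            } else {
                Ok(())
            }
        },
        5,
    );
    assert!(result.is_ok());
    assert_eq!(calls, 3);
    println!("job scheduled after {calls} attempts");
}
```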
2 changes: 1 addition & 1 deletion crates/core/src/ampc/dht/mod.rs
Expand Up @@ -21,7 +21,7 @@
//! with multiple shards. Each shard cluster
//! is a Raft cluster, and each key is then routed to the correct
//! cluster based on hash(key) % number_of_shards. The keys
are currently *not* rebalanced if the number of shards changes, so
are currently *not* re-balanced if the number of shards changes, so
if an entire shard becomes unavailable or a new shard is added, all
keys in the entire DHT are essentially lost as the
keys might no longer route to the shard that stores them.
Expand Down
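The routing rule in the doc comment above can be written out concretely. In this sketch `DefaultHasher` is only a stand-in for whatever hash function the DHT actually uses; the point is that a key's shard can change as soon as the shard count does.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Each key lives on shard `hash(key) % number_of_shards`.
fn shard_for(key: &str, num_shards: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % num_shards
}

fn main() {
    let key = "node-42";
    // Routing is stable for a fixed shard count...
    assert_eq!(shard_for(key, 3), shard_for(key, 3));
    // ...but changing the shard count can move the key elsewhere, which is
    // why existing keys are effectively lost on resharding.
    println!(
        "3 shards -> {}, 4 shards -> {}",
        shard_for(key, 3),
        shard_for(key, 4)
    );
}
```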
1 change: 1 addition & 0 deletions crates/core/src/ampc/dht/store.rs
Expand Up @@ -13,6 +13,7 @@
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>

use std::collections::BTreeMap;
use std::fmt::Debug;
use std::io::Cursor;
Expand Down
2 changes: 2 additions & 0 deletions crates/core/src/ampc/finisher.rs
Expand Up @@ -16,6 +16,8 @@

use super::prelude::Job;

/// A finisher is responsible for determining if the computation is finished
/// or if another round of computation is needed.
pub trait Finisher {
type Job: Job;

Expand Down
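A hedged sketch of what an implementor might look like. The `Job` and `Finisher` traits below are minimal stand-ins, and the `is_finished` signature is an assumption: the crate's real finisher would typically inspect DHT state rather than a bare round counter.

```rust
// Stand-in traits, for illustration only.
trait Job {}

struct CountJob;
impl Job for CountJob {}

trait Finisher {
    type Job: Job;

    // Hypothetical signature; returns whether the computation is done.
    fn is_finished(&self, round: usize) -> bool;
}

/// A finisher that stops the computation after a fixed number of rounds.
struct FixedRounds {
    max_rounds: usize,
}

impl Finisher for FixedRounds {
    type Job = CountJob;

    fn is_finished(&self, round: usize) -> bool {
        round >= self.max_rounds
    }
}

fn main() {
    let finisher = FixedRounds { max_rounds: 3 };
    assert!(!finisher.is_finished(2));
    assert!(finisher.is_finished(3));
    println!("computation ends once round 3 completes");
}
```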
1 change: 1 addition & 0 deletions crates/core/src/ampc/mapper.rs
Expand Up @@ -16,6 +16,7 @@

use super::{prelude::Job, DhtConn};

/// A mapper is the specific computation to be run on the graph.
pub trait Mapper: bincode::Encode + bincode::Decode + Send + Sync + Clone {
type Job: Job<Mapper = Self>;

Expand Down
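To make the idea concrete, here is an illustrative mapper with hypothetical stages, using a `HashMap` in place of the DHT and omitting the `bincode::Encode + Decode` bounds that let real mappers be shipped to remote workers. The stage names and the `map` signature are inventions for this sketch, not the crate's API.

```rust
use std::collections::HashMap;

// One mapper variant per hypothetical stage of a toy centrality computation.
#[derive(Clone)]
enum CentralityMapper {
    CountOutgoingEdges,
    NormalizeScores,
}

impl CentralityMapper {
    // Hypothetical `map`: read this worker's edges and write results into a
    // DHT-like table.
    fn map(&self, edges: &[(u64, u64)], dht: &mut HashMap<u64, f64>) {
        match self {
            CentralityMapper::CountOutgoingEdges => {
                for (src, _dst) in edges {
                    *dht.entry(*src).or_insert(0.0) += 1.0;
                }
            }
            CentralityMapper::NormalizeScores => {
                let total: f64 = dht.values().sum();
                if total > 0.0 {
                    for v in dht.values_mut() {
                        *v /= total;
                    }
                }
            }
        }
    }
}

fn main() {
    let edges = [(1, 2), (1, 3), (2, 3)];
    let mut dht = HashMap::new();
    CentralityMapper::CountOutgoingEdges.map(&edges, &mut dht);
    CentralityMapper::NormalizeScores.map(&edges, &mut dht);
    // Node 1 has 2 of the 3 edges, so its normalized score is 2/3.
    assert!((dht[&1] - 2.0 / 3.0).abs() < 1e-9);
    println!("normalized out-degree of node 1: {}", dht[&1]);
}
```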
27 changes: 27 additions & 0 deletions crates/core/src/ampc/mod.rs
Expand Up @@ -14,6 +14,33 @@
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.

//! # Framework for Adaptive Massively Parallel Computation (AMPC).
//!
//! AMPC is a system for implementing large-scale distributed graph algorithms efficiently.
//! It provides a framework for parallel computation across clusters of machines.
//!
//! While similar in concept to MapReduce, AMPC uses a distributed hash table (DHT) as its
//! underlying data structure rather than the traditional map and reduce phases. This key
//! architectural difference enables more flexible and efficient computation patterns.
//!
//! The main advantage over MapReduce is that workers can dynamically access any keys in
//! the DHT during computation. This is in contrast to MapReduce where the keyspace must
//! be statically partitioned between reducers before computation begins. The dynamic
//! access pattern allows for more natural expression of graph algorithms in a distributed
//! setting.
//!
//! This is roughly inspired by
//! [Massively Parallel Graph Computation: From Theory to Practice](https://research.google/blog/massively-parallel-graph-computation-from-theory-to-practice/)
//!
//! ## Key concepts
//!
//! * **DHT**: A distributed hash table is used to store the result of the computation for
//! each round.
//! * **Worker**: A worker owns a subset of the overall graph and is responsible for
//! executing mappers on its portion of the graph and sending results to the DHT.
//! * **Mapper**: A mapper is the specific computation to be run on the graph.
//! * **Coordinator**: The coordinator is responsible for scheduling the jobs on the workers.
use self::{job::Job, worker::WorkerRef};
use crate::distributed::sonic;

Expand Down
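The concepts above can be tied together in a toy, single-process simulation. Everything here is a stand-in: a `HashMap` plays the DHT, plain loops play the workers and the mapper, and a convergence check plays the finisher. Two "workers" each own half of a graph's edges; every round they read current distances from the shared table and write improved ones back.

```rust
use std::collections::HashMap;

// One AMPC-style computation: shortest distances from node 0, run round by
// round until a round changes nothing.
fn run_rounds(partitions: &[Vec<(u32, u32)>]) -> HashMap<u32, u32> {
    // "Setup": initialise the table with the source node at distance 0.
    let mut dht: HashMap<u32, u32> = HashMap::from([(0, 0)]);

    loop {
        let before = dht.clone();
        for worker_edges in partitions {
            for &(src, dst) in worker_edges {
                if let Some(&d) = dht.get(&src) {
                    // Dynamic key access: a worker may read and write *any*
                    // key, not just a statically assigned partition.
                    let entry = dht.entry(dst).or_insert(u32::MAX);
                    *entry = (*entry).min(d + 1);
                }
            }
        }
        // "Finisher": another round only if the last one changed something.
        if dht == before {
            return dht;
        }
    }
}

fn main() {
    // Edge lists, partitioned between two "workers".
    let partitions = vec![vec![(0, 1), (1, 2)], vec![(2, 3), (3, 4)]];
    let distances = run_rounds(&partitions);
    assert_eq!(distances[&4], 4);
    println!("distance from node 0 to node 4: {}", distances[&4]);
}
```

In a MapReduce formulation the keyspace would have to be re-partitioned between phases; here each worker simply reaches into whichever keys its edges touch.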
12 changes: 12 additions & 0 deletions crates/core/src/ampc/setup.rs
Expand Up @@ -16,12 +16,24 @@

use super::DhtConn;

/// A setup is responsible for initializing the DHT before each round of computation.
pub trait Setup {
type DhtTables;

/// Setup initial state of the DHT.
fn init_dht(&self) -> DhtConn<Self::DhtTables>;

/// Setup state for a new round.
///
/// This is called once for each round of computation.
/// The first round will run `setup_first_round` first
/// but will still call `setup_round` after that.
#[allow(unused_variables)] // reason = "dht might be used by implementors"
fn setup_round(&self, dht: &Self::DhtTables) {}

/// Setup state for the first round.
///
/// This is called once before the first round of computation.
fn setup_first_round(&self, dht: &Self::DhtTables) {
self.setup_round(dht);
}
Expand Down
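The interaction of the two hooks can be sketched with stand-in types. The real trait also has `init_dht` and an empty default body for `setup_round`, both simplified away here; the point shown is that the default `setup_first_round` delegates to `setup_round`, so the first round still gets the per-round setup.

```rust
use std::cell::Cell;

// Stand-in for the crate's DHT tables (assumption).
struct Tables {
    rounds_prepared: Cell<u32>,
}

trait Setup {
    type DhtTables;

    /// Setup state for a new round.
    fn setup_round(&self, dht: &Self::DhtTables);

    /// Setup state for the first round; defaults to the per-round setup.
    fn setup_first_round(&self, dht: &Self::DhtTables) {
        self.setup_round(dht);
    }
}

struct MySetup;

impl Setup for MySetup {
    type DhtTables = Tables;

    fn setup_round(&self, dht: &Tables) {
        dht.rounds_prepared.set(dht.rounds_prepared.get() + 1);
    }
}

fn main() {
    let tables = Tables { rounds_prepared: Cell::new(0) };
    // First round: `setup_first_round`, whose default calls `setup_round`.
    MySetup.setup_first_round(&tables);
    // Every later round calls `setup_round` directly.
    MySetup.setup_round(&tables);
    assert_eq!(tables.rounds_prepared.get(), 2);
    println!("per-round setup ran {} times", tables.rounds_prepared.get());
}
```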
2 changes: 2 additions & 0 deletions crates/core/src/ampc/worker.rs
Expand Up @@ -23,6 +23,8 @@ use crate::Result;
use anyhow::anyhow;
use tokio::net::ToSocketAddrs;

/// A worker is responsible for executing a mapper on its portion of the graph and
/// sending results to the DHT.
pub trait Worker: Send + Sync {
type Remote: RemoteWorker<Job = Self::Job>;

Expand Down
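The worker's role can be sketched with a `HashMap` standing in for the DHT and an explicit merge on write. `ToyWorker` and `run_round` are illustrative names, not the crate's API; the real worker talks to a remote DHT over the network.

```rust
use std::collections::HashMap;

// A worker owns a subset of the overall graph.
struct ToyWorker {
    edges: Vec<(u32, u32)>,
}

impl ToyWorker {
    // Run one mapper over the owned edges and send the results to the
    // DHT-like table, merging with what other workers already wrote.
    fn run_round(&self, dht: &mut HashMap<u32, u64>) {
        for &(src, _dst) in &self.edges {
            *dht.entry(src).or_insert(0) += 1; // out-degree contribution
        }
    }
}

fn main() {
    let workers = vec![
        ToyWorker { edges: vec![(1, 2), (1, 3)] },
        ToyWorker { edges: vec![(1, 4), (2, 4)] },
    ];

    let mut dht = HashMap::new();
    for w in &workers {
        w.run_round(&mut dht);
    }

    // Node 1's edges were split across both workers; the merge in the table
    // combined their partial counts.
    assert_eq!(dht[&1], 3);
    println!("out-degree of node 1: {}", dht[&1]);
}
```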
