Scheduling Algorithms

Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications - Paper Link

High Level Overview

The paper breaks scheduling into three subcomponents. Each site/resource runs one instance of each subcomponent at any given time:

Local Scheduler -- Determines the execution order of jobs scheduled to run locally
External Scheduler -- Decides where a submitted job runs; it may dispatch the job to the local scheduler or to another site entirely
Dataset Scheduler -- Handles data replication and deletion

It appears that we can make a connection between a local scheduler and a Pilot executing on a given resource, and another connection between an external scheduler and a Pilot Manager. The Dataset Scheduler could correspond to pilot-data, should we decide to make that connection.

The algorithms chosen for each of these schedulers are simple but functional.

For the External Scheduler, we have:

JobRandom: A randomly selected site.
JobLeastLoaded: The site that currently has the least load. (A variety of definitions for load are possible; here we define it simply as the least number of jobs waiting to run.)
JobDataPresent: A site that already has the required data. If more than one site qualifies, choose the least loaded one.
JobLocal: Always run jobs locally.
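
As a rough illustration, the four External Scheduler policies above can be sketched as site-selection functions. This is only a sketch under stated assumptions, not the paper's implementation: the `Site` class, its fields, and the function names are hypothetical, load is taken to be the number of queued jobs (as defined above), and the fallback used by `job_data_present` when no site holds the dataset is our own assumption.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical site record; not taken from the paper or from any Pilot API."""
    name: str
    queued_jobs: int = 0                        # load = number of jobs waiting to run
    datasets: set = field(default_factory=set)  # datasets currently stored at the site


def job_random(sites, local, dataset, rng=random):
    """JobRandom: pick any site uniformly at random."""
    return rng.choice(sites)


def job_least_loaded(sites, local, dataset):
    """JobLeastLoaded: pick the site with the fewest queued jobs."""
    return min(sites, key=lambda s: s.queued_jobs)


def job_data_present(sites, local, dataset):
    """JobDataPresent: least loaded site among those already holding the dataset."""
    holders = [s for s in sites if dataset in s.datasets]
    # Assumption: if no site holds the dataset, fall back to the least loaded site.
    return job_least_loaded(holders or sites, local, dataset)


def job_local(sites, local, dataset):
    """JobLocal: always run at the site where the job was submitted."""
    return local
```

All four functions share one signature, so a policy can be swapped without touching the caller, for example `job_data_present(sites, local_site, "d1")`.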

For the Dataset Scheduler, we have:

DataDoNothing: No active replication takes place. Datasets are pre-assigned to different sites and no dynamic replication policy is in place. Data may be fetched from a remote site for a particular job, in which case it is cached and managed using LRU. A cached dataset is then available to the grid as a replica.
DataRandom: The Dataset Scheduler keeps track of the popularity of the datasets it contains, and when the popularity exceeds a threshold those datasets are replicated to a random site on the grid.
DataLeastLoaded: The Dataset Scheduler chooses the least loaded site from its list of known sites (we define this as neighbors) as a new host for a popular dataset.
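
The Dataset Scheduler policies can be sketched in the same spirit. Again this is a minimal sketch under assumptions rather than the paper's code: the class and method names are hypothetical, popularity is modelled as a simple request counter, the counter is reset after a replication is triggered (our assumption), and DataRandom draws from the same list of known sites as DataLeastLoaded for simplicity, whereas the description above says "a random site on the grid". The small LRU cache illustrates how remotely fetched data is managed under DataDoNothing.

```python
import random
from collections import Counter, OrderedDict


class LRUCache:
    """Cache for remotely fetched datasets, evicting the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def access(self, dataset):
        """Record a use of `dataset`; return the evicted dataset name, or None."""
        if dataset in self._items:
            self._items.move_to_end(dataset)  # mark as most recently used
            return None
        self._items[dataset] = True
        if len(self._items) > self.capacity:
            evicted, _ = self._items.popitem(last=False)  # drop the oldest entry
            return evicted
        return None

    def __contains__(self, dataset):
        return dataset in self._items


class DatasetScheduler:
    """Hypothetical per-site dataset scheduler covering the three policies above."""

    POLICIES = ("DataDoNothing", "DataRandom", "DataLeastLoaded")

    def __init__(self, policy, threshold, neighbors, rng=random):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.threshold = threshold
        self.neighbors = neighbors    # maps known site name -> queued job count
        self.popularity = Counter()   # request count per dataset
        self.rng = rng

    def record_request(self, dataset):
        """Count a request; return the site to replicate to, or None."""
        self.popularity[dataset] += 1
        if self.policy == "DataDoNothing":
            return None
        if self.popularity[dataset] <= self.threshold:
            return None
        self.popularity[dataset] = 0  # assumption: reset once replication is triggered
        if self.policy == "DataRandom":
            return self.rng.choice(sorted(self.neighbors))
        return min(self.neighbors, key=self.neighbors.get)  # DataLeastLoaded
```

With a threshold of 2, the third request for a dataset is the first one that exceeds the threshold and returns a replication target.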

Why this could be relevant

The algorithms themselves are simple and (relatively) easily described in plain English, but I think the abstractions chosen (i.e. Local/External/Dataset schedulers) map well to our Pilot concepts. In addition, the paper touches on several things that we may want to keep in mind for Troy, namely:

Data scheduling (Pilot-Data)
Resource information (e.g. system load)
Resource selection (external scheduler capable of sending tasks to other resources using resource information)

On top of all of this, the paper reports that the choice of algorithms has a marked impact on performance, in terms of both time-to-completion and data movement (see the graphs in the paper's results).
