Scheduling Algorithms
The approach in the above paper breaks scheduling into three subcomponents. Each site/resource runs one instance of each subcomponent at any given time:
- Local Scheduler -- Determines execution order of jobs scheduled to run locally
- External Scheduler -- May dispatch jobs to a local scheduler or to another site entirely
- Dataset Scheduler -- Handles data replication and deletion
It appears that we can make a connection between a local scheduler and a Pilot executing on a given resource, and another connection between an external scheduler and a Pilot Manager. The Dataset Scheduler could correspond to pilot-data, should we decide to make that connection.
The algorithms chosen for each of these schedulers are simple but functional.
For the External Scheduler, we have:
- JobRandom: A randomly selected site.
- JobLeastLoaded: The site that currently has the least load. (A variety of definitions for load are possible; here we define it simply as the least number of jobs waiting to run.)
- JobDataPresent: A site that already has the required data. If more than one site qualifies, choose the least loaded one.
- JobLocal: Always run jobs locally.
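The four External Scheduler policies above can be sketched as plain selection functions. This is only an illustrative sketch, not code from the paper or from Troy: the `Site` class, the function names, and the shared `(sites, job_data, local)` signature are all hypothetical. Load is taken to be the number of waiting jobs, as defined above. The fallback in `job_data_present` for the case where no site holds the data is an assumption, since the description does not cover it.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical stand-in for a site/resource."""
    name: str
    waiting_jobs: int = 0                       # load = jobs waiting to run
    datasets: set = field(default_factory=set)  # datasets held at this site


def job_random(sites, job_data, local, rng=random):
    """JobRandom: pick any site uniformly at random."""
    return rng.choice(sites)


def job_least_loaded(sites, job_data, local):
    """JobLeastLoaded: pick the site with the fewest waiting jobs."""
    return min(sites, key=lambda s: s.waiting_jobs)


def job_data_present(sites, job_data, local):
    """JobDataPresent: least loaded site among those holding the data."""
    holders = [s for s in sites if job_data in s.datasets]
    # Assumption: if no site holds the data, fall back to all sites.
    return job_least_loaded(holders or sites, job_data, local)


def job_local(sites, job_data, local):
    """JobLocal: always run on the submitting site."""
    return local
```

Because every policy takes the same arguments, an external scheduler could swap between them by holding a reference to one of these functions.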
For the Dataset Scheduler, we have:
- DataDoNothing: No active replication takes place. Datasets are pre-assigned to different sites and no dynamic replication policy is in place. Data may be fetched from a remote site for a particular job, in which case it is cached and managed using LRU. A cached dataset is then available to the grid as a replica.
- DataRandom: The Dataset Scheduler keeps track of the popularity of the datasets it contains, and when the popularity exceeds a threshold those datasets are replicated to a random site on the grid.
- DataLeastLoaded: The Dataset Scheduler chooses the least loaded site from its list of known sites (we define this as neighbors) as a new host for a popular dataset.
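The Dataset Scheduler policies above can be sketched in a similar way. Again, this is a hypothetical illustration rather than the paper's implementation: `LRUCache`, `DatasetScheduler`, and `record_access` are invented names. Resetting the popularity counter after a replication is an assumption. So is drawing the DataRandom target from the neighbor list rather than from the whole grid. The LRU cache covers the caching of remotely fetched data described under DataDoNothing.

```python
import random
from collections import Counter, OrderedDict


class LRUCache:
    """Cache of remotely fetched datasets with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def access(self, dataset):
        """Record a use; return the evicted dataset, or None."""
        if dataset in self._items:
            self._items.move_to_end(dataset)  # mark as most recently used
            return None
        self._items[dataset] = True
        if len(self._items) > self.capacity:
            evicted, _ = self._items.popitem(last=False)  # drop the oldest
            return evicted
        return None

    def __contains__(self, dataset):
        return dataset in self._items


class DatasetScheduler:
    """Tracks dataset popularity and picks a replication target."""

    def __init__(self, neighbors, threshold, policy="DataRandom", rng=None):
        self.neighbors = neighbors    # dict: site name -> waiting jobs
        self.threshold = threshold
        self.policy = policy
        self.popularity = Counter()
        self.rng = rng or random.Random()

    def record_access(self, dataset):
        """Count an access; return a target site name if replication triggers."""
        self.popularity[dataset] += 1
        if self.policy == "DataDoNothing":
            return None
        if self.popularity[dataset] <= self.threshold:
            return None
        self.popularity[dataset] = 0  # assumption: reset after replicating
        if self.policy == "DataRandom":
            # Assumption: neighbors stand in for "a random site on the grid".
            return self.rng.choice(sorted(self.neighbors))
        # DataLeastLoaded: neighbor with the fewest waiting jobs.
        return min(self.neighbors, key=self.neighbors.get)
```

With `threshold=2` and the DataLeastLoaded policy, the third access to a dataset returns the least loaded neighbor; the caller would then perform the actual copy.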
The algorithms themselves are simple and (relatively) easily described in plain English, but I think the abstractions chosen (i.e. Local/External/Dataset schedulers) map well to our Pilot concepts. In addition, the paper touches upon several important things that we may want to keep in mind for Troy, namely:
- Data scheduling (Pilot-Data)
- Resource information (e.g. system load)
- Resource selection (external scheduler capable of sending tasks to other resources using resource information)
On top of all of this, the choice of algorithms has a marked impact on performance, in terms of both time-to-completion and data movement (see the paper's results for graphs).