When correctly configured, two 5Genesis platforms can execute a distributed experiment, in which both platforms run tasks in a coordinated manner and exchange information with each other. To use this functionality, the following conditions must be met:
- Each platform has a test case that defines that side's set of actions (including any necessary coordination).
- The East/West interface of the ELCM is enabled on both sides, there is connectivity between the two instances, and connection details for the remote side's ELCM are defined.
- The remote platforms are registered in the Dispatcher of both sides (see the Dispatcher documentation).
Optionally, in order to ease the creation of a valid experiment descriptor:
- The East/West interface of the Portal is enabled on both sides, there is connectivity between the two instances, and connection details for the remote side's Portal are defined.
The creation of a distributed experiment is a collaborative activity between the two platforms involved in its execution. Each platform is responsible for defining its own set of actions, as only its administrators have the required knowledge of their equipment, but they must agree with the other platform's administrators on any coordination and information exchange required to execute the test case successfully.
The actual definition of the test case is very similar to that of a normal (non-distributed) experiment, but with the following differences:
- The test case definition YAML must include an additional key: `Distributed: True`
- A distributed experiment cannot be `Custom` (i.e. cannot define `Parameters`)
- Additional task types are available (for coordination and information exchange)
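The first two differences can be checked mechanically before a test case is deployed. The following is a minimal sketch, assuming the test case YAML has already been loaded into a Python dictionary. The function `check_distributed_test_case` is hypothetical and not part of the ELCM; only the `Distributed` and `Parameters` keys come from the description above.

```python
def check_distributed_test_case(definition):
    """Return a list of problems found in a parsed test case definition (illustrative only)."""
    problems = []
    # YAML `Distributed: True` is loaded as the boolean True by common YAML parsers.
    if definition.get("Distributed") is not True:
        problems.append("missing 'Distributed: True'")
    # A distributed experiment cannot be Custom, so it must not define Parameters.
    if definition.get("Parameters"):
        problems.append("distributed test cases cannot define 'Parameters'")
    return problems


print(check_distributed_test_case({"Distributed": True}))
print(check_distributed_test_case({"Parameters": [{"Name": "Duration"}]}))
```

The first call reports no problems, while the second reports both a missing `Distributed` key and a forbidden `Parameters` entry.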
The general workflow during a distributed experiment is as follows:
- The Dispatcher of one of the platforms (the `Main` platform) receives a distributed experiment execution request, either from the Portal or through the Open APIs.
- The Dispatcher performs the initial coordination, contacting the ELCM of its own platform and the Dispatcher of the remote platform (the `Secondary` platform).
- Once the initial coordination is completed, the ELCMs on both sides communicate directly for the rest of the experiment execution.
- Each side executes its tasks as normal, unless it reaches a point where it must coordinate:
  - If one of the platforms must wait until the remote side has performed some actions:
    - The waiting platform can use the `Remote.WaitForMilestone` task.
    - The other platform can indicate that the actions have been performed using the `Run.AddMilestone` task.
  - If one of the platforms requires certain information from the remote side:
    - The querying platform can use the `Remote.GetValue` task.
    - The other platform can set the requested value using any of the `Run.Publish`, `Run.PublishFromFile` and `Run.PublishFromPreviousTaskLog` tasks.
- Once both platforms have executed all their tasks, the `Main` platform requests all the generated files and results from the `Secondary` platform, so that they are saved along with the ones generated by the `Main` platform and are available to the experimenter.
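The coordination step of this workflow can be pictured with a small in-process simulation. This is only a sketch of the semantics, not the ELCM implementation: the two platforms are modelled as threads, the East/West interface is replaced by a shared in-memory object, and the `RemoteState` class, the `ServerReady` milestone and the `ServerAddress` value are all hypothetical names chosen for illustration.

```python
import threading


class RemoteState:
    """In-memory stand-in for what one platform exposes to the remote side."""

    def __init__(self):
        self._cond = threading.Condition()
        self._milestones = set()
        self._values = {}

    def add_milestone(self, name):
        # Plays the role of Run.AddMilestone on the publishing side.
        with self._cond:
            self._milestones.add(name)
            self._cond.notify_all()

    def publish(self, name, value):
        # Plays the role of Run.Publish on the publishing side.
        with self._cond:
            self._values[name] = value
            self._cond.notify_all()

    def wait_for_milestone(self, name, timeout):
        # Plays the role of Remote.WaitForMilestone; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(lambda: name in self._milestones, timeout)

    def get_value(self, name, timeout):
        # Plays the role of Remote.GetValue; returns None on timeout.
        with self._cond:
            if self._cond.wait_for(lambda: name in self._values, timeout):
                return self._values[name]
            return None


def run_demo():
    secondary = RemoteState()  # state exposed by the Secondary platform
    received = []

    def secondary_tasks():
        secondary.publish("ServerAddress", "10.0.0.5")
        secondary.add_milestone("ServerReady")

    def main_tasks():
        # Main blocks until Secondary reports the milestone, then queries the value.
        if secondary.wait_for_milestone("ServerReady", timeout=5):
            received.append(secondary.get_value("ServerAddress", timeout=5))

    threads = [threading.Thread(target=main_tasks),
               threading.Thread(target=secondary_tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received


print(run_demo())
```

Whichever thread starts first, the `Main` side only proceeds after the milestone has been added, which mirrors the blocking behaviour of the coordination tasks described above.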

The `Remote.WaitForMilestone` task halts the execution of additional tasks until the remote side specifies that a certain milestone has been reached (using the `Run.AddMilestone` task). Configuration values:
- `Milestone`: Name of the milestone to wait for.
- `Timeout`: Custom timeout for this particular request. If not specified, the value configured in the East/West section of the configuration is used.

`Init`, `PreRun`, `Run`, `PostRun`, `Finished`, `Cancelled` and `Errored` are valid milestone names that are automatically added (if/when reached) in all experiment executions.
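The two rules above (the timeout fallback and the automatically added milestones) can be summarised in a few lines. This is a minimal sketch under stated assumptions: the task configuration is represented as a plain dictionary, and the helper names `resolve_timeout` and `needs_explicit_milestone` are hypothetical and do not correspond to ELCM functions.

```python
# Milestone names that are added automatically (if/when reached) in every execution.
AUTOMATIC_MILESTONES = ("Init", "PreRun", "Run", "PostRun",
                        "Finished", "Cancelled", "Errored")


def resolve_timeout(task_config, east_west_default):
    """Use the task's own Timeout if given, otherwise the East/West default."""
    timeout = task_config.get("Timeout")
    return east_west_default if timeout is None else timeout


def needs_explicit_milestone(name):
    """True when the remote test case must add the milestone with Run.AddMilestone."""
    return name not in AUTOMATIC_MILESTONES


print(resolve_timeout({"Milestone": "PostRun"}, east_west_default=120))
print(needs_explicit_milestone("ServerReady"))
```

Waiting for `PostRun` needs no cooperation from the remote test case, whereas a custom name such as the hypothetical `ServerReady` must be added explicitly by the other side.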

The `Remote.GetValue` task halts the execution of additional tasks until a certain value can be obtained from the remote side (using any of the `Run.Publish`, `Run.PublishFromFile` and `Run.PublishFromPreviousTaskLog` tasks). When received, the value is published internally and becomes available for variable expansion. Configuration values:
- `Value`: Name of the value to request.
- `PublishName`: Name to use when publishing the value. If not specified, the same `Value` name will be used.
- `Timeout`: Custom timeout for this particular request. If not specified, the value configured in the East/West section of the configuration is used.
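The naming rule for received values can be illustrated as follows. This is a sketch only: the dictionary `published` stands in for the platform's internal store of published values, and `store_remote_value` and the `ServerAddress` and `Target` names are hypothetical.

```python
def store_remote_value(task_config, remote_value, published):
    """Store a value obtained from the remote side under the configured name."""
    # PublishName is optional; when absent, the requested Value name is reused.
    name = task_config.get("PublishName") or task_config["Value"]
    published[name] = remote_value
    return name


published = {}
store_remote_value({"Value": "ServerAddress"}, "10.0.0.5", published)
store_remote_value({"Value": "ServerAddress", "PublishName": "Target"}, "10.0.0.5", published)
print(sorted(published))
```

After both calls the same remote value is available locally under two names, `ServerAddress` and `Target`, which is useful when the local test case already uses a different name for the same piece of information.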