Support a separate set of Sisyphus worker nodes per workflow in order to: #12

Open
5 tasks
1fish2 opened this issue Oct 7, 2019 · 2 comments

1fish2 commented Oct 7, 2019

  • enable app-specific worker GCE VM parameters (RAM, disk, CPUs, GPUs, TPUs, ACLs, ...),
  • prevent large workflows from starving small ones,
  • enable staging servers to be independent,
  • allow log filtering by run,
  • save time pulling Docker images (locality),
  • prevent cascading problems like repeated-retry-on-failure from spreading between workflows.

Simplest approach:

  • A separate RabbitMQ task queue per workflow.
    • Eventually delete the workflow's task queue and remove its resources from Gaia memory.
  • In the workflow builder, launch the workers with the workflow name as metadata, and use that to find the task queue.
    • Ditto when resuming a workflow. Improve the usability somehow.
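
A minimal sketch of how the workflow name could map to a task queue, on both the builder side and the worker side. Everything named here is an assumption for illustration: the `sisyphus-tasks` prefix, the `workflow` metadata key, and the helper names are hypothetical and not taken from Sisyphus or Gaia. The `channel` argument is assumed to be a RabbitMQ channel object exposing `queue_declare` and `queue_delete`, as a pika channel does.

```python
import re


def task_queue_name(workflow_name, prefix="sisyphus-tasks"):
    """Derive a per-workflow queue name (hypothetical naming convention)."""
    # Collapse anything outside a conservative character set into "-".
    cleaned = re.sub(r"[^A-Za-z0-9_.-]+", "-", workflow_name.strip()).strip("-")
    if not cleaned:
        raise ValueError("a workflow name is required")
    return f"{prefix}.{cleaned}"


def queue_for_worker(metadata):
    """Worker side: find the task queue from the metadata the VM was launched with.

    The "workflow" key is an assumed metadata key, not a documented one.
    """
    return task_queue_name(metadata.get("workflow") or "")


def open_workflow_queue(channel, workflow_name):
    """Builder side: declare the workflow's own durable task queue."""
    name = task_queue_name(workflow_name)
    channel.queue_declare(queue=name, durable=True)
    return name


def close_workflow_queue(channel, workflow_name):
    """Cleanup: delete the workflow's task queue once the workflow is done."""
    channel.queue_delete(queue=task_queue_name(workflow_name))
```

Because the builder and the workers derive the queue name from the same function, resuming a workflow only needs the workflow name again. Raising on an empty name mirrors the idea of refusing to launch workers that don't know which workflow they serve.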

Smarter:

  • Auto-launch and shut down workers.
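
One possible shape for the auto-launch/shut-down decision, sketched under assumptions: the policy (size the worker pool from the number of queued tasks), the `tasks_per_worker` and `max_workers` defaults, and both function names are hypothetical. It only decides what to do; actually launching or deleting VMs is out of scope here.

```python
import math


def desired_worker_count(queued_tasks, tasks_per_worker=4, max_workers=10):
    """Target pool size for one workflow, from its queue depth (assumed policy)."""
    if queued_tasks <= 0:
        return 0
    return min(max_workers, math.ceil(queued_tasks / tasks_per_worker))


def scaling_action(current_workers, queued_tasks, **limits):
    """Return ("launch", n), ("shut_down", n), or ("hold", 0) for one workflow."""
    target = desired_worker_count(queued_tasks, **limits)
    delta = target - current_workers
    if delta > 0:
        return ("launch", delta)
    if delta < 0:
        return ("shut_down", -delta)
    return ("hold", 0)
```

This looks only at queued tasks, so a real version would also need to account for tasks that workers are still running before shutting anything down.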

1fish2 self-assigned this Oct 7, 2019

1fish2 commented Oct 7, 2019

The branch siskiyou passes the workflow name from the builder to the worker nodes.


1fish2 commented Oct 7, 2019

The siskiyou PR-to-be makes the workflow builder pass the workflow name to the new worker nodes.

It also makes launch-workers.sh require a workflow name parameter, so that omitting the name when launching workers manually is caught right away.
