
Add stop mechanism when FahAsynchronousComputeService exhausts all RUNs in a PROJECT, or all CLONEs in a RUN #15

Open
dotsdl opened this issue Aug 26, 2024 · 2 comments

Comments

@dotsdl
Member

dotsdl commented Aug 26, 2024

We currently don't have an explicit stop mechanism in place for the FahAsynchronousComputeService for when it exhausts all $2^{16}$ RUNs in a PROJECT, or all $2^{16}$ CLONEs in a RUN. We are assuming that the work server will refuse to create a new RUN or CLONE when it cannot, but I suspect this is not something we should depend on. As it stands, the behavior of FahAsynchronousComputeService under these conditions is undefined.

Possible behaviors we could implement for the FahAsynchronousComputeService in the case of any exhaustion include:

  1. The entire service could halt, indicating that it cannot continue with its current set of PROJECTs as configured, requiring administrator intervention.
  2. A cascading approach to using what it can, up to a limit:
    • If the CLONEs within a RUN are exhausted, the service could create a new RUN in the same PROJECT and start populating it. This complicates the current model of a RUN corresponding to a Transformation, since that mapping will no longer be one-to-one but potentially many-to-one, requiring changes to how the service maintains its index.
    • If the RUNs within a PROJECT are exhausted, the next closest PROJECT with a configuration suitable for the given Task could be used. This has the downside that over time there will be drift between the points offered by a PROJECT and the effort required for the Tasks it services, with a wider variance in effort over time in remaining PROJECTs until they are all exhausted.
    • If all PROJECTs configured for the service have exhausted their RUNs, the service should halt, indicating that it cannot continue, requiring administrator intervention.
  3. ...

There may be additional alternatives.
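As a rough illustration of the cascade in option 2, here is a minimal sketch. It is not service code: `Project`, `allocate_clone`, and `ExhaustedError` are hypothetical names, the index is a plain in-memory list, IDs are assumed to be 0-based, and "next closest PROJECT" is simplified to "next entry in a list the caller has already ordered by suitability".

```python
from dataclasses import dataclass, field

MAX_RUNS = 2**16    # RUNs per PROJECT, per the limit discussed above
MAX_CLONES = 2**16  # CLONEs per RUN, per the limit discussed above


class ExhaustedError(RuntimeError):
    """Raised when no candidate PROJECT can accept more work (hypothetical)."""


@dataclass
class Project:
    """Hypothetical in-memory stand-in for a PROJECT's index."""
    project_id: int
    # clones_per_run[i] is the number of CLONEs already created in RUN i
    clones_per_run: list[int] = field(default_factory=list)
    max_runs: int = MAX_RUNS
    max_clones: int = MAX_CLONES

    def has_free_run(self) -> bool:
        return len(self.clones_per_run) < self.max_runs


def allocate_clone(candidates: list[Project], run_id: int) -> tuple[int, int, int]:
    """Return (project_id, run_id, clone_id) for the next CLONE.

    candidates[0] is the PROJECT currently holding `run_id`; the rest are
    fallback PROJECTs, assumed pre-sorted by suitability for the Task.
    """
    current = candidates[0]

    # Step 1: use the existing RUN while it still has room.
    if current.clones_per_run[run_id] < current.max_clones:
        clone_id = current.clones_per_run[run_id]
        current.clones_per_run[run_id] += 1
        return (current.project_id, run_id, clone_id)

    # Steps 2 and 3: open a new RUN in the same PROJECT if possible,
    # otherwise in the next suitable PROJECT.
    for project in candidates:
        if project.has_free_run():
            project.clones_per_run.append(1)  # new RUN starts with CLONE 0
            return (project.project_id, len(project.clones_per_run) - 1, 0)

    # Step 4: everything is exhausted; halt and ask for an administrator.
    raise ExhaustedError("all candidate PROJECTs have exhausted their RUNs")
```

Note that the caller has to record the returned `(project_id, run_id)` against the Transformation, which is exactly the many-to-one bookkeeping this option would add.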

@jchodera, @sukritsingh, @jcoffland: do you have insights as to what may be most appropriate here, or ideas for a third alternative?

@sukritsingh

I'm a fan of "simpler is better" with stuff like this - some of my initial thoughts below:

  1. $2^{16}$ is a massive number of RUNs or CLONEs for any single project. Assuming each RUN is a unique transformation, do we foresee this becoming an issue quickly (i.e., within a few months of deployment)?
  2. I think moving away from the one-to-one mapping of each RUN corresponding to a Transformation has the potential to introduce a lot of confusion for a user, so I'd want to see more detail on it before being convinced about that as a viable option.
  3. Migrating RUNs between PROJECTs with different point calculations is likely to draw a lot of complaints from testers about variable, inconsistent effort for the same project ID. I'd rather not deal with that kind of complaint, and I'm sure others feel the same, so I'd want to avoid it as much as possible.
  4. What about just migrating to a new project ID with the same point value? I suppose automatically creating new project IDs could be dangerous, so maybe the safe move here is to "halt" the service until an administrator gets involved. $2^{16}$ is such a large number that I think it'd be good to know how often an administrator would need to spin up a new project...
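To put a number on "how often", a back-of-the-envelope helper like the one below could be used. This is only a sketch: `days_until_exhaustion` is a hypothetical name, and the rate of new RUNs per day is an input you would have to measure, not a figure taken from this thread.

```python
MAX_RUNS = 2**16  # 65536 RUNs per PROJECT, per the limit discussed above


def days_until_exhaustion(new_runs_per_day: float, max_runs: int = MAX_RUNS) -> float:
    """Days until a single PROJECT runs out of RUNs at a constant creation rate."""
    if new_runs_per_day <= 0:
        raise ValueError("new_runs_per_day must be positive")
    return max_runs / new_runs_per_day


# Purely illustrative rate: at 100 new RUNs per day, one PROJECT lasts
# 65536 / 100 = 655.36 days.
print(days_until_exhaustion(100))
```

The same arithmetic applies to CLONEs within a RUN, with the CLONE creation rate as the input.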

My brain is in a few directions with faculty applications and other writing tasks right now, so will percolate on this further!

@dotsdl
Member Author

dotsdl commented Nov 12, 2024

I've added hard stop guardrails (option 1) in 847decb. This should at least avoid potential disaster, and will allow us to explore more sophisticated solutions later.
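For readers following along, a hard stop of this kind could look roughly like the sketch below. This is not the code in 847decb: `FahIndexExhaustedError`, `next_run_id`, and `next_clone_id` are hypothetical names, and 0-based RUN and CLONE IDs are an assumption.

```python
MAX_RUNS = 2**16    # RUNs per PROJECT
MAX_CLONES = 2**16  # CLONEs per RUN


class FahIndexExhaustedError(RuntimeError):
    """Hypothetical error signalling that administrator intervention is needed."""


def next_run_id(existing_runs: int, max_runs: int = MAX_RUNS) -> int:
    """Return the ID for a new RUN, or refuse if the PROJECT is full."""
    if existing_runs >= max_runs:
        raise FahIndexExhaustedError(
            f"PROJECT already has {existing_runs} RUNs; cannot create another"
        )
    return existing_runs  # IDs assumed 0-based, so the count is the next ID


def next_clone_id(existing_clones: int, max_clones: int = MAX_CLONES) -> int:
    """Return the ID for a new CLONE, or refuse if the RUN is full."""
    if existing_clones >= max_clones:
        raise FahIndexExhaustedError(
            f"RUN already has {existing_clones} CLONEs; cannot create another"
        )
    return existing_clones
```

Checking the limit in the service itself, before any request reaches the work server, is what removes the dependence on the work server refusing the request.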
