Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep transport open for quick tasks #6595

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

GeigerJ2
Copy link
Contributor

@GeigerJ2 GeigerJ2 commented Oct 29, 2024

EDIT: For now, restricting this to handling only the first task, by keeping track of the new _last_close_time attribute, and submitting a job immediately if this is None.

transports.py

  • Add _last_request_special attribute to TransportQueue to track special requests that didn't respect safe_open_interval, to not overload SSH.
  • Add _last_open_time and _last_close_time to keep timing information when transports were opened/closed, same reason as above -> Use this timing information to evaluate when an open callback can be scheduled (immediately, after time difference between last open and safe_open_interval, or after the full safe_open_interval (as done so far))
  • Move transport closing logic out of transports.py into tasks.py, as otherwise we cannot reuse opened transports, as they are always closed after the task for which the request was created has finished, prohibiting the use of one transport for multiple tasks that finish quickly (to be thought about, as not always closing the transport might have undesired side effects...)
  • Add open_transports attribute to TransportQueue similar to the transport_requests dictionary, with the pk of the AuthInfo instance as key and the transport as value. This can be accessed outside, in tasks.py to retrieve existing, open transports.

tasks.py

  • In the different do_ methods, see if an existing open transport for the AuthInfo contained in TransportQueue._open_transports, and use it. If not, create new TransportRequest.
  • Add closing logic here for the open transports (rather than after every TransportRequest to be able to re-use an open Transport). Logic for when they should be closed? When to pop the TransportRequests and open transports? After every successful re-use of a Transport? Would be nice if an open transport could be used for "arbitrarily" many tasks that finish quickly.
  • Don't modify do_update for now, as this is done in batches to poll multiple jobs (isn't that for the whole TransportQueue the case, at least according to the file docstring?)

Notes from discussion with @khsrali and @unkcpz

  • Call to execmanager.upload_calculation in task_upload_job is actually blocking, while task_upload_job and do_upload defined inside are async -> Possibly Ali's async PR could fix the issue, but likely not, as every task_ function still requests an individual transport
  • Because of this, close is being called in every request_transport call for every transport task
  • One possibility: Would be to have a pool of open transports from which those can be retrieved if needed, rather than creating additional requests -> Logic for when to close these open transports?
  • Alternatively, have the transport tasks batched, and only close once the whole batch once the full set of transport tasks is finished
  • In any case, close should also be run asynchronously via a callback_handle -> Though, then, how is the closing being handled? Once the close callback is scheduled, Transport might be closed at a later stage while it might actually be used

Otherwise we _always_ close the transport after requesting it??
@GeigerJ2 GeigerJ2 changed the title Add _special requests_ to keep transport open for quick tasks Keep transport open for quick tasks Oct 29, 2024
@GeigerJ2 GeigerJ2 force-pushed the feature/ssh-connection-alive branch 2 times, most recently from b3c13d0 to d0e4777 Compare November 4, 2024 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant