Launch interchange as a fresh process #3463
There are default values in the interchange code, but they are all specified in the executor code too, so these defaults will never be used. Remove them as misleading. See similar changes to process worker pool, PR #2973, for more detailed justification.
This needs to change the zmq sockets test because that test assumes the arbitrary defaults are present, which is no longer the case. If you want to initialize an interchange, which now requires you to specify all of these values, and want some arbitrary values, then make those arbitrary values yourself.
The client address parameter is now supplied by the executor. It was not before, so the default/hard-coded value now lives in the executor, not the interchange.
…l, not multiprocessing
Any downstream packaging will need to be aware of the presence of interchange.py as a new command-line invocable script, and this might break some build instructions which do not configure installed scripts onto the path.
This PR replaces keyword arguments with argparse command line parameters. It does not attempt to make those command line arguments differently-optional than the constructor of the Interchange class (for example, worker_ports and worker_port_range are both mandatory, because they are both specified before this PR).
I'm somewhat uncomfortable with this seeming like an ad-hoc serialise/deserialise protocol for what was previously effectively a dict of typed python objects... but it's what process worker pool does.
See issue #3373 for the interchange-specific issue. See issue #2343 for the parsl general fork vs threads issue. See possibly issue #3378?
…ehaviour to pass. Not tested against a real hanging interchange though...
Commit e7d18aa swaps this PR from using argparse to using pickle. I did it so that people can compare the argparse approach (before this commit) and the pickle approach (after this commit).
(@rjmello @yadudoc @khk-globus are the people that probably care about my last comment - #3463 (comment) )
@khk-globus suggested using stdin rather than a command line parameter, and I have made that change. We also briefly discussed what protocol to use over stdin (primarily: pickle vs JSON). This PR sticks with pickle and I added a note about that into the PR description:
Hmm; I appear to have commented in a weird place; pulling up from the commit-comment so the PR has an easily discoverable record of it:
Love it. Nice work.
Per the analysis in #3495, defining the `ManagerLost` and `VersionMismatch` errors in the `interchange.py` became a problem in #3463, where the interchange now runs as `__main__`. This makes it difficult for Dill to get the serde correct. The organizational fix is simply to move these classes to an importable location, which follows the expectation that classes are available in both local and remote locations, which defining in `__main__` can't easily guarantee. Fixes: #3495
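A minimal sketch of why an importable location matters, using the standard library `pickle` rather than Dill (Dill has its own handling for `__main__`, which is not reproduced here). The module name `demo_errors` and the class `ManagerLostDemo` are hypothetical stand-ins, not the real parsl names. The point it illustrates is that a pickled exception refers to its class by module and name, so the receiving side has to be able to import that module:

```python
import pickle
import sys
import types

# Build a throwaway module so the example is self-contained.
# "demo_errors" and "ManagerLostDemo" are hypothetical names for illustration.
mod = types.ModuleType("demo_errors")
exec("class ManagerLostDemo(Exception):\n    pass\n", mod.__dict__)
sys.modules["demo_errors"] = mod

# Pickle stores a reference (module name + class name), not the class body.
payload = pickle.dumps(mod.ManagerLostDemo("manager lost"))

# While the module is importable, the round trip works.
restored = pickle.loads(payload)
round_trip_ok = isinstance(restored, mod.ManagerLostDemo)

# Simulate a receiver that cannot import the defining module,
# which is roughly the situation when the class lives in someone else's __main__.
del sys.modules["demo_errors"]
try:
    pickle.loads(payload)
    receiver_ok = True
except ModuleNotFoundError:
    receiver_ok = False

print(round_trip_ok, receiver_ok)
```

Moving the error classes into a regular module means both sides resolve the same module path.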
This was introduced in PR #3463, and at the time I incorrectly assumed that interchange exit would close both ends of the pipe. That is untrue. For example, `pytest parsl/tests/test_htex/ --config local` ends with 341 file descriptors open before this PR and 327 open after this PR.
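The behaviour described above can be sketched with a plain `subprocess.Popen`. This is an illustration under assumptions, not the actual parsl fix: the child command here is a trivial stand-in for the interchange. It shows that the parent's end of a stdin pipe stays open after the child has exited, until the parent closes it explicitly:

```python
import subprocess
import sys

# Stand-in child process that exits immediately (not the real interchange).
proc = subprocess.Popen([sys.executable, "-c", "pass"], stdin=subprocess.PIPE)
proc.wait()

# The child has exited, but the parent's end of the pipe is still open.
still_open_after_exit = not proc.stdin.closed

# The parent has to release its own file descriptor.
proc.stdin.close()
closed_after_explicit_close = proc.stdin.closed

print(still_open_after_exit, closed_after_explicit_close)
```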
This was introduced in PR #2629 to guard against the submit process installing a SIGTERM handler and then that handler being unexpectedly inherited by the interchange via multiprocessing fork. PR #3463 changed the interchange to run as a fresh Python process, which will not inherit SIGTERM handlers, so since then this line has been vestigial. Fixes issue #3588
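A small sketch of the claim that a fresh Python process does not inherit a Python-level SIGTERM handler. The handler and the child command are hypothetical stand-ins, and this needs to run in the main thread because `signal.signal` requires it:

```python
import signal
import subprocess
import sys


def parent_handler(signum, frame):
    """Hypothetical handler installed by the submitting process."""


previous = signal.signal(signal.SIGTERM, parent_handler)
try:
    # The child is a freshly started interpreter, not a fork-without-exec.
    child_code = (
        "import signal; "
        "print(signal.getsignal(signal.SIGTERM) == signal.SIG_DFL)"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
        check=True,
    )
    child_has_default_handler = result.stdout.strip() == "True"
finally:
    # Restore whatever was installed before this example ran.
    if previous is not None:
        signal.signal(signal.SIGTERM, previous)

print(child_has_default_handler)
```

Note that this covers handlers installed as Python functions. A signal set to be ignored in the parent is a different case and is not exercised here.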
Prior to PR #3463, the interchange process was launched with multiprocessing fork and inherited the log configuration of the parent process. To give the interchange its own log file, a specific logger, called "interchange", was used. In PR #2307, this logger was configured to not propagate entries upwards, so that user-defined root handlers in the parent process do not see interchange logs. Since #3463 this special configuration has no longer been necessary. See #3635 for a related change to signal handlers. Co-authored-by: Kevin Hunter Kesling
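For readers unfamiliar with the non-propagation setup being removed, here is a rough sketch of what it did, using only the standard `logging` module. The in-memory streams stand in for the user's root handler and the interchange log file, and the exact parsl configuration is not reproduced here:

```python
import io
import logging

# Stand-in for a user-defined root handler in the submitting process.
root_stream = io.StringIO()
root_handler = logging.StreamHandler(root_stream)
root = logging.getLogger()
root.addHandler(root_handler)

# Stand-in for the dedicated interchange log file.
own_stream = io.StringIO()
own_handler = logging.StreamHandler(own_stream)
ic_logger = logging.getLogger("interchange")
ic_logger.setLevel(logging.INFO)
ic_logger.addHandler(own_handler)
ic_logger.propagate = False  # keep entries away from root handlers

ic_logger.info("interchange started")

# Clean up so the example does not leave handlers behind.
root.removeHandler(root_handler)
ic_logger.removeHandler(own_handler)

root_saw_message = "interchange started" in root_stream.getvalue()
own_saw_message = "interchange started" in own_stream.getvalue()
print(root_saw_message, own_saw_message)
```

In a fresh process there are no inherited root handlers to shield against, which is why the special case can go.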
This PR removes a use of multiprocessing fork-without-exec.
At heart, this is how the interchange has wanted to be launched for some time (because of earlier remote interchange work).
Launching with multiprocessing fork caused a bunch of problems related to inheriting state from the parent submitting process that go away with this (jumbled logging topics, race conditions around at least logging-while-forking, inherited signal handlers).
The configuration dictionary, previously passed in memory over a fork, is now sent in pickled form over `stdin`. Using pickle here rather than (e.g.) JSON keeps the path open for sending richer configuration objects, beyond what can be encoded in JSON. This isn't something needed right now, but at least configurable monitoring radios (the immediate driving force behind this PR) are modelled around passing arbitrary configuration objects around to configure things - and so it seems likely that if interchange monitoring configuration is exposed to the user, richer objects would be passed here. See PR #3315 for monitoring radio prototype.
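As a rough sketch of the mechanism (not the actual parsl code), the parent can pickle a configuration dict and write it to the child's `stdin`, and the child can read it back from `sys.stdin.buffer`. The child script and the keys other than `worker_ports` are hypothetical. The tuple value is there to show a type that JSON would not preserve:

```python
import pickle
import subprocess
import sys

# Hypothetical child: a stand-in for the interchange entry point.
CHILD_CODE = """
import pickle, sys
config = pickle.load(sys.stdin.buffer)
print(type(config["worker_ports"]).__name__)
"""

# Hypothetical configuration values for illustration only.
config = {
    "worker_ports": (54000, 54001),
    "heartbeat_period": 30,
    "logdir": "example_logdir",
}

result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    input=pickle.dumps(config),  # bytes written to the child's stdin
    capture_output=True,
    check=True,
)

# The child reports the type it received; JSON would have produced a list.
received_type = result.stdout.decode().strip()
print(received_type)
```

The usual pickle caveat applies: this is only reasonable because the parent and child are the same trusted code base.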