Run items using SLURM; save batches with platform-independent paths #298
Changed to draft because I'm actually still refactoring things as part of the cross-platform saving fix
tox-dev/filelock#331 is currently a blocker... could probably sidestep but I'm hoping they fix it quickly (for now)
This now also fixes #209 - sorry for the lack of clean separation between them. Locking the batch file while writing was involved in both sets of changes.
Thanks! Mostly LGTM, have a few suggestions
I made some significant changes to put the saving functionality in the |
Hmm, checks seem to be failing at the install step - not sure if it's due to these changes or if something else is going on.
Parameters
----------
index: int, str or UUID
put type annotation in function signature too
So I don't know what the best practice is with this, but isn't it correct given that the code of this function accepts only ints? On the other hand, because of decoration by `@_index_parser`, `CaimanDataFrameExtensions.update_item` accepts ints, strs, and UUIDs. However, I agree it is confusing; most people probably look to function signatures rather than docstrings to learn how to call things.
have a few comments, sorry been busy with a big |
No worries, I've been working on other things as well.
os.remove(bak)
except:
except BaseException as err:
Yes `BaseException`, however here I believe it's correct (at least it matches previous behavior); I just need to name it explicitly in order to get `err` so it can be forwarded.
except (Exception, KeyboardInterrupt)
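For context on the difference: `KeyboardInterrupt` and `SystemExit` derive from `BaseException` rather than `Exception`, so `except (Exception, KeyboardInterrupt)` still restores the backup on Ctrl-C but lets `SystemExit` and `GeneratorExit` pass through untouched. A rough sketch of the pattern follows; `save_with_backup` and the choice to wrap the error in `RuntimeError` are assumptions for illustration, not the code in this PR.

```python
import os
import shutil
import tempfile


def save_with_backup(path, data):
    """Write `data` to `path`; put the old contents back if the write fails."""
    bak = path + ".bak"
    shutil.copyfile(path, bak)
    try:
        with open(path, "w") as f:
            f.write(data)  # raises TypeError if `data` is not a str
    except (Exception, KeyboardInterrupt) as err:
        # Restore the previous file, then forward the original error as the cause.
        os.replace(bak, path)
        raise RuntimeError("save failed; previous file restored") from err
    else:
        os.remove(bak)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "batch.txt")
    with open(path, "w") as f:
        f.write("old")

    try:
        save_with_backup(path, 123)  # deliberately invalid payload
    except RuntimeError as e:
        failure_cause = type(e.__cause__).__name__
    with open(path) as f:
        after_failure = f.read()

    save_with_backup(path, "new")
    with open(path) as f:
        after_success = f.read()
    backup_left = os.path.exists(path + ".bak")
```

Binding the exception with `as err` is what makes the `from err` chaining possible, which is the reason a bare `except:` has to be spelled out one way or the other.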
@kushalkolar done making changes for now; see remaining threads for questions I still have
This implements the "slurm" backend via `_run_slurm`. It supports passing a partition or list of partitions on which to run the jobs; more options (such as memory allocation) could be added if we think they're important.
To control the number of CPUs/processes per job, I'm currently using the `MESMERIZE_N_PROCESSES` environment variable, which matches the behavior of the "subprocess" backend. However, it might be a good idea to instead divide this number by the number of jobs running in parallel, to avoid needlessly over-parallelizing each job. I already do this in my own code when setting the environment variable and it works fine, but it may make sense to automate it.
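As a rough illustration of the division idea, the sketch below splits a total process budget across parallel jobs and assembles an `sbatch` argument list. The helper names, the script name and the partition names are made up for the example and are not what `_run_slurm` does; only the `--partition` and `--cpus-per-task` flags are standard `sbatch` options, and nothing is actually submitted.

```python
def processes_per_job(total_processes: int, n_parallel_jobs: int) -> int:
    """Split a process budget evenly across jobs, with at least one each."""
    return max(1, total_processes // max(1, n_parallel_jobs))


def build_sbatch_command(script, partitions=None, cpus=None):
    """Assemble (but do not run) an sbatch command line as a list of args."""
    cmd = ["sbatch"]
    if partitions:
        if isinstance(partitions, str):
            partitions = [partitions]
        # sbatch accepts a comma-separated list of partition names.
        cmd.append("--partition=" + ",".join(partitions))
    if cpus is not None:
        cmd.append(f"--cpus-per-task={cpus}")
    cmd.append(script)
    return cmd


# Hypothetical numbers: a budget of 16 processes shared by 4 parallel jobs.
n_procs = processes_per_job(16, 4)
cmd = build_sbatch_command("run_item.sh", ["cpu", "gpu"], n_procs)
single = build_sbatch_command("run_item.sh", "cpu")
```

In practice the total would presumably come from `MESMERIZE_N_PROCESSES`, and the per-job value would be exported into each job's environment so the algorithm code picks it up the same way it does for the "subprocess" backend.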
This also adds a dependency on `filelock` to lock the batch file while updating it, to avoid race conditions. I used the `SoftFileLock` because the regular `FileLock` wasn't working for me on an NTFS remote drive (from Linux). As I understand it, this basically just creates a lock file when acquiring and deletes it when releasing, which also wouldn't be hard to implement ourselves if you prefer not to add a dependency.
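For reference, a dependency-free version of that "create on acquire, delete on release" idea could look roughly like the following. This is a minimal sketch using only the standard library, not the `filelock` implementation; `soft_lock` and its parameters are hypothetical names.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def soft_lock(lock_path, timeout=10.0, poll=0.05):
    """Hold an advisory lock by exclusively creating `lock_path`."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        os.close(fd)
        yield
    finally:
        # Releasing the lock is just deleting the file.
        os.remove(lock_path)


with tempfile.TemporaryDirectory() as tmp:
    lock_path = os.path.join(tmp, "batch.lock")
    with soft_lock(lock_path):
        held = os.path.exists(lock_path)
        try:
            with soft_lock(lock_path, timeout=0.1):
                second_acquired = True
        except TimeoutError:
            second_acquired = False
    released = not os.path.exists(lock_path)
```

The main weakness of this approach is that a process killed while holding the lock leaves a stale lock file behind, so some manual cleanup path would be needed. Whether exclusive creation is reliably atomic also depends on the filesystem, which would need checking on the remote drive in question.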