Fully distributed training #252
Comments
The structure of the library facilitates some use of distributed mechanisms, but this is definitely not a CLI workflow. For the moment, some additional scripts are being added under icenet-pipeline.
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 23, 2024:
…ly qualified dataset filenames and Dev icenet-ai#252: refactoring of existing training functionality to allow extension to use horovod for fully distributed training as a child implementation of the original tensorflow
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 23, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 23, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 23, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 23, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 24, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 24, 2024:
…raining with horovod
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 27, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 27, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 27, 2024:
…es weren't available for callbacks, not a missing call
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 27, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 27, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 28, 2024:
…decay defaulting incorrectly
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 28, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 28, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 28, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on May 28, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue on Jun 7, 2024
Description
Multi-node, multi-*PU training. This is required to really scale our use of the data pipeline for big predictions. Given how the pipeline is constructed today, we only need some library changes to ensure that we can utilise resources as they become available. This issue will track the additional development required to scale to the HPC capabilities in question.
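As a rough illustration of the "child implementation" idea mentioned in the commits, here is a minimal, framework-free sketch of how a distributed trainer might extend a single-process one. Every name in it (`TrainConfig`, `Trainer`, `DistributedTrainer`, `plan`) is hypothetical and is not taken from icenet. The worker `rank` and `size` are passed in by hand, whereas a real Horovod run would presumably obtain them from `hvd.rank()` and `hvd.size()` after `hvd.init()`. Scaling the learning rate by the worker count is a common Horovod convention, assumed here rather than confirmed for this project.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Hypothetical settings, not icenet's actual configuration.
    learning_rate: float = 1e-4
    epochs: int = 10


class Trainer:
    """Single-process training: one worker does everything."""

    def __init__(self, config: TrainConfig):
        self.config = config

    @property
    def rank(self) -> int:
        return 0

    @property
    def size(self) -> int:
        return 1

    def effective_learning_rate(self) -> float:
        return self.config.learning_rate

    def should_write_outputs(self) -> bool:
        # A lone worker always saves checkpoints and logs.
        return True

    def plan(self) -> dict:
        # Summarise what this worker would do, without running any training.
        return {
            "rank": self.rank,
            "size": self.size,
            "learning_rate": self.effective_learning_rate(),
            "writes_outputs": self.should_write_outputs(),
        }


class DistributedTrainer(Trainer):
    """Child implementation: overrides only what differs per worker."""

    def __init__(self, config: TrainConfig, rank: int, size: int):
        super().__init__(config)
        if not 0 <= rank < size:
            raise ValueError("rank must satisfy 0 <= rank < size")
        # Assumption: in a Horovod job these would come from hvd.rank()/hvd.size().
        self._rank = rank
        self._size = size

    @property
    def rank(self) -> int:
        return self._rank

    @property
    def size(self) -> int:
        return self._size

    def effective_learning_rate(self) -> float:
        # Assumed convention: scale the base rate by the number of workers.
        return self.config.learning_rate * self.size

    def should_write_outputs(self) -> bool:
        # Only the first worker writes, so workers do not clobber each other's files.
        return self.rank == 0


if __name__ == "__main__":
    cfg = TrainConfig(learning_rate=0.001)
    print(Trainer(cfg).plan())
    for r in range(4):
        print(DistributedTrainer(cfg, rank=r, size=4).plan())
```

The point of the layout is that the parent keeps the whole training flow and the child changes only the per-worker decisions, so the single-process path stays usable when no distributed backend is available.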