Fully distributed training #252

Open
JimCircadian opened this issue Apr 12, 2024 · 1 comment

Comments

@JimCircadian
Member

Description

Multi-node, multi-*PU training. This is required to scale our use of the data pipeline for large predictions. Given how the pipeline is currently constructed, we only need some library changes to ensure that we can utilise resources as they become available. This issue tracks the additional development required to scale to the HPC capabilities in question.

@JimCircadian JimCircadian self-assigned this Apr 12, 2024
@JimCircadian JimCircadian added this to the v0.3.0 milestone Apr 12, 2024
@JimCircadian JimCircadian added the enhancement New feature or request label Apr 12, 2024
@JimCircadian JimCircadian modified the milestones: v0.4.0, v0.3.0 Apr 15, 2024
@JimCircadian
Member Author

The structure of the library facilitates some use of distributed mechanisms, but this is definitely not a CLI workflow. Some additional scripts are being added under icenet-pipeline for the moment.

JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 23, 2024
…ly qualified dataset filenames and Dev icenet-ai#252: refactoring of existing training functionality to allow extension to use horovod for fully distributed training as a child implementation of the original tensorflow
@JimCircadian JimCircadian modified the milestones: v0.3.0, v0.2.9 May 23, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 23, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 23, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 23, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 23, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 24, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 24, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 27, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 27, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 27, 2024
…es weren't available for callbacks, not a missing call
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 27, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 27, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 28, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 28, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 28, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 28, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue May 28, 2024
JimCircadian added a commit to JimCircadian/icenet that referenced this issue Jun 7, 2024
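
For readers following the commits above, here is a rough sketch of what "horovod ... as a child implementation of the original tensorflow" training could look like structurally. It is not taken from the icenet code: `BaseTrainer`, `HorovodTrainer` and all of their method names are hypothetical, and the loss and learning-rate scaling are placeholder assumptions. The only real library calls are `hvd.init()`, `hvd.size()`, `hvd.rank()`, `hvd.DistributedOptimizer` and `hvd.callbacks.BroadcastGlobalVariablesCallback` from `horovod.tensorflow.keras`, which is imported lazily so the base class works without Horovod installed.

```python
class BaseTrainer:
    """Hypothetical single-process trainer; subclasses override the hooks."""

    def __init__(self, base_lr=1e-4):
        self.base_lr = base_lr

    def world_size(self):
        # Number of cooperating training processes.
        return 1

    def is_primary(self):
        # Only the primary process should log or write checkpoints.
        return True

    def learning_rate(self):
        # Assumption: scale the rate linearly with the number of workers,
        # a convention commonly used in Horovod examples.
        return self.base_lr * self.world_size()

    def wrap_optimizer(self, optimizer):
        return optimizer

    def callbacks(self):
        return []

    def train(self, model, dataset, optimizer, epochs=1, loss="mse"):
        # `model` is expected to behave like a Keras model (compile/fit).
        model.compile(optimizer=self.wrap_optimizer(optimizer), loss=loss)
        return model.fit(
            dataset,
            epochs=epochs,
            callbacks=self.callbacks(),
            verbose=1 if self.is_primary() else 0,
        )


class HorovodTrainer(BaseTrainer):
    """Hypothetical child implementation that distributes via Horovod."""

    def __init__(self, base_lr=1e-4):
        super().__init__(base_lr)
        # Lazy import: Horovod is only needed for the distributed variant.
        import horovod.tensorflow.keras as hvd

        hvd.init()
        self._hvd = hvd

    def world_size(self):
        return self._hvd.size()

    def is_primary(self):
        return self._hvd.rank() == 0

    def wrap_optimizer(self, optimizer):
        # Averages gradients across workers before applying them.
        return self._hvd.DistributedOptimizer(optimizer)

    def callbacks(self):
        # Broadcast initial weights from rank 0 so all workers start in sync.
        return [self._hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
```

With this shape, a launcher script would pick `HorovodTrainer` when started under `horovodrun` or MPI and fall back to `BaseTrainer` otherwise, which fits the point above that this is driven from scripts rather than a CLI workflow.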