  _____ _______   ____ _____    ____   ____
  \__  \\_  __ \_/ ___\\__  \  /    \_/ __ \
   / __ \|  | \/\  \___ / __ \|   |  \  ___/
  (____  /__|    \___  >____  /___|  /\___  > /\
       \/            \/     \/     \/     \/  \/
Arcane is a CLI tool that simplifies distributed training for machine learning models. It lets you efficiently manage and monitor training tasks across multiple machines or GPUs.
- Distributed Training: Seamlessly distribute training tasks across multiple nodes.
- Easy Configuration: Simple setup with configuration files and environment variables.
- Resource Monitoring: Track resource usage and performance metrics.
- Scalability: Easily scale your training tasks to handle large datasets and complex models.
To install Arcane, clone the repository and run the following commands:
git clone https://github.com/yourusername/arcane.git
cd arcane
pip install .
Arcane provides a command-line interface for managing distributed training tasks. Here are some common commands:
- Start Training: Begin a distributed training session.
  arcane train --config path/to/config.yaml
- Monitor Progress: Check the status of your training tasks.
  arcane status
- Stop Training: Terminate a running training session.
  arcane stop
Arcane uses a YAML configuration file to specify training parameters and machine roles. Here is an example configuration:
master:
  host: master-node
  port: 12345
workers:
  - host: worker-node-1
    port: 12346
  - host: worker-node-2
    port: 12347
training:
  model: resnet50
  dataset: /path/to/dataset
  epochs: 10
  batch_size: 32
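Once parsed (for example with PyYAML's yaml.safe_load), the configuration is a plain dictionary. A minimal validation sketch, with required keys taken from the example above rather than from a published schema:

```python
def validate_config(cfg):
    """Check that a parsed Arcane config matches the structure of the
    example configuration; raise ValueError on the first problem."""
    for key in ("master", "workers", "training"):
        if key not in cfg:
            raise ValueError(f"missing top-level section: {key}")
    # Every node (master and each worker) needs a host and an integer port.
    for node in [cfg["master"], *cfg["workers"]]:
        if "host" not in node or "port" not in node:
            raise ValueError(f"node needs host and port: {node}")
        if not isinstance(node["port"], int):
            raise ValueError(f"port must be an integer: {node['port']}")
    for key in ("model", "dataset", "epochs", "batch_size"):
        if key not in cfg["training"]:
            raise ValueError(f"missing training key: {key}")
    return cfg

# Mirrors the example configuration above.
sample = {
    "master": {"host": "master-node", "port": 12345},
    "workers": [
        {"host": "worker-node-1", "port": 12346},
        {"host": "worker-node-2", "port": 12347},
    ],
    "training": {"model": "resnet50", "dataset": "/path/to/dataset",
                 "epochs": 10, "batch_size": 32},
}
validate_config(sample)  # passes silently
```

Failing fast on a malformed config is cheaper than discovering the problem mid-run on a remote worker.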
To test Arcane across multiple machines:
- Ensure all machines are on the same network and have SSH access.
- Set up the environment and dependencies on each machine.
- Use the sample configuration file to start a distributed training task.
- Monitor logs and resource usage to ensure everything is functioning correctly.
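Before starting a run, it helps to confirm that each node's Arcane port is reachable from the launch machine. A small standard-library sketch; the hosts and ports follow the sample configuration above, and the helper is not part of Arcane itself:

```python
import socket

def port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, DNS failures
        return False

# Nodes from the sample configuration.
nodes = [
    ("master-node", 12345),
    ("worker-node-1", 12346),
    ("worker-node-2", 12347),
]
for host, port in nodes:
    state = "ok" if port_reachable(host, port) else "UNREACHABLE"
    print(f"{host}:{port} {state}")
```

An unreachable port usually points to a firewall rule or a node where the Arcane process has not been started yet.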
We welcome contributions! Please read our contributing guidelines for more details.
PS: This project is still under construction.
This project is licensed under the MIT License - see the LICENSE file for details.