ML Scheduler is a lightweight machine learning experiment scheduler that automates resource management (e.g., GPUs and models) and batch-runs experiments with just a few lines of Python code.
- Install ml-scheduler:

```shell
pip install ml-scheduler
```

or install from the GitHub repository:

```shell
git clone https://github.com/huyiwen/ml_scheduler
cd ml_scheduler
pip install -e .
```
- Create a Python script:

```python
import ml_scheduler

cuda = ml_scheduler.pools.CUDAPool([0, 2], 90)
disk = ml_scheduler.pools.DiskPool('/one-fs')


@ml_scheduler.exp_func
async def mmlu(exp: ml_scheduler.Exp, model, checkpoint):
    source_dir = f"/another-fs/model/{model}/checkpoint-{checkpoint}"
    target_dir = f"/one-fs/model/{model}-{checkpoint}"

    # resources will be cleaned up after exiting the function
    disk_resource = await exp.get(
        disk.copy_folder,
        source_dir,
        target_dir,
        cleanup_target=True,
    )
    cuda_resource = await exp.get(cuda.allocate, 1)

    # run inference
    args = [
        "python", "inference.py",
        "--model", target_dir,
        "--dataset", "mmlu",
        "--cuda", str(cuda_resource[0]),
    ]
    stdout = await exp.run(args=args)

    await exp.report({'Accuracy': stdout})


mmlu.run_csv("experiments.csv", ['Accuracy'])
```
Mark the function with `@ml_scheduler.exp_func` and `async` to make it an experiment function. The function should take an `exp` argument as its first argument. Then use `await exp.get` to acquire resources (non-blocking) and `await exp.run` to run the experiment (also non-blocking). Non-blocking means you can run multiple experiments concurrently.
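The concurrency model behind this can be sketched with plain `asyncio`. The snippet below is a hypothetical stand-in, not the ml_scheduler API: a semaphore plays the role of a two-GPU pool, and awaiting it lets other experiments make progress instead of blocking.

```python
import asyncio

async def experiment(name: str, gpus: asyncio.Semaphore, log: list):
    # awaiting the semaphore suspends this experiment without blocking
    # the event loop, so other experiments keep running
    async with gpus:
        log.append(f"{name} started")
        await asyncio.sleep(0.01)  # stands in for `await exp.run(...)`
        log.append(f"{name} finished")

async def main():
    gpus = asyncio.Semaphore(2)  # pretend we have two free GPUs
    log: list = []
    # schedule all three experiments at once; the third must wait for a GPU
    await asyncio.gather(*(experiment(f"exp{i}", gpus, log) for i in range(3)))
    return log

log = asyncio.run(main())
print(log)
```

The first two experiments acquire GPUs immediately; the third starts only after one of them finishes and releases its slot, which is the same back-pressure a resource pool provides.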
- Create a CSV file `experiments.csv` with your arguments (`model` and `checkpoint` in this case):
```csv
model,checkpoint
alpacaflan-packing,200
alpacaflan-packing,400
alpacaflan-qlora,200-merged
alpacaflan-qlora,400-merged
```
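Conceptually, each CSV row supplies one set of keyword arguments for the experiment function, with column names matching the parameters after `exp`. The sketch below (plain stdlib, not ml_scheduler itself) shows how the rows above parse into per-experiment argument dicts:

```python
import csv
import io

# inline copy of experiments.csv for illustration; run_csv would read the file
csv_text = """model,checkpoint
alpacaflan-packing,200
alpacaflan-packing,400
alpacaflan-qlora,200-merged
alpacaflan-qlora,400-merged
"""

# each row becomes a dict keyed by the header columns,
# i.e. one call like mmlu(model=..., checkpoint=...) per row
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows)
```

Note that CSV values are strings, which is why checkpoints like `200-merged` and `400` sit in the same column without issue.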
- Run the script:

```shell
python run.py
```
The results (`Accuracy` in this case) and some other information will be saved in `results.csv`.