py #794

Merged 6 commits on Oct 16, 2024
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -11,6 +11,8 @@
- sections:
- local: quickstart_spaces
title: Train on Spaces
- local: quickstart_py
title: Python SDK
- local: quickstart
title: Train Locally
- local: config
8 changes: 8 additions & 0 deletions docs/source/faq.mdx
@@ -16,6 +16,14 @@ You can safely remove the dataset from the Hub after training is complete.
If uploaded, the dataset will be stored in your Hugging Face account as a private repository and will only be accessible by you
and the training process. It is not used once the training is complete.

## My training space paused for no reason mid-training

AutoTrain Training Spaces pause themselves once training finishes (or fails) in order to save resources and costs.
If your training failed, you can still check the space logs to find out what went wrong. Note: you won't be able to retrieve the logs if you restart the space.

Another reason for the space to pause is its sleep time kicking in. If you have a long-running training job, you must set the sleep time to a much higher value.
The space will pause itself after training is done anyway, saving you costs.

## I get error `Your installed package nvidia-ml-py is corrupted. Skip patch functions`

This error can be safely ignored. It is a warning from the `nvitop` library and does not affect the functionality of AutoTrain.
2 changes: 1 addition & 1 deletion docs/source/quickstart.mdx
@@ -1,4 +1,4 @@
# Quickstart
# Quickstart Guide for Local Training

This quickstart is for local installation and usage.
If you want to use AutoTrain on Hugging Face Spaces, please refer to the *AutoTrain on Hugging Face Spaces* section.
111 changes: 111 additions & 0 deletions docs/source/quickstart_py.mdx
@@ -0,0 +1,111 @@
# Quickstart with Python

AutoTrain is a library that allows you to train state-of-the-art models on Hugging Face Spaces or locally.
It provides a simple, easy-to-use interface for training models on various tasks such as LLM finetuning, text classification,
image classification, object detection, and more.

In this quickstart guide, we will show you how to train a model using AutoTrain in Python.

## Getting Started

AutoTrain can be installed using pip:

```bash
$ pip install autotrain-advanced
```

The example code below shows how to finetune an LLM using AutoTrain in Python:

```python
import os

from autotrain.params import LLMTrainingParams
from autotrain.project import AutoTrainProject


params = LLMTrainingParams(
    model="meta-llama/Llama-3.2-1B-Instruct",
    data_path="HuggingFaceH4/no_robots",
    chat_template="tokenizer",
    text_column="messages",
    train_split="train",
    trainer="sft",
    epochs=3,
    batch_size=1,
    lr=1e-5,
    peft=True,
    quantization="int4",
    target_modules="all-linear",
    padding="right",
    optimizer="paged_adamw_8bit",
    scheduler="cosine",
    gradient_accumulation=8,
    mixed_precision="bf16",
    merge_adapter=True,
    project_name="autotrain-llama32-1b-finetune",
    log="tensorboard",
    push_to_hub=True,
    username=os.environ.get("HF_USERNAME"),
    token=os.environ.get("HF_TOKEN"),
)


backend = "local"
project = AutoTrainProject(params=params, backend=backend, process=True)
project.create()
```

In this example, we finetune the `meta-llama/Llama-3.2-1B-Instruct` model on the `HuggingFaceH4/no_robots` dataset.
The model is trained for 3 epochs with a batch size of 1 and a learning rate of `1e-5`,
using the `paged_adamw_8bit` optimizer and the `cosine` scheduler.
Training runs in `bf16` mixed precision with a gradient accumulation of 8 steps.
The final model is pushed to the Hugging Face Hub after training.
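With gradient accumulation, the gradients from several forward/backward passes are summed before each optimizer step, so the effective batch size is the per-device batch size multiplied by the accumulation steps (and by the device count on multi-GPU setups). A quick sketch of the arithmetic behind the settings above:

```python
def effective_batch_size(batch_size: int, gradient_accumulation: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return batch_size * gradient_accumulation * num_devices

# batch_size=1 with gradient_accumulation=8 behaves like batch size 8 per step.
print(effective_batch_size(1, 8))
```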

To train the model, run the following command:

```bash
$ export HF_USERNAME=<your-hf-username>
$ export HF_TOKEN=<your-hf-write-token>
$ python train.py
```

This will create a new project directory with the name `autotrain-llama32-1b-finetune` and start the training process.
Once the training is complete, the model will be pushed to the Hugging Face Hub.

`HF_TOKEN` and `HF_USERNAME` are only required if you want to push the model to the Hub, or if you are accessing a gated model or dataset.
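Because a missing credential would otherwise only surface once training tries to access the Hub, it can be worth checking the environment up front. A minimal sketch; `missing_credentials` is a hypothetical helper for illustration, not part of the AutoTrain API:

```python
import os

def missing_credentials(env=None):
    """Return the names of Hub credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in ("HF_USERNAME", "HF_TOKEN") if not env.get(name)]

# Only enforce the check when the run actually needs Hub access
# (push_to_hub=True, or a gated model/dataset).
missing = missing_credentials()
if missing:
    print(f"Warning: {', '.join(missing)} not set; pushing to the Hub will fail.")
```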

## AutoTrainProject Class

[[autodoc]] project.AutoTrainProject

## Parameters

### Text Tasks

[[autodoc]] trainers.clm.params.LLMTrainingParams

[[autodoc]] trainers.sent_transformers.params.SentenceTransformersParams

[[autodoc]] trainers.seq2seq.params.Seq2SeqParams

[[autodoc]] trainers.token_classification.params.TokenClassificationParams

[[autodoc]] trainers.extractive_question_answering.params.ExtractiveQuestionAnsweringParams

[[autodoc]] trainers.text_classification.params.TextClassificationParams

[[autodoc]] trainers.text_regression.params.TextRegressionParams

### Image Tasks

[[autodoc]] trainers.image_classification.params.ImageClassificationParams

[[autodoc]] trainers.image_regression.params.ImageRegressionParams

[[autodoc]] trainers.object_detection.params.ObjectDetectionParams

[[autodoc]] trainers.dreambooth.params.DreamBoothTrainingParams

### Tabular Tasks

[[autodoc]] trainers.tabular.params.TabularParams
5 changes: 5 additions & 0 deletions docs/source/tasks/sentence_transformer.mdx
@@ -68,3 +68,8 @@ For `qa` training, the data should be in the following format:
| how are you | I am fine |
| What is your name? | My name is Abhishek |
| Which is the best programming language? | Python |


## Parameters

[[autodoc]] trainers.sent_transformers.params.SentenceTransformersParams
36 changes: 36 additions & 0 deletions notebooks/python_example.py
@@ -0,0 +1,36 @@
import os

from autotrain.params import LLMTrainingParams
from autotrain.project import AutoTrainProject


params = LLMTrainingParams(
    model="meta-llama/Llama-3.2-1B-Instruct",
    data_path="HuggingFaceH4/no_robots",
    chat_template="tokenizer",
    text_column="messages",
    train_split="train",
    trainer="sft",
    epochs=3,
    batch_size=1,
    lr=1e-5,
    peft=True,
    quantization="int4",
    target_modules="all-linear",
    padding="right",
    optimizer="paged_adamw_8bit",
    scheduler="cosine",
    gradient_accumulation=8,
    mixed_precision="bf16",
    merge_adapter=True,
    project_name="autotrain-llama32-1b-finetune",
    log="tensorboard",
    push_to_hub=False,
    username=os.environ.get("HF_USERNAME"),
    token=os.environ.get("HF_TOKEN"),
)


backend = "local"
project = AutoTrainProject(params=params, backend=backend, process=True)
project.create()
1 change: 1 addition & 0 deletions setup.cfg
@@ -17,6 +17,7 @@ max-line-length = 119
per-file-ignores =
    # imported but unused
    __init__.py: F401, E402
    src/autotrain/params.py: F401
exclude =
.git,
.venv,
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_dreambooth.py
@@ -4,7 +4,7 @@

from autotrain import logger
from autotrain.cli import BaseAutoTrainCommand
from autotrain.cli.utils import common_args, dreambooth_munge_data
from autotrain.cli.utils import common_args
from autotrain.project import AutoTrainProject
from autotrain.trainers.dreambooth.params import DreamBoothTrainingParams
from autotrain.trainers.dreambooth.utils import VALID_IMAGE_EXTENSIONS, XL_MODELS
@@ -387,7 +387,6 @@ def __init__(self, args):
def run(self):
logger.info("Running DreamBooth Training")
params = DreamBoothTrainingParams(**vars(self.args))
params = dreambooth_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_extractive_qa.py
@@ -1,7 +1,7 @@
from argparse import ArgumentParser

from autotrain import logger
from autotrain.cli.utils import ext_qa_munge_data, get_field_info
from autotrain.cli.utils import get_field_info
from autotrain.project import AutoTrainProject
from autotrain.trainers.extractive_question_answering.params import ExtractiveQuestionAnsweringParams

@@ -100,7 +100,6 @@ def run(self):
logger.info("Running Extractive Question Answering")
if self.args.train:
params = ExtractiveQuestionAnsweringParams(**vars(self.args))
params = ext_qa_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_image_classification.py
@@ -1,7 +1,7 @@
from argparse import ArgumentParser

from autotrain import logger
from autotrain.cli.utils import get_field_info, img_clf_munge_data
from autotrain.cli.utils import get_field_info
from autotrain.project import AutoTrainProject
from autotrain.trainers.image_classification.params import ImageClassificationParams

@@ -108,7 +108,6 @@ def run(self):
logger.info("Running Image Classification")
if self.args.train:
params = ImageClassificationParams(**vars(self.args))
params = img_clf_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_image_regression.py
@@ -1,7 +1,7 @@
from argparse import ArgumentParser

from autotrain import logger
from autotrain.cli.utils import get_field_info, img_reg_munge_data
from autotrain.cli.utils import get_field_info
from autotrain.project import AutoTrainProject
from autotrain.trainers.image_regression.params import ImageRegressionParams

@@ -108,7 +108,6 @@ def run(self):
logger.info("Running Image Regression")
if self.args.train:
params = ImageRegressionParams(**vars(self.args))
params = img_reg_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_llm.py
@@ -1,7 +1,7 @@
from argparse import ArgumentParser

from autotrain import logger
from autotrain.cli.utils import get_field_info, llm_munge_data
from autotrain.cli.utils import get_field_info
from autotrain.project import AutoTrainProject
from autotrain.trainers.clm.params import LLMTrainingParams

@@ -136,7 +136,6 @@ def run(self):
logger.info("Running LLM")
if self.args.train:
params = LLMTrainingParams(**vars(self.args))
params = llm_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_object_detection.py
@@ -1,7 +1,7 @@
from argparse import ArgumentParser

from autotrain import logger
from autotrain.cli.utils import get_field_info, img_obj_detect_munge_data
from autotrain.cli.utils import get_field_info
from autotrain.project import AutoTrainProject
from autotrain.trainers.object_detection.params import ObjectDetectionParams

@@ -108,7 +108,6 @@ def run(self):
logger.info("Running Object Detection")
if self.args.train:
params = ObjectDetectionParams(**vars(self.args))
params = img_obj_detect_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_sent_tranformers.py
@@ -1,7 +1,7 @@
from argparse import ArgumentParser

from autotrain import logger
from autotrain.cli.utils import get_field_info, sent_transformers_munge_data
from autotrain.cli.utils import get_field_info
from autotrain.project import AutoTrainProject
from autotrain.trainers.sent_transformers.params import SentenceTransformersParams

@@ -108,7 +108,6 @@ def run(self):
logger.info("Running Sentence Transformers...")
if self.args.train:
params = SentenceTransformersParams(**vars(self.args))
params = sent_transformers_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_seq2seq.py
@@ -1,7 +1,7 @@
from argparse import ArgumentParser

from autotrain import logger
from autotrain.cli.utils import get_field_info, seq2seq_munge_data
from autotrain.cli.utils import get_field_info
from autotrain.project import AutoTrainProject
from autotrain.trainers.seq2seq.params import Seq2SeqParams

@@ -92,7 +92,6 @@ def run(self):
logger.info("Running Seq2Seq Classification")
if self.args.train:
params = Seq2SeqParams(**vars(self.args))
params = seq2seq_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_tabular.py
@@ -1,7 +1,7 @@
from argparse import ArgumentParser

from autotrain import logger
from autotrain.cli.utils import get_field_info, tabular_munge_data
from autotrain.cli.utils import get_field_info
from autotrain.project import AutoTrainProject
from autotrain.trainers.tabular.params import TabularParams

@@ -101,7 +101,6 @@ def run(self):
logger.info("Running Tabular Training")
if self.args.train:
params = TabularParams(**vars(self.args))
params = tabular_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_text_classification.py
@@ -1,7 +1,7 @@
from argparse import ArgumentParser

from autotrain import logger
from autotrain.cli.utils import get_field_info, text_clf_munge_data
from autotrain.cli.utils import get_field_info
from autotrain.project import AutoTrainProject
from autotrain.trainers.text_classification.params import TextClassificationParams

@@ -101,7 +101,6 @@ def run(self):
logger.info("Running Text Classification")
if self.args.train:
params = TextClassificationParams(**vars(self.args))
params = text_clf_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")
5 changes: 2 additions & 3 deletions src/autotrain/cli/run_text_regression.py
@@ -1,7 +1,7 @@
from argparse import ArgumentParser

from autotrain import logger
from autotrain.cli.utils import get_field_info, text_reg_munge_data
from autotrain.cli.utils import get_field_info
from autotrain.project import AutoTrainProject
from autotrain.trainers.text_regression.params import TextRegressionParams

@@ -101,7 +101,6 @@ def run(self):
logger.info("Running Text Regression")
if self.args.train:
params = TextRegressionParams(**vars(self.args))
params = text_reg_munge_data(params, local=self.args.backend.startswith("local"))
project = AutoTrainProject(params=params, backend=self.args.backend)
project = AutoTrainProject(params=params, backend=self.args.backend, process=True)
job_id = project.create()
logger.info(f"Job ID: {job_id}")