Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs #792

Merged
merged 3 commits into from
Oct 15, 2024
Merged

docs #792

Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion configs/image_classification/local.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ data:
valid_split: null
column_mapping:
image_column: image
target_column: labels
target_column: label

params:
epochs: 2
Expand Down
34 changes: 34 additions & 0 deletions configs/llm_finetuning/qwen.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
task: llm-sft
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
project_name: autotrain-qwen-finetune
log: tensorboard
backend: local

data:
path: HuggingFaceH4/no_robots
train_split: test
valid_split: null
chat_template: tokenizer
column_mapping:
text_column: messages

params:
block_size: 2048
model_max_length: 4096
epochs: 1
batch_size: 1
lr: 1e-5
peft: true
quantization: int4
target_modules: all-linear
padding: right
optimizer: adamw_torch
scheduler: cosine
gradient_accumulation: 1
mixed_precision: fp16
merge_adapter: true

hub:
username: ${HF_USERNAME}
token: ${HF_TOKEN}
push_to_hub: true
44 changes: 11 additions & 33 deletions docs/source/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,20 +10,12 @@
title: Getting Started
- sections:
- local: quickstart_spaces
title: Quickstart
title: AutoTrain on Hugging Face Spaces
- sections:
title: Train on Spaces
- local: quickstart
title: Quickstart
title: Train Locally
- local: config
title: Configurations
title: Use AutoTrain Locally
- sections:
- local: col_map
title: Understanding Column Mapping
- local: autotrain_api
title: AutoTrain API
title: Miscellaneous
title: Config File
title: Quickstart
- sections:
- local: tasks/llm_finetuning
title: LLM Finetuning
Expand All @@ -33,10 +25,8 @@
title: Extractive QA
- local: tasks/sentence_transformer
title: Sentence Transformer
- local: tasks/image_classification
title: Image Classification
- local: tasks/image_regression
title: Image Scoring/Regression
- local: tasks/image_classification_regression
title: Image Classification / Regression
- local: tasks/object_detection
title: Object Detection
- local: tasks/dreambooth
Expand All @@ -49,20 +39,8 @@
title: Tabular
title: Tasks
- sections:
- local: params/extractive_qa_params
title: Extractive QA
- local: params/image_classification_params
title: Image Classification
- local: params/image_regression_params
title: Image Scoring/Regression
- local: params/object_detection_params
title: Object Detection
- local: params/dreambooth_params
title: DreamBooth
- local: params/seq2seq_params
title: Seq2Seq
- local: params/token_classification_params
title: Token Classification
- local: params/tabular_params
title: Tabular
title: Parameters
- local: col_map
title: Understanding Column Mapping
- local: autotrain_api
title: AutoTrain API
title: Miscellaneous
2 changes: 1 addition & 1 deletion docs/source/index.mdx
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# What is AutoTrain Advanced?
# AutoTrain

![autotrain-homepage](https://raw.githubusercontent.com/huggingface/autotrain-advanced/main/static/autotrain_homepage.png)

Expand Down
3 changes: 0 additions & 3 deletions docs/source/params/dreambooth_params.mdx

This file was deleted.

3 changes: 0 additions & 3 deletions docs/source/params/extractive_qa_params.mdx

This file was deleted.

3 changes: 0 additions & 3 deletions docs/source/params/image_classification_params.mdx

This file was deleted.

5 changes: 0 additions & 5 deletions docs/source/params/image_regression_params.mdx

This file was deleted.

3 changes: 0 additions & 3 deletions docs/source/params/object_detection_params.mdx

This file was deleted.

3 changes: 0 additions & 3 deletions docs/source/params/seq2seq_params.mdx

This file was deleted.

3 changes: 0 additions & 3 deletions docs/source/params/tabular_params.mdx

This file was deleted.

3 changes: 0 additions & 3 deletions docs/source/params/token_classification_params.mdx

This file was deleted.

5 changes: 5 additions & 0 deletions docs/source/tasks/dreambooth.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -33,3 +33,8 @@ This token acts as a unique identifier for your subject within the model.
Typically, you will use a simple, descriptive keyword like prompt in the parameters
section of your training setup. This token will be used to generate new images of
your subject by the model.


## Parameters

[[autodoc]] trainers.dreambooth.params.DreamBoothTrainingParams
57 changes: 56 additions & 1 deletion docs/source/tasks/extractive_qa.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -28,4 +28,59 @@ Note: the preferred format for question answering is JSONL, if you want to use C
Example dataset from Hugging Face Hub: [lhoestq/squad](https://huggingface.co/datasets/lhoestq/squad)


P.S. You can use both squad and squad v2 data format with correct column mappings.
P.S. You can use both squad and squad v2 data format with correct column mappings.

## Training Locally

To train an Extractive QA model locally, you need a config file:

```yaml
task: extractive-qa
base_model: google-bert/bert-base-uncased
project_name: autotrain-bert-ex-qa1
log: tensorboard
backend: local

data:
path: lhoestq/squad
train_split: train
valid_split: validation
column_mapping:
text_column: context
question_column: question
answer_column: answers

params:
max_seq_length: 512
max_doc_stride: 128
epochs: 3
batch_size: 4
lr: 2e-5
optimizer: adamw_torch
scheduler: linear
gradient_accumulation: 1
mixed_precision: fp16

hub:
username: ${HF_USERNAME}
token: ${HF_TOKEN}
push_to_hub: true
```

To train the model, run the following command:

```bash
$ autotrain --config config.yaml
```

Here, we are training a BERT model on the SQuAD dataset using the Extractive QA task. The model is trained for 3 epochs with a batch size of 4 and a learning rate of 2e-5. The training process is logged using TensorBoard. The model is trained locally and pushed to the Hugging Face Hub after training.

## Training on the Hugging Face Spaces

![AutoTrain Extractive Question Answering on Hugging Face Spaces](https://raw.githubusercontent.com/huggingface/autotrain-advanced/main/static/ext_qa.png)

As always, pay special attention to column mapping.

## Parameters

[[autodoc]] trainers.extractive_question_answering.params.ExtractiveQuestionAnsweringParams
63 changes: 0 additions & 63 deletions docs/source/tasks/image_classification.mdx

This file was deleted.

Loading
Loading