Llama 3 Supervised Fine-Tuning and Parameter Efficient Fine-Tuning with NeMo 2.0
================================================================================

`Llama 3 <https://blogs.nvidia.com/blog/meta-llama3-inference-acceleration/>`_ is an open-source large language model by Meta that delivers state-of-the-art performance on popular industry benchmarks. It has been pretrained on over 15 trillion tokens and supports an 8K token context length. It is available in two sizes, 8B and 70B, each with two variants: base pretrained and instruction tuned.

Supervised fine-tuning (SFT) refers to unfreezing all the weights and layers in the model and training on a newly labeled set of examples. We can fine-tune to incorporate new, domain-specific knowledge, or to teach the foundation model what type of response to provide.

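As a toy illustration of what "unfreezing all the weights" means, the following from-scratch sketch (plain Python, not the NeMo API; the model, data, and learning rate are invented for illustration) trains every parameter of a tiny linear model by gradient descent on new labeled examples:

.. code:: python

    # Toy SFT sketch: all parameters are trainable (nothing is frozen),
    # and every one of them is updated on the new labeled data.

    # "Pretrained" model: y = w * x + b.
    w, b = 0.5, 0.0

    # New labeled examples encoding the behavior we want: y = 2x + 1.
    data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

    lr = 0.02
    for _ in range(2000):
        for x, y in data:
            pred = w * x + b
            err = pred - y        # dL/dpred for L = 0.5 * err**2
            w -= lr * err * x     # update every weight (nothing frozen)
            b -= lr * err

    print(round(w, 2), round(b, 2))  # approaches w = 2.0, b = 1.0

For a real LLM the same idea applies to billions of parameters, which is why SFT needs the multi-GPU memory budget listed under Requirements below.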
`Low-Rank Adaptation (LoRA) <https://arxiv.org/pdf/2106.09685>`__ has emerged as a popular Parameter-Efficient Fine-Tuning (PEFT) technique that tunes a very small number of additional parameters compared to full fine-tuning, thereby reducing the compute required.

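To see where the savings come from, here is a back-of-the-envelope sketch (plain Python, not the NeMo implementation; the 4096x4096 layer shape and rank r = 8 are illustrative assumptions, not Llama 3 specifics). LoRA freezes a weight matrix ``W`` and adds a trainable low-rank update ``B @ A``, so only the two small factors are trained:

.. code:: python

    # LoRA parameter counting for one weight matrix W (d_out x d_in),
    # adapted as W + B @ A with A: (r x d_in), B: (d_out x r), r << d_in.
    # Illustrative sizes, not taken from the Llama 3 architecture.
    d_in, d_out, r = 4096, 4096, 8

    full_params = d_out * d_in          # full fine-tuning: train all of W
    lora_params = r * (d_in + d_out)    # LoRA: train only A and B

    print(f"full fine-tuning params per layer: {full_params:,}")
    print(f"LoRA params per layer (r={r}):     {lora_params:,}")
    print(f"reduction: {full_params / lora_params:.0f}x")

With these assumed sizes the trainable parameter count per adapted matrix drops by a factor of 256, which is why LoRA fits on a single GPU in the requirements below.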
`NVIDIA NeMo Framework <https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html>`__ provides tools to perform SFT and LoRA on Llama 3 to fit your use case.

Requirements
------------

* System Configuration

  * For SFT: access to at least 2 NVIDIA GPUs with a cumulative memory of at least 80GB, for example: 2 x H100-80GB or 2 x A100-80GB.
  * For LoRA: access to at least 1 NVIDIA GPU with a cumulative memory of at least 80GB, for example: 1 x H100-80GB or 1 x A100-80GB.
  * A Docker-enabled environment with the `NVIDIA Container Runtime <https://developer.nvidia.com/container-runtime>`_ installed, which makes the container GPU-aware.

* Software Requirements

  * Use the latest `NeMo Framework Container <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags>`_. Note that you must be logged in to the container registry to view this page.
  * This tutorial uses the container ``nvcr.io/nvidia/nemo:dev``.

* NeMo 2.0 and NeMo-Run

  * We will use NeMo 2.0 and NeMo-Run to perform SFT and LoRA on Llama 3. Both are already available in the NeMo Framework Container.

Start the NeMo Framework Container
----------------------------------

1. Start and enter the dev container:

   .. code:: bash

      docker run \
        --gpus device=1 \
        --shm-size=2g \
        --net=host \
        --ulimit memlock=-1 \
        --rm -it \
        -v ${PWD}:/workspace \
        -w /workspace \
        nvcr.io/nvidia/nemo:dev bash

2. You need to request download permission from Meta and Hugging Face. Then, from within the container, log in through ``huggingface-cli`` using your Hugging Face token:

   .. code:: bash

      huggingface-cli login

3. From within the container, start Jupyter Lab:

   .. code:: bash

      jupyter lab --ip 0.0.0.0 --port=8888 --allow-root

4. Then, navigate to `the SFT notebook <./nemo2-sft.ipynb>`__ or `the LoRA notebook <./nemo2-peft.ipynb>`__ to perform SFT or LoRA on Llama 3, respectively.