Commit b057641 ("address comments") by HuiyingLi, Nov 17, 2024 (1 parent: 0c6b6ac)
Signed-off-by: HuiyingLi <[email protected]>
New file: tutorials/llm/llama-3/nemo2-sft-peft/README.rst

Llama 3 Supervised Fine-Tuning and Parameter Efficient Fine-Tuning with NeMo 2.0
================================================================================

`Llama 3 <https://blogs.nvidia.com/blog/meta-llama3-inference-acceleration/>`_ is an open-source large language model by Meta that delivers state-of-the-art performance on popular industry benchmarks. It has been pretrained on over 15 trillion tokens, and supports an 8K token context length. It is available in two sizes, 8B and 70B, and each size has two variants—base pretrained and instruction tuned.

Supervised fine-tuning (SFT) refers to unfreezing all the weights and layers in our model and training on a newly labeled set of examples. We can fine-tune to incorporate new, domain-specific knowledge, or teach the foundation model what type of response to provide.

`Low-Rank Adaptation (LoRA) <https://arxiv.org/pdf/2106.09685>`__ has emerged as a popular Parameter-Efficient Fine-Tuning (PEFT) technique that tunes a very small number of additional parameters as compared to full fine-tuning, thereby reducing the compute required.

`NVIDIA NeMo
Framework <https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html>`__ provides tools to perform SFT and LoRA on Llama 3 to fit your use case.

Requirements
------------

* System Configuration
    * For SFT: access to at least 2 NVIDIA GPUs with a cumulative memory of at least 80GB, for example: 2 x H100-80GB or 2 x A100-80GB.
    * For LoRA: access to at least 1 NVIDIA GPU with a memory of at least 80GB, for example: 1 x H100-80GB or 1 x A100-80GB.
    * A Docker-enabled environment with the `NVIDIA Container Runtime <https://developer.nvidia.com/container-runtime>`_ installed, which makes the container GPU-aware.

* Software Requirements
    * Use the latest `NeMo Framework Container <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags>`_. Note that you must be logged in to the container registry to view this page.
    * This notebook uses the container ``nvcr.io/nvidia/nemo:dev``.

* NeMo 2.0 and NeMo-Run
    * We will use NeMo 2.0 and NeMo-Run to perform SFT and LoRA on Llama 3. Both are already available in the NeMo Framework Container.


Start the NeMo Framework Container
----------------------------------

1. Start and enter the dev container by running:

   .. code:: bash

      docker run \
        --gpus device=1 \
        --shm-size=2g \
        --net=host \
        --ulimit memlock=-1 \
        --rm -it \
        -v ${PWD}:/workspace \
        -w /workspace \
        nvcr.io/nvidia/nemo:dev bash
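
   The command above exposes a single GPU to the container, which is sufficient for the LoRA workflow. Since the SFT workflow requires at least two GPUs, one possible variant (assuming all GPUs on the host should be visible to the container) is:

   .. code:: bash

      # Expose every host GPU to the container instead of a single device
      docker run \
        --gpus all \
        --shm-size=2g \
        --net=host \
        --ulimit memlock=-1 \
        --rm -it \
        -v ${PWD}:/workspace \
        -w /workspace \
        nvcr.io/nvidia/nemo:dev bash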
2. Request download permission for Llama 3 from Meta and Hugging Face. Then, from within the container, log in through ``huggingface-cli`` using your Hugging Face token:

   .. code:: bash

      huggingface-cli login
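
   The command above prompts for the token interactively. For a non-interactive login (for example, inside a script), ``huggingface-cli login`` also accepts the token as a flag; ``<your-token>`` below is a placeholder for your own Hugging Face access token:

   .. code:: bash

      # Non-interactive login; replace <your-token> with your actual token
      export HF_TOKEN=<your-token>
      huggingface-cli login --token "$HF_TOKEN"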
3. From within the container, start JupyterLab:

   .. code:: bash

      jupyter lab --ip 0.0.0.0 --port=8888 --allow-root
4. Then, navigate to `the SFT notebook <./nemo2-sft.ipynb>`__ or `the LoRA notebook <./nemo2-peft.ipynb>`__ to perform SFT or LoRA on Llama 3, respectively.