From eb0eafe481e9ec6effcb42a8584f3a7f4e152d9d Mon Sep 17 00:00:00 2001 From: HuiyingLi Date: Fri, 15 Nov 2024 11:26:59 -0800 Subject: [PATCH] update sft peft notebook with batch generate Signed-off-by: HuiyingLi --- tutorials/llm/nemo2-peft.ipynb | 1937 ++++++++++++++++++++++++++++---- tutorials/llm/nemo2-sft.ipynb | 1580 +++++++++++++++++++++++--- 2 files changed, 3140 insertions(+), 377 deletions(-) diff --git a/tutorials/llm/nemo2-peft.ipynb b/tutorials/llm/nemo2-peft.ipynb index 84026ee3b55e..c98d7a12c100 100644 --- a/tutorials/llm/nemo2-peft.ipynb +++ b/tutorials/llm/nemo2-peft.ipynb @@ -12,9 +12,9 @@ "\n", "This optimization process is known as fine-tuning, which involves adjusting the weights of a pre-trained foundation model with custom data.\n", "\n", - "Considering that foundation models can be significantly large, a variant of fine-tuning has gained traction recently known as PEFT. PEFT encompasses several methods, including P-Tuning, LoRA, Adapters, IA3, etc. NeMo 2.0 currently supports Low-Rank Adaptation(LoRA) method.\n", + "Considering that foundation models can be significantly large, a variant of fine-tuning has gained traction recently known as PEFT. PEFT encompasses several methods, including P-Tuning, LoRA, Adapters, IA3, etc. NeMo 2.0 currently supports Low-Rank Adaptation (LoRA) method.\n", "\n", - "This playbook involves applying LoRA to the Llama3 using NeMo 2.0. \n", + "This playbook involves applying LoRA to Llama3 using NeMo 2.0. \n", "\n", "## NeMo 2.0\n", "\n", @@ -24,15 +24,30 @@ "\n", "- Modular Abstractions - By adopting PyTorch Lightning’s modular abstractions, NeMo 2.0 simplifies adaptation and experimentation. 
This modular approach allows developers to more easily modify and experiment with different components of their models.\n",
     "\n",
-    "- Scalability - NeMo 2.0 seamlessly scaling large-scale experiments across thousands of GPUs using NeMo-Run, a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across computing environments.\n",
+    "- Scalability - NeMo 2.0 seamlessly scales large-scale experiments across thousands of GPUs using NeMo-Run, a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across computing environments.\n",
     "\n",
     "By adopting PyTorch Lightning’s modular abstractions, NeMo 2.0 makes it easy for users to adapt the framework to their specific use cases and experiment with various configurations. This section offers an overview of the new features in NeMo 2.0 and includes a migration guide with step-by-step instructions for transitioning your models from NeMo 1.0 to NeMo 2.0.\n",
     "\n",
+    "## NeMo-Run\n",
+    "\n",
+    "NeMo-Run is a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across various computing environments. To run your experiments with NeMo-Run, you need to follow these three steps:\n",
+    "1. Configure your function\n",
+    "\n",
+    "2. Define your Executor\n",
+    "\n",
+    "3. Run your experiment\n",
+    "\n",
+    "We will be using NeMo-Run to run our experiments in this notebook.\n",
+    "\n",
     "\n",
     "# NeMo Tools and Resources\n",
     "1. [NeMo Github repo](https://github.com/NVIDIA/NeMo)\n",
     "\n",
-    "2. NeMo Framework Training container: `nvcr.io/nvidia/nemo:dev` #TODO: FIX CONTAINER\n",
+    "2. [NeMo-Run Github repo](https://github.com/NVIDIA/NeMo-Run/)\n",
+    "\n",
+    "3. NeMo Framework Training container: `nvcr.io/nvidia/nemo:dev`\n",
+    "\n",
+    "\n",
     "\n",
     "# Educational Resources\n",
     "1. 
Blog: [Mastering LLM Techniques: Customization](https://developer.nvidia.com/blog/selecting-large-language-model-customization-techniques/)\n",
@@ -48,7 +63,7 @@
     "\n",
     "1. Use the latest [NeMo Framework Training container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags) . Note that you must be logged in to the container registry to view this page.\n",
     "\n",
-    "2. This notebook uses the container: `nvcr.io/nvidia/nemo:dev` #TODO: FIX CONTAINER \n",
+    "2. This notebook uses the container: `nvcr.io/nvidia/nemo:dev`.\n",
     "\n",
     "\n",
     "## Hardware Requirements\n",
@@ -66,7 +81,7 @@ "source": [
     "# Step 0: Go inside docker container\n",
     "\n",
-    "You can start and enter the dev container by: #TODO: FIX CONTAINER\n",
+    "You can start and enter the dev container by:\n",
     "```\n",
     "docker run --gpus device=1 --shm-size=2g --net=host --ulimit memlock=-1 --rm -it -v ${PWD}:/workspace -w /workspace -v ${PWD}/results:/results nvcr.io/nvidia/nemo:dev bash\n",
     "\n",
@@ -79,36 +94,311 @@ "source": [
     "\n",
     "# Step 1: Import HuggingFace checkpoint\n",
-    "First request download permission from Meta and Hugging Face. Login through `huggingface-cli` using your Huggingface token before importing llama3 models. \n",
+    "First, request download permission from Meta and Hugging Face. Log in through `huggingface-cli` using your Hugging Face token before importing llama3 models. \n",
     "\n",
     "```\n",
     "$ huggingface-cli login\n",
     "```\n",
     "\n",
     "Once logged in, you can use the following script to import a Hugging Face model. 
Based on the provided model configuration (`Llama3-8b` in the example below), the `llm.import_ckpt` API will download the specified model using the \"hf://\" URL format. It will then convert the model into NeMo 2.0 format. \n", + "\n" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "execution_count": 1, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/usr/local/lib/python3.10/dist-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n", + "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n", + " warnings.warn(\n", + "[NeMo W 2024-11-15 09:43:51 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n", + " cm = get_cmap(\"Set1\")\n", + " \n" + ] + }, + { + "data": { + "text/html": [ + "
Entering Experiment nemo.collections.llm.api.import_ckpt with id: nemo.collections.llm.api.import_ckpt_1731692…\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m─ \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.import_ckpt with id: nemo.collections.llm.api.import_ckpt_1731692…\u001b[0m\u001b[92m ─\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731692632/nemo.collections.llm.api.import_ckpt\n" + ] + }, + { + "data": { + "text/html": [ + "
[09:43:52] Launching job nemo.collections.llm.api.import_ckpt for experiment                      experiment.py:660\n",
+       "           nemo.collections.llm.api.import_ckpt                                                                    \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[2;36m[09:43:52]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching job nemo.collections.llm.api.import_ckpt for experiment \u001b[0m \u001b]8;id=6439;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=636758;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\u001b\\\u001b[2m660\u001b[0m\u001b]8;;\u001b\\\n", + "\u001b[2;36m \u001b[0m\u001b[1;36mnemo.collections.llm.api.import_ckpt\u001b[0m \u001b[2m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731692632/nemo.collections.llm.api.import_ckpt\n", + "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30\n", + "AppStatus:\n", + " State: RUNNING\n", + " Num Restarts: 0\n", + " Roles: \n", + " Msg: \n", + " Structured Error Msg: \n", + " UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731692632/nemo.collections.llm.api.import_ckpt/nemo_run/nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30\n", + " \n" + ] + }, + { + "data": { + "text/html": [ + "
──────────────── Waiting for Experiment nemo.collections.llm.api.import_ckpt_1731692632 to finish ─────────────────\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m──────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.import_ckpt_1731692632 to finish\u001b[0m\u001b[92m ─────────────────\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Experiment Status for nemo.collections.llm.api.import_ckpt_1731692632\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.import_ckpt_1731692632\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "Task 0: nemo.collections.llm.api.import_ckpt\n",
+       "- Status: RUNNING\n",
+       "- Executor: LocalExecutor\n",
+       "- Job id: nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30\n",
+       "- Local Directory: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731692632/nemo.collections.llm.api.import_ckpt\n",
+       "
\n" + ], + "text/plain": [ + "\n", + "\u001b[1;32mTask 0\u001b[0m: \u001b[1;38;5;214mnemo.collections.llm.api.import_ckpt\u001b[0m\n", + "- \u001b[1;32mStatus\u001b[0m: RUNNING\n", + "- \u001b[1;32mExecutor\u001b[0m: LocalExecutor\n", + "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30\n", + "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731692632/nemo.collections.llm.api.import_ckpt\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Waiting for job nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30 to finish [log=True]...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "mport_ckpt/0 [NeMo W 2024-11-15 09:43:59 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n", + "mport_ckpt/0 cm = get_cmap(\"Set1\")\n", + "mport_ckpt/0 \n", + "mport_ckpt/0 Downloading shards: 100%|██████████| 4/4 [00:00<00:00, 4830.76it/s]\n", + "mport_ckpt/0 Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00, 3.13it/s]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_strategy:310] Fixing mis-match between ddp-config & mcore-optimizer config\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:396] Rank 0 has data parallel group : [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:402] Rank 0 has combined group of data parallel and context parallel : [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:407] All data parallel group ranks with context parallel combined: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:410] Ranks 0 has data parallel rank: 0\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:418] Rank 0 has context parallel group: [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:421] All context parallel group ranks: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:422] Ranks 0 has context parallel rank: 0\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:429] Rank 0 has model parallel group: [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 
09:44:02 megatron_init:430] All model parallel group ranks: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:439] Rank 0 has tensor model parallel group: [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:443] All tensor model parallel group ranks: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:444] Rank 0 has tensor model parallel rank: 0\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:464] Rank 0 has pipeline model parallel group: [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:476] Rank 0 has embedding group: [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:482] All pipeline model parallel group ranks: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:483] Rank 0 has pipeline model parallel rank 0\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:484] All embedding group ranks: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 megatron_init:485] Rank 0 has embedding rank: 0\n", + "mport_ckpt/0 GPU available: True (cuda), used: False\n", + "mport_ckpt/0 TPU available: False, using: 0 TPU cores\n", + "mport_ckpt/0 HPU available: False, using: 0 HPUs\n", + "mport_ckpt/0 [NeMo W 2024-11-15 09:44:02 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.\n", + "mport_ckpt/0 \n", + "mport_ckpt/0 Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1\n", + "mport_ckpt/0 ----------------------------------------------------------------------------------------------------\n", + "mport_ckpt/0 distributed_backend=gloo\n", + "mport_ckpt/0 All distributed processes registered. 
Starting with 1 processes\n", + "mport_ckpt/0 ----------------------------------------------------------------------------------------------------\n", + "mport_ckpt/0 \n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:44:02 base:44] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.\n", + "mport_ckpt/0 [NeMo W 2024-11-15 09:44:02 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py:1090: `trainer.init_module` cannot fully support proper instantiation of your model with the `MegatronStrategy` strategy. Please instantiate your model inside the`LightningModule.configure_model` hook instead\n", + "mport_ckpt/0 \n", + "mport_ckpt/0 [NeMo W 2024-11-15 09:44:40 megatron_strategy:324] Could not copy Trainer's 'max_steps' to LR scheduler's 'max_steps'. If you are not using an LR scheduler, this warning can safely be ignored.\n", + "mport_ckpt/0 huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "mport_ckpt/0 To disable this warning, you can either:\n", + "mport_ckpt/0 \t- Avoid using `tokenizers` before the fork if possible\n", + "mport_ckpt/0 \t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "mport_ckpt/0 Converted Llama model to Nemo, model saved to /root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B in torch.bfloat16.\n", + "mport_ckpt/0 \u001b[32m $\u001b[0m\u001b[32mNEMO_MODELS_CACHE\u001b[0m\u001b[32m=\u001b[0m\u001b[32m/root/.cache/nemo/\u001b[0m\u001b[32mmodels\u001b[0m\u001b[32m \u001b[0m\n", + "mport_ckpt/0 \u001b[32m✓ Checkpoint imported to \u001b[0m\u001b[32m/root/.cache/nemo/models/meta-llama/\u001b[0m\u001b[32mMeta-Llama-3-8B\u001b[0m\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Job nemo.collections.llm.api.import_ckpt-jjdv0bm9tlj30 finished: SUCCEEDED\n" + ] + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# The experiment was run with the following tasks: ['nemo.collections.llm.api.import_ckpt']                        \n",
+       "# You can inspect and reconstruct this experiment at a later point in time using:                                  \n",
+       "experiment = run.Experiment.from_id(\"nemo.collections.llm.api.import_ckpt_1731692632\")                             \n",
+       "experiment.status() # Gets the overall status                                                                      \n",
+       "experiment.logs(\"nemo.collections.llm.api.import_ckpt\") # Gets the log for the provided task                       \n",
+       "experiment.cancel(\"nemo.collections.llm.api.import_ckpt\") # Cancels the provided task if still running             \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# The experiment was run with the following tasks: ['nemo.collections.llm.api.import_ckpt']\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect and reconstruct this experiment at a later point in time using:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731692632\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the overall status\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + 
"\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.import_ckpt\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the log for the provided task\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.import_ckpt\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Cancels the provided task if still running\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# You can inspect this experiment at a later point in time using the CLI as well:                                  \n",
+       "nemo experiment status nemo.collections.llm.api.import_ckpt_1731692632                                             \n",
+       "nemo experiment logs nemo.collections.llm.api.import_ckpt_1731692632 0                                             \n",
+       "nemo experiment cancel nemo.collections.llm.api.import_ckpt_1731692632 0                                           \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect this experiment at a later point in time using the CLI as well:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731692632\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731692632\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731692632\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ + "import nemo_run as run\n", "from nemo import lightning as nl\n", "from nemo.collections import llm\n", "from megatron.core.optimizer import 
OptimizerConfig\n", + "from nemo.collections.llm.peft.lora import LoRA\n", "import torch\n", "import pytorch_lightning as pl\n", "from pathlib import Path\n", + "from nemo.collections.llm.recipes.precision.mixed_precision import bf16_mixed\n", + "\n", + "\n", + "# llm.import_ckpt is the nemo2 API for converting Hugging Face checkpoint to NeMo format\n", + "# example usage:\n", + "# llm.import_ckpt(model=llm.llama3_8b.model(), source=\"hf://meta-llama/Meta-Llama-3-8B\")\n", + "#\n", + "# We use run.Partial to configure this function\n", + "def configure_checkpoint_conversion():\n", + " return run.Partial(\n", + " llm.import_ckpt,\n", + " model=llm.llama3_8b.model(),\n", + " source=\"hf://meta-llama/Meta-Llama-3-8B\",\n", + " overwrite=False,\n", + " )\n", "\n", - "def llama3_8b() -> pl.LightningModule:\n", - " from transformers import AutoTokenizer\n", - " tokenizer = AutoTokenizer.from_pretrained(\"meta-llama/Meta-Llama-3-8B\")\n", - " return llm.LlamaModel(llm.Llama3Config8B(), tokenizer=tokenizer)\n", + "# configure your function\n", + "import_ckpt = configure_checkpoint_conversion()\n", + "# define your executor\n", + "local_executor = run.LocalExecutor()\n", "\n", - "if __name__ == '__main__':\n", - " output_path=\"llama3-8b-nemo2\"\n", - " llm.import_ckpt(model=llama3_8b(), source=\"hf://meta-llama/Meta-Llama-3-8B\",output_path=Path(output_path))\n" + "# run your experiment\n", + "run.run(import_ckpt, executor=local_executor)\n" ] }, { @@ -122,27 +412,28 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 2, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "\n", - "def squad() -> pl.LightningDataModule:\n", - " return llm.SquadDataModule(seq_length=2048, micro_batch_size=2, global_batch_size=8, num_workers=0)" + "def squad() -> run.Config[pl.LightningDataModule]:\n", + " return run.Config(llm.SquadDataModule, seq_length=2048, micro_batch_size=1, global_batch_size=8, num_workers=0)" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ - "For how to use your own data to create your custom `DataModule` in order to perform PEFT, refer to [NeMo 2.0 SFT notebook](https://github.com/NVIDIA/NeMo/tree/main/tutorials/llm/nemo2-sft.ipynb). ##TODO: Verify this link works before publish" + "For how to use your own data to create your custom `DataModule` in order to perform PEFT, refer to [NeMo 2.0 SFT notebook](https://github.com/NVIDIA/NeMo/tree/main/tutorials/llm/nemo2-sft.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Step 3: Run PEFT with NeMo 2.0 API \n", + "## Step 3: Run PEFT with NeMo 2.0 API and NeMo-Run\n", "\n", "The following python script utilizes NeMo 2.0 API to perform PEFT. In this script we are configuring the following components for training. These components are similar between SFT and PEFT. SFT and PEFT both uses `llm.finetune` API. To switch from SFT to PEFT you just need to add `peft` with LoRA adapter to the API parameter.\n", "\n", @@ -153,27 +444,30 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 3, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "\n", - "def trainer(devices=1) -> nl.Trainer:\n", - " strategy = nl.MegatronStrategy(\n", - " tensor_model_parallel_size=1,\n", + "def trainer() -> run.Config[nl.Trainer]:\n", + " strategy = run.Config(\n", + " nl.MegatronStrategy,\n", + " tensor_model_parallel_size=1\n", " )\n", - "\n", - " return nl.Trainer(\n", + " trainer = run.Config(\n", + " nl.Trainer,\n", " devices=1,\n", - " max_steps=40,\n", + " max_steps=20,\n", " accelerator=\"gpu\",\n", " strategy=strategy,\n", - " plugins=nl.MegatronMixedPrecision(precision=\"bf16-mixed\"),\n", + " plugins=bf16_mixed(),\n", " log_every_n_steps=1,\n", " limit_val_batches=2,\n", " val_check_interval=2,\n", " num_sanity_val_steps=0,\n", - " )\n" + " )\n", + " return trainer\n" ] }, { @@ -187,12 +481,15 @@ }, { "cell_type": "code", - "execution_count": null, - 
"metadata": {}, + "execution_count": 4, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "def logger() -> nl.NeMoLogger:\n", - " ckpt = nl.ModelCheckpoint(\n", + "def logger() -> run.Config[nl.NeMoLogger]:\n", + " ckpt = run.Config(\n", + " nl.ModelCheckpoint,\n", " save_last=True,\n", " every_n_train_steps=10,\n", " monitor=\"reduced_train_loss\",\n", @@ -201,7 +498,8 @@ " save_optim_on_train_end=True,\n", " )\n", "\n", - " return nl.NeMoLogger(\n", + " return run.Config(\n", + " nl.NeMoLogger,\n", " name=\"nemo2_peft\",\n", " log_dir=\"./results\",\n", " use_datetime_version=False,\n", @@ -224,21 +522,26 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 5, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "def adam_with_cosine_annealing() -> nl.OptimizerModule:\n", - " return nl.MegatronOptimizerModule(\n", - " config=OptimizerConfig(\n", - " optimizer=\"adam\",\n", - " lr=0.0001,\n", - " adam_beta2=0.98,\n", - " use_distributed_optimizer=True,\n", - " clip_grad=1.0,\n", - " bf16=True,\n", - " ),\n", - " )" + "def adam_with_cosine_annealing() -> run.Config[nl.OptimizerModule]:\n", + " opt_cfg = run.Config(\n", + " OptimizerConfig,\n", + " optimizer=\"adam\",\n", + " lr=0.0001,\n", + " adam_beta2=0.98,\n", + " use_distributed_optimizer=True,\n", + " clip_grad=1.0,\n", + " bf16=True,\n", + " )\n", + " return run.Config(\n", + " nl.MegatronOptimizerModule,\n", + " config=opt_cfg\n", + " )\n" ] }, { @@ -251,15 +554,14 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 6, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "def lora() -> nl.pytorch.callbacks.PEFT:\n", - " return llm.peft.LoRA(\n", - " target_modules=['linear_qkv', 'linear_proj'], # full list:['linear_qkv', 'linear_proj', 'linear_fc1', 'linear_fc2']\n", - " dim=32,\n", - " )" + "def lora() -> run.Config[nl.pytorch.callbacks.PEFT]:\n", + " return run.Config(LoRA)" ] }, { @@ 
-272,14 +574,14 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 7, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "def llama3_8b() -> pl.LightningModule:\n", - " from transformers import AutoTokenizer\n", - " tokenizer = AutoTokenizer.from_pretrained(\"meta-llama/Meta-Llama-3-8B\")\n", - " return llm.LlamaModel(llm.Llama3Config8B(), tokenizer=tokenizer)" + "def llama3_8b() -> run.Config[pl.LightningModule]:\n", + " return run.Config(llm.LlamaModel, config=run.Config(llm.Llama3Config8B))" ] }, { @@ -292,15 +594,17 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 8, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "\n", - "def resume() -> nl.AutoResume:\n", - " return nl.AutoResume(\n", - " restore_config=nl.RestoreConfig(\n", - " path=\"hf://meta-llama/Meta-Llama-3-8B\"\n", + "def resume() -> run.Config[nl.AutoResume]:\n", + " return run.Config(\n", + " nl.AutoResume,\n", + " restore_config=run.Config(nl.RestoreConfig,\n", + " path=\"nemo://meta-llama/Meta-Llama-3-8B\"\n", " ),\n", " resume_if_exists=True,\n", " )" @@ -329,155 +633,1404 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "execution_count": 9, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
─── Entering Experiment nemo.collections.llm.api.finetune with id: nemo.collections.llm.api.finetune_1731692700 ───\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m─── \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.finetune with id: nemo.collections.llm.api.finetune_1731692700\u001b[0m\u001b[92m ───\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune\n" + ] + }, + { + "data": { + "text/html": [ + "
[09:45:00] Launching job nemo.collections.llm.api.finetune for experiment                         experiment.py:660\n",
+       "           nemo.collections.llm.api.finetune                                                                       \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[2;36m[09:45:00]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching job nemo.collections.llm.api.finetune for experiment \u001b[0m \u001b]8;id=93593;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=6694;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\u001b\\\u001b[2m660\u001b[0m\u001b]8;;\u001b\\\n", + "\u001b[2;36m \u001b[0m\u001b[1;36mnemo.collections.llm.api.finetune\u001b[0m \u001b[2m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune\n", + "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.finetune-wdj265kcplhnkd\n", + "AppStatus:\n", + " State: RUNNING\n", + " Num Restarts: 0\n", + " Roles: \n", + " Msg: \n", + " Structured Error Msg: \n", + " UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune/nemo_run/nemo.collections.llm.api.finetune-wdj265kcplhnkd\n", + " \n" + ] + }, + { + "data": { + "text/html": [ + "
────────────────── Waiting for Experiment nemo.collections.llm.api.finetune_1731692700 to finish ──────────────────\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m────────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.finetune_1731692700 to finish\u001b[0m\u001b[92m ──────────────────\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Experiment Status for nemo.collections.llm.api.finetune_1731692700\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.finetune_1731692700\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "Task 0: nemo.collections.llm.api.finetune\n",
+       "- Status: RUNNING\n",
+       "- Executor: LocalExecutor\n",
+       "- Job id: nemo.collections.llm.api.finetune-wdj265kcplhnkd\n",
+       "- Local Directory: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune\n",
+       "
\n" + ], + "text/plain": [ + "\n", + "\u001b[1;32mTask 0\u001b[0m: \u001b[1;38;5;214mnemo.collections.llm.api.finetune\u001b[0m\n", + "- \u001b[1;32mStatus\u001b[0m: RUNNING\n", + "- \u001b[1;32mExecutor\u001b[0m: LocalExecutor\n", + "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.finetune-wdj265kcplhnkd\n", + "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Waiting for job nemo.collections.llm.api.finetune-wdj265kcplhnkd to finish [log=True]...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] Starting elastic_operator with launch configs:\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] entrypoint : nemo_run.core.runners.fdl_runner\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] min_nodes : 1\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] max_nodes : 1\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] nproc_per_node : 1\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] run_id : 658\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_backend : c10d\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_endpoint : localhost:0\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_configs : {'timeout': 900}\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] max_restarts : 0\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] monitor_interval : 0.1\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] log_dir : 
/root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731692700/nemo.collections.llm.api.finetune/nemo_run/nemo.collections.llm.api.finetune-wdj265kcplhnkd/torchelastic/nemo.collections.llm.api.finetune\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] metrics_cfg : {}\n", + "i.finetune/0 I1115 09:45:01.671000 140737350272832 torch/distributed/launcher/api.py:188] \n", + "i.finetune/0 I1115 09:45:01.673000 140737350272832 torch/distributed/elastic/agent/server/api.py:825] [default] starting workers for entrypoint: python\n", + "i.finetune/0 I1115 09:45:01.673000 140737350272832 torch/distributed/elastic/agent/server/api.py:646] [default] Rendezvous'ing worker group\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] [default] Rendezvous complete for workers. Result:\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] restart_count=0\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] master_addr=eos0346.eos.clusters.nvidia.com\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] master_port=50753\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] group_rank=0\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] group_world_size=1\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] local_ranks=[0]\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] role_ranks=[0]\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] global_ranks=[0]\n", + "i.finetune/0 I1115 09:45:01.732000 
140737350272832 torch/distributed/elastic/agent/server/api.py:512] role_world_sizes=[1]\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] global_world_sizes=[1]\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] \n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/api.py:654] [default] Starting worker group\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:184] Environment variable 'TORCHELASTIC_ENABLE_FILE_TIMER' not found. Do not start FileTimerServer.\n", + "i.finetune/0 I1115 09:45:01.732000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:216] Environment variable 'TORCHELASTIC_HEALTH_CHECK_PORT' not found. Do not start health check.\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:08 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. 
Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n", + "i.finetune/0 [default0]: cm = get_cmap(\"Set1\")\n", + "i.finetune/0 [default0]: \n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:09 api:734] Disabling try_restore_best_ckpt restoration for adapters\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:09 nemo_logger:145] Experiments will be logged at results/nemo2_peft\n", + "i.finetune/0 [default0]:GPU available: True (cuda), used: True\n", + "i.finetune/0 [default0]:TPU available: False, using: 0 TPU cores\n", + "i.finetune/0 [default0]:HPU available: False, using: 0 HPUs\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:09 nemo_logger:123] No version folders would be created under the log folder as 'resume_if_exists' is enabled.\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:09 nemo_logger:173] \"update_logger_directory\" is True. Overwriting tensorboard logger \"save_dir\" to results\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:09 nemo_logger:189] The Trainer already contains a ModelCheckpoint callback. 
This will be overwritten.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:09 megatron_strategy:310] Fixing mis-match between ddp-config & mcore-optimizer config\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:396] Rank 0 has data parallel group : [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:402] Rank 0 has combined group of data parallel and context parallel : [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:407] All data parallel group ranks with context parallel combined: [[0]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:410] Ranks 0 has data parallel rank: 0\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:418] Rank 0 has context parallel group: [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:421] All context parallel group ranks: [[0]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:422] Ranks 0 has context parallel rank: 0\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:429] Rank 0 has model parallel group: [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:430] All model parallel group ranks: [[0]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:439] Rank 0 has tensor model parallel group: [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:443] All tensor model parallel group ranks: [[0]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:444] Rank 0 has tensor model parallel rank: 0\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:464] Rank 0 has pipeline model parallel group: [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:476] Rank 0 has embedding group: [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:482] All pipeline model parallel group ranks: [[0]]\n", + 
"i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:483] Rank 0 has pipeline model parallel rank 0\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:484] All embedding group ranks: [[0]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 megatron_init:485] Rank 0 has embedding rank: 0\n", + "i.finetune/0 [default0]:Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1\n", + "i.finetune/0 [default0]:----------------------------------------------------------------------------------------------------\n", + "i.finetune/0 [default0]:distributed_backend=nccl\n", + "i.finetune/0 [default0]:All distributed processes registered. Starting with 1 processes\n", + "i.finetune/0 [default0]:----------------------------------------------------------------------------------------------------\n", + "i.finetune/0 [default0]:\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:10 squad:87] Downloading SquadDataModule...\n", + "i.finetune/0 [default0]:\n", + "i.finetune/0 [default0]:Generating train split: 0%| | 0/87599 [00:00.wrapper at 0x7ff9ec5b0a60>\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:20 base:44] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:20 num_microbatches_calculator:228] setting number of microbatches to constant 8\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:20 megatron_strategy:745] Doing selective restore from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n", + "i.finetune/0 [default0]:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:20 megatron_strategy:324] Could not copy Trainer's 'max_steps' to LR scheduler's 'max_steps'. 
If you are not using an LR scheduler, this warning can safely be ignored.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 megatron_strategy:750] Restoring model weights from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 megatron_strategy:757] Finished restoring from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True), cleaning up.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:116] Building data files\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:528] Processing 1 data files using 1 workers\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:494] Building indexing for fn = /root/.cache/nemo/datasets/squad/training.jsonl\n", + "i.finetune/0 [default0]:\n", + "i.finetune/0 [default0]: | Name | Type | Params | Mode \n", + "i.finetune/0 [default0]:--------------------------------------------\n", + "i.finetune/0 [default0]:0 | module | GPTModel | 8.0 B | train\n", + "i.finetune/0 [default0]:--------------------------------------------\n", + "i.finetune/0 [default0]:8.0 B Trainable params\n", + "i.finetune/0 [default0]:0 Non-trainable params\n", + "i.finetune/0 [default0]:8.0 B Total params\n", + "i.finetune/0 [default0]:32,121.045Total estimated model params size (MB)\n", + "i.finetune/0 [default0]:649 Modules in train mode\n", + "i.finetune/0 [default0]:0 Modules in eval mode\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:506] Saving idx file = /root/.cache/nemo/datasets/squad/training.jsonl.idx.npy\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:508] Saving metadata file = /root/.cache/nemo/datasets/squad/training.jsonl.idx.info\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:543] Time building 1 / 1 mem-mapped files: 0:00:00.133476\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:528] Processing 1 data files using 1 workers\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[rank: 0] Received SIGTERM: 15\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:543] Time building 0 / 1 mem-mapped files: 0:00:00.080904\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:158] Loading data files\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:249] Loading /root/.cache/nemo/datasets/squad/training.jsonl\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000532\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:165] Computing global indices\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:116] Building data files\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:528] Processing 1 data files using 1 workers\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:34 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=223` in the `DataLoader` to improve performance.\n", + "i.finetune/0 [default0]: \n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:494] Building indexing for fn = /root/.cache/nemo/datasets/squad/validation.jsonl\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:506] Saving idx file = /root/.cache/nemo/datasets/squad/validation.jsonl.idx.npy\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:34 text_memmap_dataset:508] Saving metadata file = /root/.cache/nemo/datasets/squad/validation.jsonl.idx.info\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:543] Time building 1 / 1 mem-mapped files: 0:00:00.088056\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:528] Processing 1 data files using 1 workers\n", + "i.finetune/0 [default0]:[rank: 0] Received SIGTERM: 15\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[rank: 0] Received SIGTERM: 15\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:543] Time building 0 / 1 mem-mapped files: 0:00:00.082867\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:158] Loading data files\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:249] Loading /root/.cache/nemo/datasets/squad/validation.jsonl\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000437\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 text_memmap_dataset:165] Computing global indices\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.0.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.0.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.0.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.0.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.1.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.1.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.1.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora 
to: module.decoder.layers.1.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.2.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.2.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.2.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.2.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.3.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.3.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.3.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.3.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.4.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.4.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.4.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.4.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.5.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.5.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.5.mlp.linear_fc1\n", 
+ "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.5.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:35 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=223` in the `DataLoader` to improve performance.\n", + "i.finetune/0 [default0]: \n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.6.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.6.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.6.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.6.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.7.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.7.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.7.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.7.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.8.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.8.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.8.mlp.linear_fc1\n", + "i.finetune/0 
[default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.8.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.9.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.9.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.9.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.9.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.10.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.10.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.10.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.10.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.11.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.11.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.11.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.11.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.12.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.12.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 
09:45:35 lora:227] Adding lora to: module.decoder.layers.12.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.12.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.13.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.13.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.13.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.13.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.14.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.14.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.14.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.14.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.15.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.15.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.15.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.15.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.16.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: 
module.decoder.layers.16.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.16.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.16.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.17.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.17.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.17.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.17.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.18.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.18.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.18.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.18.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.19.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.19.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.19.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.19.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: 
module.decoder.layers.20.self_attention.linear_proj\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.20.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.20.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.20.mlp.linear_fc2\n", + "i.finetune/0 [default0]: [... identical 'Adding lora to' log lines for layers 21-30 truncated ...]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: 
module.decoder.layers.31.self_attention.linear_qkv\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.31.mlp.linear_fc1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 lora:227] Adding lora to: module.decoder.layers.31.mlp.linear_fc2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 model_transform:90] After applying model_transform:\n", + "i.finetune/0 [default0]: | Name | Type | Params | Mode \n", + "i.finetune/0 [default0]: --------------------------------------------\n", + "i.finetune/0 [default0]: 0 | module | GPTModel | 8.1 B | train\n", + "i.finetune/0 [default0]: --------------------------------------------\n", + "i.finetune/0 [default0]: 71.3 M Trainable params\n", + "i.finetune/0 [default0]: 8.0 B Non-trainable params\n", + "i.finetune/0 [default0]: 8.1 B Total params\n", + "i.finetune/0 [default0]: 32,406.258 Total estimated model params size (MB)\n", + "i.finetune/0 [default0]: 1289 Modules in train mode\n", + "i.finetune/0 [default0]: 0 Modules in eval mode\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 peft:177] Initializing model parallel\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 megatron_parallel:550] > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 8101564416\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 megatron_parallel:553] > number of trainable parameters: 71303168 (0.88% of total)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 utils:278] Setting up DistributedDataParallel with config DistributedDataParallelConfig(grad_reduce_in_fp32=True, overlap_grad_reduce=False, overlap_param_gather=False, align_param_gather=False, use_distributed_optimizer=True, check_for_nan_in_grad=True, bucket_size=None, average_in_collective=False, fp8_param_gather=False)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 utils:299] Number of buckets for gradient all-reduce / reduce-scatter: 1\n", + 
"i.finetune/0 [default0]: Params for bucket 1 (71303168 elements):\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.29.mlp.linear_fc1.adapter.linear_in.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.22.mlp.linear_fc1.adapter.linear_in.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.7.mlp.linear_fc1.adapter.linear_in.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.0.mlp.linear_fc1.adapter.linear_in.weight\n", + "i.finetune/0 [default0]: [... remaining LoRA adapter weight names in bucket 1 truncated ...]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 peft:181] Setting up optimizers\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:45:35 utils:278] Setting up optimizer with config OptimizerConfig(optimizer='adam', lr=0.0001, min_lr=None, decoupled_lr=None, decoupled_min_lr=None, weight_decay=0.01, fp16=False, bf16=True, params_dtype=torch.bfloat16, loss_scale=None, initial_loss_scale=4294967296, min_loss_scale=1.0, loss_scale_window=1000, hysteresis=2, adam_beta1=0.9, adam_beta2=0.98, adam_eps=1e-08, sgd_momentum=0.9, use_distributed_optimizer=True, overlap_param_gather_with_optimizer_step=False, clip_grad=1.0, log_num_zeros_in_grad=False, barrier_with_L1_time=False, timers=None, config_logger_dir='')\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 0/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 0 | 
reduced_train_loss: 1.956\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 1/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 1 | reduced_train_loss: 1.509 | consumed_samples: 16\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:48 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:431: It is recommended to use `self.log('global_batch_size', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.\n", + "i.finetune/0 [default0]: \n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:45:48 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:431: It is recommended to use `self.log('val_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.\n", + "i.finetune/0 [default0]: \n", + "i.finetune/0 [default0]:Training epoch 0, iteration 2/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 2 | reduced_train_loss: 0.3079 | consumed_samples: 24 | val_loss: 0.3142\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 3/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 3 | reduced_train_loss: 0.4225 | consumed_samples: 32 | val_loss: 0.3142\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 4/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 4 | reduced_train_loss: 0.2569 | consumed_samples: 40 | val_loss: 0.1524\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 5/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 5 | reduced_train_loss: 0.4586 | consumed_samples: 48 | val_loss: 0.1524\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 6/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 6 | reduced_train_loss: 0.4207 | consumed_samples: 56 | val_loss: 0.1952\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 7/19 | lr: 0.0001 | 
global_batch_size: 8 | global_step: 7 | reduced_train_loss: 0.081 | consumed_samples: 64 | val_loss: 0.1952\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 8/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 8 | reduced_train_loss: 0.2103 | consumed_samples: 72 | val_loss: 0.1372\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 9/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 9 | reduced_train_loss: 0.3401 | consumed_samples: 80 | val_loss: 0.1372\n", + "i.finetune/0 [default0]:Epoch 0, global step 9: 'reduced_train_loss' reached 0.34012 (best 0.34012), saving model to 'results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3401-epoch=0.ckpt' as top 1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:00 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3401-epoch=0.ckpt\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:01 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3401-epoch=0-last.ckpt\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 10/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 10 | reduced_train_loss: 0.2867 | consumed_samples: 88 | val_loss: 0.1337\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:04 model_checkpoint:522] Async checkpoint save for step 10 (results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3401-epoch=0.ckpt) finalized successfully.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:04 model_checkpoint:522] Async checkpoint save for step 10 (results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.3401-epoch=0-last.ckpt) finalized successfully.\n", + "i.finetune/0 [default0]:[rank0]:W1115 09:46:04.545000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] torch._dynamo hit config.cache_size_limit (8)\n", + "i.finetune/0 [default0]:[rank0]:W1115 09:46:04.545000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] function: 'calculate_cross_entropy_loss' (/opt/megatron-lm/megatron/core/fusions/fused_cross_entropy.py:47)\n", + "i.finetune/0 [default0]:[rank0]:W1115 09:46:04.545000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] last reason: tensor 'L['exp_logits']' size mismatch at index 0. 
expected 304, actual 336\n", + "i.finetune/0 [default0]:[rank0]:W1115 09:46:04.545000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] To log all recompilation reasons, use TORCH_LOGS=\"recompiles\".\n", + "i.finetune/0 [default0]:[rank0]:W1115 09:46:04.545000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html.\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 11/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 11 | reduced_train_loss: 0.2758 | consumed_samples: 96 | val_loss: 0.1337\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 12/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 12 | reduced_train_loss: 0.206 | consumed_samples: 104 | val_loss: 0.1601\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 13/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 13 | reduced_train_loss: 0.1556 | consumed_samples: 112 | val_loss: 0.1601\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 14/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 14 | reduced_train_loss: 0.1831 | consumed_samples: 120 | val_loss: 0.1798\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 15/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 15 | reduced_train_loss: 0.1565 | consumed_samples: 128 | val_loss: 0.1798\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 16/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 16 | reduced_train_loss: 0.3776 | consumed_samples: 136 | val_loss: 0.2383\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 17/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 17 | reduced_train_loss: 0.483 | consumed_samples: 144 | val_loss: 0.2383\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 18/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 18 | reduced_train_loss: 0.188 | consumed_samples: 152 | val_loss: 0.2823\n", + "i.finetune/0 [default0]:Training 
epoch 0, iteration 19/19 | lr: 0.0001 | global_batch_size: 8 | global_step: 19 | reduced_train_loss: 0.2591 | consumed_samples: 160 | val_loss: 0.2823\n", + "i.finetune/0 [default0]:Epoch 0, global step 19: 'reduced_train_loss' reached 0.25909 (best 0.25909), saving model to 'results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0.ckpt' as top 1\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:17 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0.ckpt\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:17 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0-last.ckpt\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:19 model_checkpoint:522] Async checkpoint save for step 20 (results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0.ckpt) finalized successfully.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:46:19 model_checkpoint:522] Async checkpoint save for step 20 (results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0-last.ckpt) finalized successfully.\n", + "i.finetune/0 [default0]:`Trainer.fit` stopped: `max_steps=20` reached.\n", + "i.finetune/0 I1115 09:46:34.257000 140737350272832 torch/distributed/elastic/agent/server/api.py:844] [default] worker group successfully finished. Waiting 300 seconds for other agents to finish.\n", + "i.finetune/0 I1115 09:46:34.257000 140737350272832 torch/distributed/elastic/agent/server/api.py:889] Local worker group finished (WorkerState.SUCCEEDED). Waiting 300 seconds for other agents to finish\n", + "i.finetune/0 I1115 09:46:34.258000 140737350272832 torch/distributed/elastic/agent/server/api.py:902] Done waiting for other agents. Elapsed: 0.00025081634521484375 seconds\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Job nemo.collections.llm.api.finetune-wdj265kcplhnkd finished: SUCCEEDED\n" + ] + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# The experiment was run with the following tasks: ['nemo.collections.llm.api.finetune']                           \n",
+       "# You can inspect and reconstruct this experiment at a later point in time using:                                  \n",
+       "experiment = run.Experiment.from_id(\"nemo.collections.llm.api.finetune_1731692700\")                                \n",
+       "experiment.status() # Gets the overall status                                                                      \n",
+       "experiment.logs(\"nemo.collections.llm.api.finetune\") # Gets the log for the provided task                          \n",
+       "experiment.cancel(\"nemo.collections.llm.api.finetune\") # Cancels the provided task if still running                \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# The experiment was run with the following tasks: ['nemo.collections.llm.api.finetune']\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect and reconstruct this experiment at a later point in time using:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.finetune_1731692700\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the overall status\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + 
"\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.finetune\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the log for the provided task\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.finetune\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Cancels the provided task if still running\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# You can inspect this experiment at a later point in time using the CLI as well:                                  \n",
+       "nemo experiment status nemo.collections.llm.api.finetune_1731692700                                                \n",
+       "nemo experiment logs nemo.collections.llm.api.finetune_1731692700 0                                                \n",
+       "nemo experiment cancel nemo.collections.llm.api.finetune_1731692700 0                                              \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect this experiment at a later point in time using the CLI as well:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1731692700\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1731692700\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1731692700\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ - "from nemo import lightning as nl\n", - "from nemo.collections import llm\n", - "from megatron.core.optimizer import OptimizerConfig\n", - "import 
torch\n", - "import pytorch_lightning as pl\n", - "from pathlib import Path\n", - "\n", - "\n", - "def trainer(devices=1) -> nl.Trainer:\n", - " strategy = nl.MegatronStrategy(\n", - " tensor_model_parallel_size=1,\n", - " )\n", - "\n", - " return nl.Trainer(\n", - " devices=1,\n", - " max_steps=40,\n", - " accelerator=\"gpu\",\n", - " strategy=strategy,\n", - " plugins=nl.MegatronMixedPrecision(precision=\"bf16-mixed\"),\n", - " log_every_n_steps=1,\n", - " limit_val_batches=2,\n", - " val_check_interval=2,\n", - " num_sanity_val_steps=0,\n", - " )\n", - "\n", - "\n", - "def logger() -> nl.NeMoLogger:\n", - " ckpt = nl.ModelCheckpoint(\n", - " save_last=True,\n", - " every_n_train_steps=10,\n", - " monitor=\"reduced_train_loss\",\n", - " save_top_k=1,\n", - " save_on_train_epoch_end=True,\n", - " save_optim_on_train_end=True,\n", - " )\n", - "\n", - " return nl.NeMoLogger(\n", - " name=\"nemo2_peft\",\n", - " log_dir=\"./results\",\n", - " use_datetime_version=False,\n", - " ckpt=ckpt,\n", - " wandb=None\n", - " )\n", - "\n", - "\n", - "def adam_with_cosine_annealing() -> nl.OptimizerModule:\n", - " return nl.MegatronOptimizerModule(\n", - " config=OptimizerConfig(\n", - " optimizer=\"adam\",\n", - " lr=0.0001,\n", - " adam_beta2=0.98,\n", - " use_distributed_optimizer=True,\n", - " clip_grad=1.0,\n", - " bf16=True,\n", - " ),\n", + "def configure_finetuning_recipe():\n", + " return run.Partial(\n", + " llm.finetune,\n", + " model=llama3_8b(),\n", + " trainer=trainer(),\n", + " data=squad(),\n", + " log=logger(),\n", + " peft=lora(),\n", + " optim=adam_with_cosine_annealing(),\n", + " resume=resume(),\n", " )\n", "\n", - "def lora() -> nl.pytorch.callbacks.PEFT:\n", - " return llm.peft.LoRA()\n", - "\n", - "\n", - "\n", - "def squad() -> pl.LightningDataModule:\n", - " return llm.SquadDataModule(seq_length=2048, micro_batch_size=2, global_batch_size=8, num_workers=0)\n", - "\n", "\n", + "def local_executor_torchrun(nodes: int = 1, devices: int = 1) -> 
run.LocalExecutor:\n", + " # Env vars for jobs are configured here\n", + " env_vars = {\n", + " \"TORCH_NCCL_AVOID_RECORD_STREAMS\": \"1\",\n", + " \"NCCL_NVLS_ENABLE\": \"0\",\n", + " \"NVTE_DP_AMAX_REDUCE_INTERVAL\": \"0\",\n", + " \"NVTE_ASYNC_AMAX_REDUCTION\": \"1\",\n", + " \"NVTE_FUSED_ATTN\": \"0\",\n", + " }\n", "\n", - "def llama3_8b() -> pl.LightningModule:\n", - " from transformers import AutoTokenizer\n", - " tokenizer = AutoTokenizer.from_pretrained(\"meta-llama/Meta-Llama-3-8B\")\n", - " return llm.LlamaModel(llm.Llama3Config8B(), tokenizer=tokenizer)\n", + " executor = run.LocalExecutor(ntasks_per_node=devices, launcher=\"torchrun\", env_vars=env_vars)\n", "\n", - "def resume() -> nl.AutoResume:\n", - " return nl.AutoResume(\n", - " restore_config=nl.RestoreConfig(\n", - " path=\"hf://meta-llama/Meta-Llama-3-8B\"\n", - " ),\n", - " resume_if_exists=True,\n", - " )\n", + " return executor\n", "\n", "if __name__ == '__main__':\n", - " output_path=\"llama3-8b-nemo2\"\n", - " llm.import_ckpt(model=llama3_8b(), source=\"hf://meta-llama/Meta-Llama-3-8B\",output_path=Path(output_path))\n", - " llm.finetune(\n", - " model=llama3_8b(),\n", - " data=squad(),\n", - " trainer=trainer(),\n", - " peft=lora(),\n", - " log=logger(),\n", - " optim=adam_with_cosine_annealing(),\n", - " resume=resume(),\n", - " )" + " run.run(configure_finetuning_recipe(), executor=local_executor_torchrun())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Step 4 Evaluation ##TODO: depending on NeMo 2.0 llm generation API" + "## Step 4 Evaluation \n", + "\n", + "We use the `llm.generate` API in NeMo 2.0 to generate results from the trained PEFT checkpoint. Find your last saved checkpoint from your experiment dir: `results/nemo2_peft/checkpoints`. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "We will load PEFT checkpoint from: results/nemo2_peft/checkpoints/nemo2_peft--reduced_train_loss=0.2591-epoch=0-last\n" + ] + } + ], + "source": [ + "peft_ckpt_path=str(next((d for d in Path(\"./results/nemo2_peft/checkpoints/\").iterdir() if d.is_dir() and d.name.endswith(\"-last\")), None))\n", + "print(\"We will load PEFT checkpoint from:\", peft_ckpt_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Optional: Launch with [NeMo-Run](https://github.com/NVIDIA/NeMo-Run)\n", - "Alternatively, we could use launch PEFT jobs using existing [recipes](https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes) from NeMo-Run. A recipe in NeMo is a python file that defines a complete configuration for training or fine-tuning an LLM. Each recipe typically includes:\n", - "1. Model configuration: Defines the architecture and hyperparameters of the LLM.\n", - "2. Training configuration: Specifies settings for the PyTorch Lightning Trainer, including distributed training strategies.\n", - "3. Data configuration: Sets up the data pipeline, including batch sizes and sequence lengths.\n", - "4. Optimization configuration: Defines the optimizer and learning rate schedule.\n", - "5. Logging and checkpointing configuration: Specifies how to save model checkpoints and log training progress.\n", + "SQuAD test set contains over 10,000 samples. For a quick demonstration, we will use the first 100 lines as an example input. " + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. 
The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Which NFL team represented the AFC at Super Bowl 50? Answer:\", \"output\": \"Denver Broncos\", \"original_answers\": [\"Denver Broncos\", \"Denver Broncos\", \"Denver Broncos\"]}\n", + "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Which NFL team represented the NFC at Super Bowl 50? 
Answer:\", \"output\": \"Carolina Panthers\", \"original_answers\": [\"Carolina Panthers\", \"Carolina Panthers\", \"Carolina Panthers\"]}\n", + "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Where did Super Bowl 50 take place? Answer:\", \"output\": \"Santa Clara, California\", \"original_answers\": [\"Santa Clara, California\", \"Levi's Stadium\", \"Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.\"]}\n" + ] + } + ], + "source": [ + "%%bash\n", + "head -n 100 /root/.cache/nemo/datasets/squad/test.jsonl > toy_testset.jsonl\n", + "head -n 3 /root/.cache/nemo/datasets/squad/test.jsonl" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will pass the string `toy_testset.jsonl` to the `input_dataset` parameter of `llm.generate`.To evaluate the entire test set, you can instead pass the SQuAD data module directly, using `input_dataset=squad()`. The input JSONL file should follow the format shown above, containing `input` and `output` fields (additional keys are optional)." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
─── Entering Experiment nemo.collections.llm.api.generate with id: nemo.collections.llm.api.generate_1731692795 ───\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m─── \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.generate with id: nemo.collections.llm.api.generate_1731692795\u001b[0m\u001b[92m ───\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate\n" + ] + }, + { + "data": { + "text/html": [ + "
[09:46:35] Launching job nemo.collections.llm.api.generate for experiment                         experiment.py:660\n",
+       "           nemo.collections.llm.api.generate                                                                       \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[2;36m[09:46:35]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching job nemo.collections.llm.api.generate for experiment \u001b[0m \u001b]8;id=926482;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=513888;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\u001b\\\u001b[2m660\u001b[0m\u001b]8;;\u001b\\\n", + "\u001b[2;36m \u001b[0m\u001b[1;36mnemo.collections.llm.api.generate\u001b[0m \u001b[2m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate\n", + "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.generate-zhfd0nk1lqmhm\n", + "AppStatus:\n", + " State: RUNNING\n", + " Num Restarts: 0\n", + " Roles: \n", + " Msg: \n", + " Structured Error Msg: \n", + " UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate/nemo_run/nemo.collections.llm.api.generate-zhfd0nk1lqmhm\n", + " \n" + ] + }, + { + "data": { + "text/html": [ + "
────────────────── Waiting for Experiment nemo.collections.llm.api.generate_1731692795 to finish ──────────────────\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m────────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.generate_1731692795 to finish\u001b[0m\u001b[92m ──────────────────\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Experiment Status for nemo.collections.llm.api.generate_1731692795\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.generate_1731692795\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "Task 0: nemo.collections.llm.api.generate\n",
+       "- Status: RUNNING\n",
+       "- Executor: LocalExecutor\n",
+       "- Job id: nemo.collections.llm.api.generate-zhfd0nk1lqmhm\n",
+       "- Local Directory: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate\n",
+       "
\n" + ], + "text/plain": [ + "\n", + "\u001b[1;32mTask 0\u001b[0m: \u001b[1;38;5;214mnemo.collections.llm.api.generate\u001b[0m\n", + "- \u001b[1;32mStatus\u001b[0m: RUNNING\n", + "- \u001b[1;32mExecutor\u001b[0m: LocalExecutor\n", + "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.generate-zhfd0nk1lqmhm\n", + "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Waiting for job nemo.collections.llm.api.generate-zhfd0nk1lqmhm to finish [log=True]...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] Starting elastic_operator with launch configs:\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] entrypoint : nemo_run.core.runners.fdl_runner\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] min_nodes : 1\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] max_nodes : 1\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] nproc_per_node : 1\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] run_id : 4470\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_backend : c10d\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_endpoint : localhost:0\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_configs : {'timeout': 900}\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] max_restarts : 0\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] monitor_interval : 0.1\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] log_dir : 
/root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731692795/nemo.collections.llm.api.generate/nemo_run/nemo.collections.llm.api.generate-zhfd0nk1lqmhm/torchelastic/nemo.collections.llm.api.generate\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] metrics_cfg : {}\n", + "i.generate/0 I1115 09:46:36.054000 140737350272832 torch/distributed/launcher/api.py:188] \n", + "i.generate/0 I1115 09:46:36.056000 140737350272832 torch/distributed/elastic/agent/server/api.py:825] [default] starting workers for entrypoint: python\n", + "i.generate/0 I1115 09:46:36.056000 140737350272832 torch/distributed/elastic/agent/server/api.py:646] [default] Rendezvous'ing worker group\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] [default] Rendezvous complete for workers. Result:\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] restart_count=0\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] master_addr=eos0346.eos.clusters.nvidia.com\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] master_port=53613\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] group_rank=0\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] group_world_size=1\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] local_ranks=[0]\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] role_ranks=[0]\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] global_ranks=[0]\n", + "i.generate/0 I1115 09:46:36.187000 
140737350272832 torch/distributed/elastic/agent/server/api.py:512] role_world_sizes=[1]\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] global_world_sizes=[1]\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] \n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/api.py:654] [default] Starting worker group\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:184] Environment variable 'TORCHELASTIC_ENABLE_FILE_TIMER' not found. Do not start FileTimerServer.\n", + "i.generate/0 I1115 09:46:36.187000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:216] Environment variable 'TORCHELASTIC_HEALTH_CHECK_PORT' not found. Do not start health check.\n", + "i.generate/0 [default0]:/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n", + "i.generate/0 [default0]: warnings.warn(\n", + "i.generate/0 [default0]:[NeMo W 2024-11-15 09:46:42 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. 
Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n", + "i.generate/0 [default0]: cm = get_cmap(\"Set1\")\n", + "i.generate/0 [default0]: \n", + "i.generate/0 [default0]:GPU available: True (cuda), used: True\n", + "i.generate/0 [default0]:TPU available: False, using: 0 TPU cores\n", + "i.generate/0 [default0]:HPU available: False, using: 0 HPUs\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:396] Rank 0 has data parallel group : [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:402] Rank 0 has combined group of data parallel and context parallel : [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:407] All data parallel group ranks with context parallel combined: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:410] Ranks 0 has data parallel rank: 0\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:418] Rank 0 has context parallel group: [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:421] All context parallel group ranks: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:422] Ranks 0 has context parallel rank: 0\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:429] Rank 0 has model parallel group: [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:430] All model parallel group ranks: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:439] Rank 0 has tensor model parallel group: [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:443] All tensor model parallel group ranks: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:444] Rank 0 has tensor model parallel rank: 0\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:464] Rank 0 has pipeline model parallel group: [0]\n", 
+ "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:476] Rank 0 has embedding group: [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:482] All pipeline model parallel group ranks: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:483] Rank 0 has pipeline model parallel rank 0\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:484] All embedding group ranks: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_init:485] Rank 0 has embedding rank: 0\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 base:44] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.\n", + "i.generate/0 [default0]:Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1\n", + "i.generate/0 [default0]:----------------------------------------------------------------------------------------------------\n", + "i.generate/0 [default0]:distributed_backend=nccl\n", + "i.generate/0 [default0]:All distributed processes registered. 
Starting with 1 processes\n", + "i.generate/0 [default0]:----------------------------------------------------------------------------------------------------\n", + "i.generate/0 [default0]:\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:44 megatron_parallel:550] > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 8030261248\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:45 megatron_strategy:745] Doing selective restore from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 megatron_strategy:750] Restoring model weights from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 megatron_strategy:757] Finished restoring from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True), cleaning up.\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.0.self_attention.linear_proj\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.0.self_attention.linear_qkv\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.0.mlp.linear_fc1\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.0.mlp.linear_fc2\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: module.module.module.decoder.layers.1.self_attention.linear_proj\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 09:46:59 lora:227] Adding lora to: 
module.module.module.decoder.layers.1.self_attention.linear_qkv\n",
+       "i.generate/0 [default0]: ... (identical 'Adding lora to' log lines repeat for the remaining decoder layers and are truncated here) ...\n",
+       "i.generate/0 [default0]:[NeMo I 2024-11-15 09:47:21 api:699] Predictions written to peft_prediction.jsonl\n",
+       "i.generate/0 I1115 09:47:24.254000 140737350272832 
torch/distributed/elastic/agent/server/api.py:844] [default] worker group successfully finished. Waiting 300 seconds for other agents to finish.\n", + "i.generate/0 I1115 09:47:24.254000 140737350272832 torch/distributed/elastic/agent/server/api.py:889] Local worker group finished (WorkerState.SUCCEEDED). Waiting 300 seconds for other agents to finish\n", + "i.generate/0 I1115 09:47:24.254000 140737350272832 torch/distributed/elastic/agent/server/api.py:902] Done waiting for other agents. Elapsed: 0.0003161430358886719 seconds\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Job nemo.collections.llm.api.generate-zhfd0nk1lqmhm finished: SUCCEEDED\n" + ] + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# The experiment was run with the following tasks: ['nemo.collections.llm.api.generate']                           \n",
+       "# You can inspect and reconstruct this experiment at a later point in time using:                                  \n",
+       "experiment = run.Experiment.from_id(\"nemo.collections.llm.api.generate_1731692795\")                                \n",
+       "experiment.status() # Gets the overall status                                                                      \n",
+       "experiment.logs(\"nemo.collections.llm.api.generate\") # Gets the log for the provided task                          \n",
+       "experiment.cancel(\"nemo.collections.llm.api.generate\") # Cancels the provided task if still running                \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# The experiment was run with the following tasks: ['nemo.collections.llm.api.generate']\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect and reconstruct this experiment at a later point in time using:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.generate_1731692795\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the overall status\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + 
"\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.generate\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the log for the provided task\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.generate\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Cancels the provided task if still running\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# You can inspect this experiment at a later point in time using the CLI as well:                                  \n",
+       "nemo experiment status nemo.collections.llm.api.generate_1731692795                                                \n",
+       "nemo experiment logs nemo.collections.llm.api.generate_1731692795 0                                                \n",
+       "nemo experiment cancel nemo.collections.llm.api.generate_1731692795 0                                              \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect this experiment at a later point in time using the CLI as well:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.generate_1731692795\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.generate_1731692795\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.generate_1731692795\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from megatron.core.inference.common_inference_params import CommonInferenceParams\n", + "\n", + "\n", + "def trainer() -> 
run.Config[nl.Trainer]:\n", + " strategy = run.Config(\n", + " nl.MegatronStrategy,\n", + " tensor_model_parallel_size=1,\n", + " pipeline_model_parallel_size=1,\n", + " context_parallel_size=1,\n", + " sequence_parallel=False,\n", + " setup_optimizers=False,\n", + " store_optimizer_states=False,\n", + " )\n", + " trainer = run.Config(\n", + " nl.Trainer,\n", + " accelerator=\"gpu\",\n", + " devices=1,\n", + " num_nodes=1,\n", + " strategy=strategy,\n", + " plugins=bf16_mixed(),\n", + " )\n", + " return trainer\n", + "\n", + "def configure_inference():\n", + " return run.Partial(\n", + " llm.generate,\n", + " path=str(peft_ckpt_path),\n", + " trainer=trainer(),\n", + " input_dataset=\"toy_testset.jsonl\",\n", + " inference_params=CommonInferenceParams(num_tokens_to_generate=20, top_k=1),\n", + " output_path=\"peft_prediction.jsonl\",\n", + " )\n", + "\n", "\n", - "Recipes are designed to be modular and extensible, allowing users to easily customize settings for their specific use cases.\n", + "def local_executor_torchrun(nodes: int = 1, devices: int = 1) -> run.LocalExecutor:\n", + " # Env vars for jobs are configured here\n", + " env_vars = {\n", + " \"TORCH_NCCL_AVOID_RECORD_STREAMS\": \"1\",\n", + " \"NCCL_NVLS_ENABLE\": \"0\",\n", + " \"NVTE_DP_AMAX_REDUCE_INTERVAL\": \"0\",\n", + " \"NVTE_ASYNC_AMAX_REDUCTION\": \"1\",\n", + " \"NVTE_FUSED_ATTN\": \"0\",\n", + " }\n", "\n", + " executor = run.LocalExecutor(ntasks_per_node=devices, launcher=\"torchrun\", env_vars=env_vars)\n", "\n", - "NeMo-Run is a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across various computing environments. NeMo-Run is responsible for experiment configuration, execution and management. Here is an example for launch a recipe using NeMo-Run using local executor." 
+ " return executor\n", + "\n", + "if __name__ == '__main__':\n", + " run.run(configure_inference(), executor=local_executor_torchrun())\n" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "## TODO: Pretrain with tp1pp1cp2 doesn't work. Pretrain with tp4pp1cp2 works. Finetuning recipe doesn't work\n", - "import nemo_run as run\n", - "from nemo.collections import llm\n", - "\n", - "recipe = llm.llama3_8b.finetune_recipe(name=\"llama3-8b-pretrain\", dir=\"exp/nemorun_ft\", num_nodes=1, num_gpus_per_node=2)\n", - "env_vars = {\n", - " \"TORCH_NCCL_AVOID_RECORD_STREAMS\": \"1\",\n", - " \"NCCL_NVLS_ENABLE\": \"0\",\n", - " \"NVTE_DP_AMAX_REDUCE_INTERVAL\": \"0\",\n", - " \"NVTE_ASYNC_AMAX_REDUCTION\": \"1\",\n", - " \"NVTE_FUSED_ATTN\": \"0\",\n", - "}\n", - "local_executor = run.LocalExecutor(ntasks_per_node=8, launcher=\"torchrun\", env_vars=env_vars)\n", - "run.run(recipe, executor=local_executor)" + "After the inference is complete, you will see results similar to the following:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. 
As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Which NFL team represented the AFC at Super Bowl 50? Answer:\", \"original_answers\": [\"Denver Broncos\", \"Denver Broncos\", \"Denver Broncos\"], \"label\": \"Denver Broncos\", \"prediction\": \" Denver Broncos\"}\n", + "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Which NFL team represented the NFC at Super Bowl 50? Answer:\", \"original_answers\": [\"Carolina Panthers\", \"Carolina Panthers\", \"Carolina Panthers\"], \"label\": \"Carolina Panthers\", \"prediction\": \" Carolina Panthers\"}\n", + "{\"input\": \"Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\\u201310 to earn their third Super Bowl title. 
The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \\\"golden anniversary\\\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \\\"Super Bowl L\\\"), so that the logo could prominently feature the Arabic numerals 50. Question: Where did Super Bowl 50 take place? Answer:\", \"original_answers\": [\"Santa Clara, California\", \"Levi's Stadium\", \"Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.\"], \"label\": \"Santa Clara, California\", \"prediction\": \" Levi's Stadium\"}\n" + ] + } + ], + "source": [ + "%%bash\n", + "head -n 3 peft_prediction.jsonl" ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -491,9 +2044,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.10.12" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/tutorials/llm/nemo2-sft.ipynb b/tutorials/llm/nemo2-sft.ipynb index 87310d9d800c..667d074349c3 100644 --- a/tutorials/llm/nemo2-sft.ipynb +++ b/tutorials/llm/nemo2-sft.ipynb @@ -8,7 +8,7 @@ "\n", "## Supervised Finetuning (SFT)\n", "\n", - "Often we want to adapt or customize foundation models to be more performant on our specific task. Fine-tuning refers to how we can modify the weights of a pre-trained foundation model with additional custom data. Supervised fine-tuning (SFT) refers to unfreezing all the weights and layers in our model and training on a newly labeled set of examples. We can fine-tune to incorporate new, domain-specific knowledge, or teach the foundation model what type of response to provide. 
One specific type of SFT is also referred to as “instruction tuning” where we use SFT to teach a model to follow instructions better. In this playbook will demostrate how to perform SFT with Llama3-8b using NeMo 2.0.\n", +    "Often we want to adapt or customize foundation models to be more performant on our specific task. Fine-tuning refers to how we can modify the weights of a pre-trained foundation model with additional custom data. Supervised fine-tuning (SFT) refers to unfreezing all the weights and layers in our model and training on a newly labeled set of examples. We can fine-tune to incorporate new, domain-specific knowledge, or teach the foundation model what type of response to provide. One specific type of SFT is also referred to as “instruction tuning” where we use SFT to teach a model to follow instructions better. In this playbook, we will demonstrate how to perform SFT with Llama3-8b using NeMo 2.0.\n",
This section offers an overview of the new features in NeMo 2.0 and includes a migration guide with step-by-step instructions for transitioning your models from NeMo 1.0 to NeMo 2.0.\n", "\n", "\n", "## Software Requirements\n", "\n", - "1. Use the latest [NeMo Framework Training container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags) . Note that you must be logged in to the container registry to view this page.\n", + "1. Use the latest [NeMo Framework Training container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo/tags). Note that you must be logged in to the container registry to view this page.\n", "\n", - "2. This notebook uses the container: `nvcr.io/nvidia/nemo:dev` #TODO: FIX CONTAINER \n", + "2. This notebook uses the container: `nvcr.io/nvidia/nemo:dev` \n", "\n", "\n", "## Hardware Requirements\n", @@ -49,7 +49,7 @@ "source": [ "# Step 0: Go inside docker container\n", "\n", - "You can start and enter the dev container by: #TODO: FIX CONTAINER\n", + "You can start and enter the dev container by:\n", "```\n", "docker run --gpus device=1 --shm-size=2g --net=host --ulimit memlock=-1 --rm -it -v ${PWD}:/workspace -w /workspace -v ${PWD}/results:/results nvcr.io/nvidia/nemo:dev bash\n", "\n", @@ -62,36 +62,309 @@ "source": [ "\n", "# Step 1: Import HuggingFace checkpoint\n", - "First request download permission from Meta and Hugging Face. Login through `huggingface-cli` using your Huggingface token before importing llama3 models. \n", + "First request download permission from Meta and Hugging Face. Log in through `huggingface-cli` using your Huggingface token before importing llama3 models. \n", "\n", "```\n", "$ huggingface-cli login\n", "```\n", "\n", - "Once logged in, you can use the following script to import a Hugging Face model. Based on the provided model configuration (`Llama3-8b` in the example below), the `llm.import_ckpt` API will download the specified model using the \"hf://\" URL format. 
It will then convert the model into NeMo 2.0 format and store it at the given `output_path`." + "Once logged in, you can use the following script to import a Hugging Face model. Based on the provided model configuration (`Llama3-8b` in the example below), the `llm.import_ckpt` API will download the specified model using the \"hf://\" URL format. It will then convert the model into NeMo 2.0 format. \n" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "execution_count": 1, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/usr/local/lib/python3.10/dist-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n", + "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n", + " warnings.warn(\n", + "[NeMo W 2024-11-15 09:57:49 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n", + " cm = get_cmap(\"Set1\")\n", + " \n" + ] + }, + { + "data": { + "text/html": [ + "
Entering Experiment nemo.collections.llm.api.import_ckpt with id: nemo.collections.llm.api.import_ckpt_1731693…\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m─ \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.import_ckpt with id: nemo.collections.llm.api.import_ckpt_1731693…\u001b[0m\u001b[92m ─\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731693470/nemo.collections.llm.api.import_ckpt\n" + ] + }, + { + "data": { + "text/html": [ + "
[09:57:50] Launching job nemo.collections.llm.api.import_ckpt for experiment                      experiment.py:660\n",
+       "           nemo.collections.llm.api.import_ckpt                                                                    \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[2;36m[09:57:50]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching job nemo.collections.llm.api.import_ckpt for experiment \u001b[0m \u001b]8;id=974892;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=76903;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\u001b\\\u001b[2m660\u001b[0m\u001b]8;;\u001b\\\n", + "\u001b[2;36m \u001b[0m\u001b[1;36mnemo.collections.llm.api.import_ckpt\u001b[0m \u001b[2m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731693470/nemo.collections.llm.api.import_ckpt\n", + "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.import_ckpt-f0rwwn6vt74ckc\n", + "AppStatus:\n", + " State: RUNNING\n", + " Num Restarts: 0\n", + " Roles: \n", + " Msg: \n", + " Structured Error Msg: \n", + " UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731693470/nemo.collections.llm.api.import_ckpt/nemo_run/nemo.collections.llm.api.import_ckpt-f0rwwn6vt74ckc\n", + " \n" + ] + }, + { + "data": { + "text/html": [ + "
──────────────── Waiting for Experiment nemo.collections.llm.api.import_ckpt_1731693470 to finish ─────────────────\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m──────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.import_ckpt_1731693470 to finish\u001b[0m\u001b[92m ─────────────────\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Experiment Status for nemo.collections.llm.api.import_ckpt_1731693470\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.import_ckpt_1731693470\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "Task 0: nemo.collections.llm.api.import_ckpt\n",
+       "- Status: RUNNING\n",
+       "- Executor: LocalExecutor\n",
+       "- Job id: nemo.collections.llm.api.import_ckpt-f0rwwn6vt74ckc\n",
+       "- Local Directory: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731693470/nemo.collections.llm.api.import_ckpt\n",
+       "
\n" + ], + "text/plain": [ + "\n", + "\u001b[1;32mTask 0\u001b[0m: \u001b[1;38;5;214mnemo.collections.llm.api.import_ckpt\u001b[0m\n", + "- \u001b[1;32mStatus\u001b[0m: RUNNING\n", + "- \u001b[1;32mExecutor\u001b[0m: LocalExecutor\n", + "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.import_ckpt-f0rwwn6vt74ckc\n", + "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.import_ckpt/nemo.collections.llm.api.import_ckpt_1731693470/nemo.collections.llm.api.import_ckpt\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Waiting for job nemo.collections.llm.api.import_ckpt-f0rwwn6vt74ckc to finish [log=True]...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "mport_ckpt/0 [NeMo W 2024-11-15 09:57:56 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n", + "mport_ckpt/0 cm = get_cmap(\"Set1\")\n", + "mport_ckpt/0 \n", + "mport_ckpt/0 Downloading shards: 100%|██████████| 4/4 [00:00<00:00, 4853.11it/s]\n", + "mport_ckpt/0 Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00, 3.24it/s]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_strategy:310] Fixing mis-match between ddp-config & mcore-optimizer config\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:396] Rank 0 has data parallel group : [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:402] Rank 0 has combined group of data parallel and context parallel : [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:407] All data parallel group ranks with context parallel combined: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:410] Ranks 0 has data parallel rank: 0\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:418] Rank 0 has context parallel group: [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:421] All context parallel group ranks: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:422] Ranks 0 has context parallel rank: 0\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:429] Rank 0 has model parallel group: [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 
09:57:59 megatron_init:430] All model parallel group ranks: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:439] Rank 0 has tensor model parallel group: [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:443] All tensor model parallel group ranks: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:444] Rank 0 has tensor model parallel rank: 0\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:464] Rank 0 has pipeline model parallel group: [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:476] Rank 0 has embedding group: [0]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:482] All pipeline model parallel group ranks: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:483] Rank 0 has pipeline model parallel rank 0\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:484] All embedding group ranks: [[0]]\n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:57:59 megatron_init:485] Rank 0 has embedding rank: 0\n", + "mport_ckpt/0 GPU available: True (cuda), used: False\n", + "mport_ckpt/0 TPU available: False, using: 0 TPU cores\n", + "mport_ckpt/0 HPU available: False, using: 0 HPUs\n", + "mport_ckpt/0 [NeMo W 2024-11-15 09:57:59 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/setup.py:177: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.\n", + "mport_ckpt/0 \n", + "mport_ckpt/0 Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1\n", + "mport_ckpt/0 ----------------------------------------------------------------------------------------------------\n", + "mport_ckpt/0 distributed_backend=gloo\n", + "mport_ckpt/0 All distributed processes registered. 
Starting with 1 processes\n", + "mport_ckpt/0 ----------------------------------------------------------------------------------------------------\n", + "mport_ckpt/0 \n", + "mport_ckpt/0 [NeMo I 2024-11-15 09:58:00 base:44] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.\n", + "mport_ckpt/0 [NeMo W 2024-11-15 09:58:00 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py:1090: `trainer.init_module` cannot fully support proper instantiation of your model with the `MegatronStrategy` strategy. Please instantiate your model inside the`LightningModule.configure_model` hook instead\n", + "mport_ckpt/0 \n", + "mport_ckpt/0 [NeMo W 2024-11-15 09:58:38 megatron_strategy:324] Could not copy Trainer's 'max_steps' to LR scheduler's 'max_steps'. If you are not using an LR scheduler, this warning can safely be ignored.\n", + "mport_ckpt/0 huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "mport_ckpt/0 To disable this warning, you can either:\n", + "mport_ckpt/0 \t- Avoid using `tokenizers` before the fork if possible\n", + "mport_ckpt/0 \t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "mport_ckpt/0 Converted Llama model to Nemo, model saved to /root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B in torch.bfloat16.\n", + "mport_ckpt/0 \u001b[32m $\u001b[0m\u001b[32mNEMO_MODELS_CACHE\u001b[0m\u001b[32m=\u001b[0m\u001b[32m/root/.cache/nemo/\u001b[0m\u001b[32mmodels\u001b[0m\u001b[32m \u001b[0m\n", + "mport_ckpt/0 \u001b[32m✓ Checkpoint imported to \u001b[0m\u001b[32m/root/.cache/nemo/models/meta-llama/\u001b[0m\u001b[32mMeta-Llama-3-8B\u001b[0m\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Job nemo.collections.llm.api.import_ckpt-f0rwwn6vt74ckc finished: SUCCEEDED\n" + ] + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# The experiment was run with the following tasks: ['nemo.collections.llm.api.import_ckpt']                        \n",
+       "# You can inspect and reconstruct this experiment at a later point in time using:                                  \n",
+       "experiment = run.Experiment.from_id(\"nemo.collections.llm.api.import_ckpt_1731693470\")                             \n",
+       "experiment.status() # Gets the overall status                                                                      \n",
+       "experiment.logs(\"nemo.collections.llm.api.import_ckpt\") # Gets the log for the provided task                       \n",
+       "experiment.cancel(\"nemo.collections.llm.api.import_ckpt\") # Cancels the provided task if still running             \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# The experiment was run with the following tasks: ['nemo.collections.llm.api.import_ckpt']\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect and reconstruct this experiment at a later point in time using:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731693470\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the overall status\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + 
"\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.import_ckpt\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the log for the provided task\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.import_ckpt\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Cancels the provided task if still running\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# You can inspect this experiment at a later point in time using the CLI as well:                                  \n",
+       "nemo experiment status nemo.collections.llm.api.import_ckpt_1731693470                                             \n",
+       "nemo experiment logs nemo.collections.llm.api.import_ckpt_1731693470 0                                             \n",
+       "nemo experiment cancel nemo.collections.llm.api.import_ckpt_1731693470 0                                           \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect this experiment at a later point in time using the CLI as well:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731693470\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731693470\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.import_ckpt_1731693470\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ + "import nemo_run as run\n", "from nemo import lightning as nl\n", "from nemo.collections import llm\n", "from megatron.core.optimizer import 
OptimizerConfig\n", "import torch\n", "import pytorch_lightning as pl\n", "from pathlib import Path\n", + "from nemo.collections.llm.recipes.precision.mixed_precision import bf16_mixed\n", + "\n", + "\n", + "# llm.import_ckpt is the nemo2 API for converting Hugging Face checkpoint to NeMo format\n", + "# example usage:\n", + "# llm.import_ckpt(model=llm.llama3_8b.model(), source=\"hf://meta-llama/Meta-Llama-3-8B\")\n", + "#\n", + "# We use run.Partial to configure this function\n", + "def configure_checkpoint_conversion():\n", + " return run.Partial(\n", + " llm.import_ckpt,\n", + " model=llm.llama3_8b.model(),\n", + " source=\"hf://meta-llama/Meta-Llama-3-8B\",\n", + " overwrite=False,\n", + " )\n", "\n", - "def llama3_8b() -> pl.LightningModule:\n", - " from transformers import AutoTokenizer\n", - " tokenizer = AutoTokenizer.from_pretrained(\"meta-llama/Meta-Llama-3-8B\")\n", - " return llm.LlamaModel(llm.Llama3Config8B(), tokenizer=tokenizer)\n", + "# configure your function\n", + "import_ckpt = configure_checkpoint_conversion()\n", + "# define your executor\n", + "local_executor = run.LocalExecutor()\n", "\n", - "if __name__ == '__main__':\n", - " output_path=\"llama3-8b-nemo2\"\n", - " llm.import_ckpt(model=llama3_8b(), source=\"hf://meta-llama/Meta-Llama-3-8B\",output_path=Path(output_path))" + "# run your experiment\n", + "run.run(import_ckpt, executor=local_executor)\n" ] }, { @@ -105,29 +378,32 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 2, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "\n", - "def dolly() -> pl.LightningDataModule:\n", - " return llm.DollyDataModule(seq_length=2048, micro_batch_size=2, global_batch_size=8, num_workers=0)" + "def dolly() -> run.Config[pl.LightningDataModule]:\n", + " return run.Config(llm.DollyDataModule, seq_length=2048, micro_batch_size=1, global_batch_size=8, num_workers=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "To use your own 
data, you will need to create a custom `DataModule`. This involvse extending the base class `FineTuningDataModule`, so that you have access to existing data handling logic such as packed sequence. Here we walk you through the process step by step using the already existing [`DollyDataModule`](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/gpt/data/dolly.py) as an example. \n", + "To use your own data, you will need to create a custom `DataModule`. This involves extending the base class `FineTuningDataModule`, so that you have access to existing data handling logic such as packed sequence. Here we walk you through the process step by step using the already existing [`DollyDataModule`](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/gpt/data/dolly.py) as an example. \n", "\n", "### 1. Subclass the FineTuningDataModule\n", - "You need to extend `FineTuningDataModule` if you're fine-tuning NeMo models. This provides access to existing data handling logic, such as packed sequences. The `data_root` parameter is where you store your generated `train/validation/test.jsonl` in NeMo format. How `DollyDataModule` does it:" + "You need to extend `FineTuningDataModule` if you're fine-tuning NeMo models. This provides access to existing data handling logic, such as packed sequences. The `data_root` parameter is where you store your generated `train/validation/test.jsonl` in NeMo format. Below is how `DollyDataModule` does it:" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 3, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "from datasets import load_dataset\n", @@ -184,14 +460,16 @@ "source": [ "### 2. Override the `prepare_data` Method\n", "\n", - "The prepare_data method is responsible for downloading and preprocessing data if needed. 
If the dataset is already downloaded, you can skip this step.\n", + "The `prepare_data` method is responsible for downloading and preprocessing data if needed. If the dataset is already downloaded, you can skip this step.\n", "\n" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 4, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "def prepare_data(self) -> None:\n", @@ -213,8 +491,10 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 5, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "def _download_data(self):\n", @@ -303,27 +583,30 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 6, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "\n", - "def trainer(devices=1) -> nl.Trainer:\n", - " strategy = nl.MegatronStrategy(\n", - " tensor_model_parallel_size=1,\n", + "def trainer() -> run.Config[nl.Trainer]:\n", + " strategy = run.Config(\n", + " nl.MegatronStrategy,\n", + " tensor_model_parallel_size=2\n", " )\n", - "\n", - " return nl.Trainer(\n", - " devices=devices,\n", - " max_steps=40,\n", + " trainer = run.Config(\n", + " nl.Trainer,\n", + " devices=2,\n", + " max_steps=20,\n", " accelerator=\"gpu\",\n", " strategy=strategy,\n", - " plugins=nl.MegatronMixedPrecision(precision=\"bf16-mixed\"),\n", + " plugins=bf16_mixed(),\n", " log_every_n_steps=1,\n", " limit_val_batches=2,\n", " val_check_interval=2,\n", " num_sanity_val_steps=0,\n", - " )\n" + " )\n", + " return trainer" ] }, { @@ -338,12 +621,15 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 7, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "def logger() -> nl.NeMoLogger:\n", - " ckpt = nl.ModelCheckpoint(\n", + "def logger() -> run.Config[nl.NeMoLogger]:\n", + " ckpt = run.Config(\n", + " nl.ModelCheckpoint,\n", " save_last=True,\n", " every_n_train_steps=10,\n", " 
monitor=\"reduced_train_loss\",\n", @@ -352,7 +638,8 @@ " save_optim_on_train_end=True,\n", " )\n", "\n", - " return nl.NeMoLogger(\n", + " return run.Config(\n", + " nl.NeMoLogger,\n", " name=\"nemo2_sft\",\n", " log_dir=\"./results\",\n", " use_datetime_version=False,\n", @@ -375,21 +662,26 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 8, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "def adam_with_cosine_annealing() -> nl.OptimizerModule:\n", - " return nl.MegatronOptimizerModule(\n", - " config=OptimizerConfig(\n", - " optimizer=\"adam\",\n", - " lr=0.0001,\n", - " adam_beta2=0.98,\n", - " use_distributed_optimizer=True,\n", - " clip_grad=1.0,\n", - " bf16=True,\n", - " ),\n", - " )" + "def adam_with_cosine_annealing() -> run.Config[nl.OptimizerModule]:\n", + " opt_cfg = run.Config(\n", + " OptimizerConfig,\n", + " optimizer=\"adam\",\n", + " lr=5e-6,\n", + " adam_beta2=0.98,\n", + " use_distributed_optimizer=True,\n", + " clip_grad=1.0,\n", + " bf16=True,\n", + " )\n", + " return run.Config(\n", + " nl.MegatronOptimizerModule,\n", + " config=opt_cfg\n", + " )\n" ] }, { @@ -402,14 +694,14 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 9, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "def llama3_8b() -> pl.LightningModule:\n", - " from transformers import AutoTokenizer\n", - " tokenizer = AutoTokenizer.from_pretrained(\"meta-llama/Meta-Llama-3-8B\")\n", - " return llm.LlamaModel(llm.Llama3Config8B(), tokenizer=tokenizer)" + "def llama3_8b() -> run.Config[pl.LightningModule]:\n", + " return run.Config(llm.LlamaModel, config=run.Config(llm.Llama3Config8B))" ] }, { @@ -422,15 +714,17 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, + "execution_count": 10, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "\n", - "def resume() -> nl.AutoResume:\n", - " return nl.AutoResume(\n", - " 
restore_config=nl.RestoreConfig(\n", - " path=\"hf://meta-llama/Meta-Llama-3-8B\"\n", + "def resume() -> run.Config[nl.AutoResume]:\n", + " return run.Config(\n", + " nl.AutoResume,\n", + " restore_config=run.Config(nl.RestoreConfig,\n", + " path=\"nemo://meta-llama/Meta-Llama-3-8B\"\n", " ),\n", " resume_if_exists=True,\n", " )" @@ -458,148 +752,1064 @@ }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "execution_count": 11, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
─── Entering Experiment nemo.collections.llm.api.finetune with id: nemo.collections.llm.api.finetune_1731693538 ───\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m─── \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.finetune with id: nemo.collections.llm.api.finetune_1731693538\u001b[0m\u001b[92m ───\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731693538/nemo.collections.llm.api.finetune\n" + ] + }, + { + "data": { + "text/html": [ + "
[09:58:58] Launching job nemo.collections.llm.api.finetune for experiment                         experiment.py:660\n",
+       "           nemo.collections.llm.api.finetune                                                                       \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[2;36m[09:58:58]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching job nemo.collections.llm.api.finetune for experiment \u001b[0m \u001b]8;id=475933;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=742588;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\u001b\\\u001b[2m660\u001b[0m\u001b]8;;\u001b\\\n", + "\u001b[2;36m \u001b[0m\u001b[1;36mnemo.collections.llm.api.finetune\u001b[0m \u001b[2m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731693538/nemo.collections.llm.api.finetune\n", + "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.finetune-bsqgzflc7xzftd\n", + "AppStatus:\n", + " State: RUNNING\n", + " Num Restarts: 0\n", + " Roles: \n", + " Msg: \n", + " Structured Error Msg: \n", + " UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731693538/nemo.collections.llm.api.finetune/nemo_run/nemo.collections.llm.api.finetune-bsqgzflc7xzftd\n", + " \n" + ] + }, + { + "data": { + "text/html": [ + "
────────────────── Waiting for Experiment nemo.collections.llm.api.finetune_1731693538 to finish ──────────────────\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m────────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.finetune_1731693538 to finish\u001b[0m\u001b[92m ──────────────────\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Experiment Status for nemo.collections.llm.api.finetune_1731693538\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.finetune_1731693538\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "Task 0: nemo.collections.llm.api.finetune\n",
+       "- Status: RUNNING\n",
+       "- Executor: LocalExecutor\n",
+       "- Job id: nemo.collections.llm.api.finetune-bsqgzflc7xzftd\n",
+       "- Local Directory: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731693538/nemo.collections.llm.api.finetune\n",
+       "
\n" + ], + "text/plain": [ + "\n", + "\u001b[1;32mTask 0\u001b[0m: \u001b[1;38;5;214mnemo.collections.llm.api.finetune\u001b[0m\n", + "- \u001b[1;32mStatus\u001b[0m: RUNNING\n", + "- \u001b[1;32mExecutor\u001b[0m: LocalExecutor\n", + "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.finetune-bsqgzflc7xzftd\n", + "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731693538/nemo.collections.llm.api.finetune\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Waiting for job nemo.collections.llm.api.finetune-bsqgzflc7xzftd to finish [log=True]...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "i.finetune/0 W1115 09:58:59.485000 140737350272832 torch/distributed/run.py:778] \n", + "i.finetune/0 W1115 09:58:59.485000 140737350272832 torch/distributed/run.py:778] *****************************************\n", + "i.finetune/0 W1115 09:58:59.485000 140737350272832 torch/distributed/run.py:778] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. \n", + "i.finetune/0 W1115 09:58:59.485000 140737350272832 torch/distributed/run.py:778] *****************************************\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] Starting elastic_operator with launch configs:\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] entrypoint : nemo_run.core.runners.fdl_runner\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] min_nodes : 1\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] max_nodes : 1\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] nproc_per_node : 2\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] run_id : 9802\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_backend : c10d\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_endpoint : localhost:0\n", + "i.finetune/0 I1115 09:58:59.485000 
140737350272832 torch/distributed/launcher/api.py:188] rdzv_configs : {'timeout': 900}\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] max_restarts : 0\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] monitor_interval : 0.1\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] log_dir : /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1731693538/nemo.collections.llm.api.finetune/nemo_run/nemo.collections.llm.api.finetune-bsqgzflc7xzftd/torchelastic/nemo.collections.llm.api.finetune\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] metrics_cfg : {}\n", + "i.finetune/0 I1115 09:58:59.485000 140737350272832 torch/distributed/launcher/api.py:188] \n", + "i.finetune/0 I1115 09:58:59.488000 140737350272832 torch/distributed/elastic/agent/server/api.py:825] [default] starting workers for entrypoint: python\n", + "i.finetune/0 I1115 09:58:59.488000 140737350272832 torch/distributed/elastic/agent/server/api.py:646] [default] Rendezvous'ing worker group\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] [default] Rendezvous complete for workers. 
Result:\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] restart_count=0\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] master_addr=eos0346.eos.clusters.nvidia.com\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] master_port=51293\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] group_rank=0\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] group_world_size=1\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] local_ranks=[0, 1]\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] role_ranks=[0, 1]\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] global_ranks=[0, 1]\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] role_world_sizes=[2, 2]\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] global_world_sizes=[2, 2]\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] \n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/api.py:654] [default] Starting worker group\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:184] Environment variable 'TORCHELASTIC_ENABLE_FILE_TIMER' not found. Do not start FileTimerServer.\n", + "i.finetune/0 I1115 09:58:59.705000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:216] Environment variable 'TORCHELASTIC_HEALTH_CHECK_PORT' not found. 
Do not start health check.\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:59:06 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n", + "i.finetune/0 [default0]: cm = get_cmap(\"Set1\")\n", + "i.finetune/0 [default0]: \n", + "i.finetune/0 [default0]:GPU available: True (cuda), used: True\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:07 nemo_logger:145] Experiments will be logged at results/nemo2_sft\n", + "i.finetune/0 [default0]:TPU available: False, using: 0 TPU cores\n", + "i.finetune/0 [default0]:HPU available: False, using: 0 HPUs\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:59:07 nemo_logger:123] No version folders would be created under the log folder as 'resume_if_exists' is enabled.\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:59:07 nemo_logger:173] \"update_logger_directory\" is True. Overwriting tensorboard logger \"save_dir\" to results\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:59:07 nemo_logger:189] The Trainer already contains a ModelCheckpoint callback. 
This will be overwritten.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:07 megatron_strategy:310] Fixing mis-match between ddp-config & mcore-optimizer config\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:396] Rank 0 has data parallel group : [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:402] Rank 0 has combined group of data parallel and context parallel : [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:407] All data parallel group ranks with context parallel combined: [[0], [1]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:410] Ranks 0 has data parallel rank: 0\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:418] Rank 0 has context parallel group: [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:421] All context parallel group ranks: [[0], [1]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:422] Ranks 0 has context parallel rank: 0\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:429] Rank 0 has model parallel group: [0, 1]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:430] All model parallel group ranks: [[0, 1]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:439] Rank 0 has tensor model parallel group: [0, 1]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:443] All tensor model parallel group ranks: [[0, 1]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:444] Rank 0 has tensor model parallel rank: 0\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:464] Rank 0 has pipeline model parallel group: [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:476] Rank 0 has embedding group: [0]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:482] All pipeline model parallel 
group ranks: [[0], [1]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:483] Rank 0 has pipeline model parallel rank 0\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:484] All embedding group ranks: [[0], [1]]\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:08 megatron_init:485] Rank 0 has embedding rank: 0\n", + "i.finetune/0 [default0]:Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2\n", + "i.finetune/0 [default0]:----------------------------------------------------------------------------------------------------\n", + "i.finetune/0 [default0]:distributed_backend=nccl\n", + "i.finetune/0 [default0]:All distributed processes registered. Starting with 2 processes\n", + "i.finetune/0 [default0]:----------------------------------------------------------------------------------------------------\n", + "i.finetune/0 [default0]:\n", + "i.finetune/0 [default1]:Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:10 dolly:89] Downloading DollyDataModule...\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:11 dolly:97] Preprocessing DollyDataModule to jsonl format and splitting...\n", + "i.finetune/0 [default0]:\n", + "i.finetune/0 [default0]:Generating train split: 0%| | 0/15011 [00:00 number of parameters on (tensor, pipeline) model parallel rank (0, 0): 4015263744\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:13 utils:278] Setting up DistributedDataParallel with config DistributedDataParallelConfig(grad_reduce_in_fp32=True, overlap_grad_reduce=False, overlap_param_gather=False, align_param_gather=False, use_distributed_optimizer=True, check_for_nan_in_grad=True, bucket_size=None, average_in_collective=False, fp8_param_gather=False)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:13 utils:299] Number of buckets for gradient all-reduce / reduce-scatter: 1\n", + "i.finetune/0 [default0]: Params for bucket 1 (4015263744 
elements):\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.29.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.21.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.18.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.15.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.12.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.4.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.20.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.13.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.5.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.2.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.28.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.27.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.24.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.21.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.14.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.11.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.8.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.7.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.4.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.22.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.13.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.5.self_attention.linear_qkv.layer_norm_weight\n", + 
"i.finetune/0 [default0]: \tmodule.decoder.layers.20.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.31.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.final_layernorm.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.28.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.23.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.17.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.14.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.6.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.0.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.29.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.22.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.15.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.4.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.1.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.30.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.29.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.26.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.23.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.16.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.13.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.10.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.8.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: 
\tmodule.decoder.layers.24.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.15.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.8.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.0.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.30.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.25.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.22.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.19.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.16.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.9.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.31.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.24.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.17.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.7.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.3.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.29.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.25.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.18.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.15.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.12.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.9.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.1.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: 
\tmodule.embedding.word_embeddings.weight\n", + "i.finetune/0 [default0]: \tmodule.output_layer.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.31.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.26.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.17.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.10.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.2.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.21.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.27.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.24.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.18.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.11.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.8.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.7.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.26.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.19.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.10.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.5.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.2.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.20.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.31.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.27.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.17.mlp.linear_fc2.weight\n", + "i.finetune/0 
[default0]: \tmodule.decoder.layers.14.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.11.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.6.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.3.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.19.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.12.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.4.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.1.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.26.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.23.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.20.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.13.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.10.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.6.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.3.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.0.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.28.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.21.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.12.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.7.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.4.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.30.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 
[default0]: \tmodule.decoder.layers.22.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.19.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.16.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.13.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.5.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.28.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.21.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.14.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.7.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.3.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.0.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.22.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.29.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.25.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.15.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.12.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.9.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.5.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.23.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.14.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.6.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.21.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: 
\tmodule.decoder.layers.29.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.24.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.18.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.15.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.8.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.30.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.23.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.16.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.5.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.2.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.31.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.28.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.27.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.24.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.17.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.14.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.11.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.8.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.0.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.30.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.25.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.16.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: 
\tmodule.decoder.layers.9.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.1.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.31.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.26.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.23.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.20.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.17.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.10.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.6.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.1.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.28.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.25.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.18.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.9.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.7.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.4.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.1.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.30.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.26.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.19.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.16.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.13.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: 
\tmodule.decoder.layers.10.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.2.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.27.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.18.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.11.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.3.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.0.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.25.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.22.mlp.linear_fc1.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.19.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.12.mlp.linear_fc1.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.9.mlp.linear_fc2.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.2.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.27.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.20.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.11.self_attention.linear_proj.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.6.self_attention.linear_qkv.weight\n", + "i.finetune/0 [default0]: \tmodule.decoder.layers.3.self_attention.linear_qkv.layer_norm_weight\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:13 utils:278] Setting up optimizer with config OptimizerConfig(optimizer='adam', lr=5e-06, min_lr=None, decoupled_lr=None, decoupled_min_lr=None, weight_decay=0.01, fp16=False, bf16=True, params_dtype=torch.bfloat16, loss_scale=None, initial_loss_scale=4294967296, min_loss_scale=1.0, loss_scale_window=1000, hysteresis=2, adam_beta1=0.9, 
adam_beta2=0.98, adam_eps=1e-08, sgd_momentum=0.9, use_distributed_optimizer=True, overlap_param_gather_with_optimizer_step=False, clip_grad=1.0, log_num_zeros_in_grad=False, barrier_with_L1_time=False, timers=None, config_logger_dir='')\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:13 megatron_strategy:745] Doing selective restore from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:29 megatron_strategy:750] Restoring model weights from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 megatron_strategy:757] Finished restoring from RestoreConfig(path='/root/.cache/nemo/models/meta-llama/Meta-Llama-3-8B', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True), cleaning up.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:116] Building data files\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:528] Processing 1 data files using 1 workers\n", + "i.finetune/0 [default0]:\n", + "i.finetune/0 [default0]: | Name | Type | Params | Mode \n", + "i.finetune/0 [default0]:----------------------------------------\n", + "i.finetune/0 [default0]:0 | module | DDP | 4.0 B | train\n", + "i.finetune/0 [default0]:----------------------------------------\n", + "i.finetune/0 [default0]:4.0 B Trainable params\n", + "i.finetune/0 [default0]:0 Non-trainable params\n", + "i.finetune/0 [default0]:4.0 B Total params\n", + "i.finetune/0 [default0]:16,061.055Total estimated model params size (MB)\n", + "i.finetune/0 [default0]:651 Modules in train mode\n", + "i.finetune/0 [default0]:0 Modules in eval mode\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 
09:59:30 text_memmap_dataset:494] Building indexing for fn = /root/.cache/nemo/datasets/dolly/training.jsonl\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:506] Saving idx file = /root/.cache/nemo/datasets/dolly/training.jsonl.idx.npy\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:508] Saving metadata file = /root/.cache/nemo/datasets/dolly/training.jsonl.idx.info\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:543] Time building 1 / 1 mem-mapped files: 0:00:00.073210\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:528] Processing 1 data files using 1 workers\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:543] Time building 0 / 1 mem-mapped files: 0:00:00.049917\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:158] Loading data files\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:249] Loading /root/.cache/nemo/datasets/dolly/training.jsonl\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000408\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:165] Computing global indices\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:116] Building data files\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:528] Processing 1 data files using 1 workers\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:494] Building indexing for fn = /root/.cache/nemo/datasets/dolly/validation.jsonl\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:506] Saving idx file = /root/.cache/nemo/datasets/dolly/validation.jsonl.idx.npy\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:508] Saving metadata file = /root/.cache/nemo/datasets/dolly/validation.jsonl.idx.info\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:543] Time building 1 / 1 mem-mapped files: 0:00:00.049564\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:528] Processing 1 data files using 1 workers\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:59:30 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=111` in the `DataLoader` to improve performance.\n", + "i.finetune/0 [default0]: \n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:543] Time building 0 / 1 mem-mapped files: 0:00:00.041159\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:158] Loading data files\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:249] Loading /root/.cache/nemo/datasets/dolly/validation.jsonl\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000357\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 09:59:30 text_memmap_dataset:165] Computing global indices\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:59:30 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. 
Consider increasing the value of the `num_workers` argument` to `num_workers=111` in the `DataLoader` to improve performance.\n", + "i.finetune/0 [default0]: \n", + "i.finetune/0 [default0]:Training epoch 0, iteration 0/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 0 | reduced_train_loss: 2.103\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 1/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 1 | reduced_train_loss: 1.272 | consumed_samples: 16\n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:59:59 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:431: It is recommended to use `self.log('global_batch_size', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.\n", + "i.finetune/0 [default0]: \n", + "i.finetune/0 [default0]:[NeMo W 2024-11-15 09:59:59 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:431: It is recommended to use `self.log('val_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.\n", + "i.finetune/0 [default0]: \n", + "i.finetune/0 [default0]:Training epoch 0, iteration 2/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 2 | reduced_train_loss: 1.512 | consumed_samples: 24 | val_loss: 2.103\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 3/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 3 | reduced_train_loss: 1.811 | consumed_samples: 32 | val_loss: 2.103\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 4/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 4 | reduced_train_loss: 1.398 | consumed_samples: 40 | val_loss: 2.029\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 5/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 5 | reduced_train_loss: 1.601 | consumed_samples: 48 | val_loss: 2.029\n", + "i.finetune/0 
[default0]:Training epoch 0, iteration 6/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 6 | reduced_train_loss: 1.075 | consumed_samples: 56 | val_loss: 2.005\n", + "i.finetune/0 [default1]:[rank1]:W1115 10:00:09.702000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] torch._dynamo hit config.cache_size_limit (8)\n", + "i.finetune/0 [default1]:[rank1]:W1115 10:00:09.702000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] function: 'calculate_cross_entropy_loss' (/opt/megatron-lm/megatron/core/fusions/fused_cross_entropy.py:47)\n", + "i.finetune/0 [default1]:[rank1]:W1115 10:00:09.702000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] last reason: tensor 'L['exp_logits']' size mismatch at index 0. expected 496, actual 512\n", + "i.finetune/0 [default1]:[rank1]:W1115 10:00:09.702000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] To log all recompilation reasons, use TORCH_LOGS=\"recompiles\".\n", + "i.finetune/0 [default1]:[rank1]:W1115 10:00:09.702000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html.\n", + "i.finetune/0 [default0]:[rank0]:W1115 10:00:09.710000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] torch._dynamo hit config.cache_size_limit (8)\n", + "i.finetune/0 [default0]:[rank0]:W1115 10:00:09.710000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] function: 'calculate_cross_entropy_loss' (/opt/megatron-lm/megatron/core/fusions/fused_cross_entropy.py:47)\n", + "i.finetune/0 [default0]:[rank0]:W1115 10:00:09.710000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] last reason: tensor 'L['exp_logits']' size mismatch at index 0. 
expected 496, actual 512\n", + "i.finetune/0 [default0]:[rank0]:W1115 10:00:09.710000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] To log all recompilation reasons, use TORCH_LOGS=\"recompiles\".\n", + "i.finetune/0 [default0]:[rank0]:W1115 10:00:09.710000 140737350272832 torch/_dynamo/convert_frame.py:744] [4/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html.\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 7/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 7 | reduced_train_loss: 1.542 | consumed_samples: 64 | val_loss: 2.005\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 8/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 8 | reduced_train_loss: 1.479 | consumed_samples: 72 | val_loss: 2.003\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 9/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 9 | reduced_train_loss: 1.671 | consumed_samples: 80 | val_loss: 2.003\n", + "i.finetune/0 [default0]:Epoch 0, global step 9: 'reduced_train_loss' reached 1.67107 (best 1.67107), saving model to 'results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.6711-epoch=0.ckpt' as top 1\n", + "i.finetune/0 [default1]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default1]:To disable this warning, you can either:\n", + "i.finetune/0 [default1]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default1]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 10:00:35 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.6711-epoch=0.ckpt\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default1]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default1]:To disable this warning, you can either:\n", + "i.finetune/0 [default1]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default1]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 10:00:54 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.6711-epoch=0-last.ckpt\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 10/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 10 | reduced_train_loss: 1.458 | consumed_samples: 88 | val_loss: 1.958\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 11/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 11 | reduced_train_loss: 2.787 | consumed_samples: 96 | val_loss: 1.958\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 12/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 12 | reduced_train_loss: 1.427 | consumed_samples: 104 | val_loss: 1.933\n", + 
"i.finetune/0 [default0]:Training epoch 0, iteration 13/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 13 | reduced_train_loss: 1.514 | consumed_samples: 112 | val_loss: 1.933\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 14/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 14 | reduced_train_loss: 1.127 | consumed_samples: 120 | val_loss: 1.925\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 15/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 15 | reduced_train_loss: 1.41 | consumed_samples: 128 | val_loss: 1.925\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 16/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 16 | reduced_train_loss: 1.075 | consumed_samples: 136 | val_loss: 1.923\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 17/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 17 | reduced_train_loss: 1.445 | consumed_samples: 144 | val_loss: 1.923\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 18/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 18 | reduced_train_loss: 1.711 | consumed_samples: 152 | val_loss: 1.929\n", + "i.finetune/0 [default0]:Training epoch 0, iteration 19/19 | lr: 5e-06 | global_batch_size: 8 | global_step: 19 | reduced_train_loss: 1.506 | consumed_samples: 160 | val_loss: 1.929\n", + "i.finetune/0 [default0]:Epoch 0, global step 19: 'reduced_train_loss' reached 1.50632 (best 1.50632), saving model to 'results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.5063-epoch=0.ckpt' as top 1\n", + "i.finetune/0 [default1]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default1]:To disable this warning, you can either:\n", + "i.finetune/0 [default1]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default1]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 10:01:26 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.5063-epoch=0.ckpt\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default1]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default1]:To disable this warning, you can either:\n", + "i.finetune/0 [default1]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default1]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:huggingface/tokenizers: The current process just got forked, after parallelism has already been used. 
Disabling parallelism to avoid deadlocks...\n", + "i.finetune/0 [default0]:To disable this warning, you can either:\n", + "i.finetune/0 [default0]:\t- Avoid using `tokenizers` before the fork if possible\n", + "i.finetune/0 [default0]:\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 10:01:46 model_checkpoint:497] Scheduled async checkpoint save for results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.5063-epoch=0-last.ckpt\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 10:01:46 model_checkpoint:522] Async checkpoint save for step 10 (results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.6711-epoch=0.ckpt) finalized successfully.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 10:01:46 model_checkpoint:522] Async checkpoint save for step 10 (results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.6711-epoch=0-last.ckpt) finalized successfully.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 10:01:47 dist_ckpt_io:174] Pending async checkpoint saves. Finalizing them synchronously now\n", + "i.finetune/0 [default0]:`Trainer.fit` stopped: `max_steps=20` reached.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 10:02:00 model_checkpoint:522] Async checkpoint save for step 20 (results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.5063-epoch=0.ckpt) finalized successfully.\n", + "i.finetune/0 [default0]:[NeMo I 2024-11-15 10:02:22 model_checkpoint:522] Async checkpoint save for step 20 (results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.5063-epoch=0-last.ckpt) finalized successfully.\n", + "i.finetune/0 I1115 10:03:41.584000 140737350272832 torch/distributed/elastic/agent/server/api.py:844] [default] worker group successfully finished. Waiting 300 seconds for other agents to finish.\n", + "i.finetune/0 I1115 10:03:41.584000 140737350272832 torch/distributed/elastic/agent/server/api.py:889] Local worker group finished (WorkerState.SUCCEEDED). 
Waiting 300 seconds for other agents to finish\n", + "i.finetune/0 I1115 10:03:41.584000 140737350272832 torch/distributed/elastic/agent/server/api.py:902] Done waiting for other agents. Elapsed: 0.00024819374084472656 seconds\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Job nemo.collections.llm.api.finetune-bsqgzflc7xzftd finished: SUCCEEDED\n" + ] + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# The experiment was run with the following tasks: ['nemo.collections.llm.api.finetune']                           \n",
+       "# You can inspect and reconstruct this experiment at a later point in time using:                                  \n",
+       "experiment = run.Experiment.from_id(\"nemo.collections.llm.api.finetune_1731693538\")                                \n",
+       "experiment.status() # Gets the overall status                                                                      \n",
+       "experiment.logs(\"nemo.collections.llm.api.finetune\") # Gets the log for the provided task                          \n",
+       "experiment.cancel(\"nemo.collections.llm.api.finetune\") # Cancels the provided task if still running                \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# The experiment was run with the following tasks: ['nemo.collections.llm.api.finetune']\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect and reconstruct this experiment at a later point in time using:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.finetune_1731693538\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the overall status\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + 
"\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.finetune\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the log for the provided task\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.finetune\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Cancels the provided task if still running\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# You can inspect this experiment at a later point in time using the CLI as well:                                  \n",
+       "nemo experiment status nemo.collections.llm.api.finetune_1731693538                                                \n",
+       "nemo experiment logs nemo.collections.llm.api.finetune_1731693538 0                                                \n",
+       "nemo experiment cancel nemo.collections.llm.api.finetune_1731693538 0                                              \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect this experiment at a later point in time using the CLI as well:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1731693538\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1731693538\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.finetune_1731693538\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ - "from nemo import lightning as nl\n", - "from nemo.collections import llm\n", - "from megatron.core.optimizer import OptimizerConfig\n", - "import 
torch\n", - "import pytorch_lightning as pl\n", - "\n", - "\n", - "def trainer(devices=1) -> nl.Trainer:\n", - " strategy = nl.MegatronStrategy(\n", - " tensor_model_parallel_size=1,\n", - " )\n", - "\n", - " return nl.Trainer(\n", - " devices=devices,\n", - " max_steps=40,\n", - " accelerator=\"gpu\",\n", - " strategy=strategy,\n", - " plugins=nl.MegatronMixedPrecision(precision=\"bf16-mixed\"),\n", - " log_every_n_steps=1,\n", - " limit_val_batches=2,\n", - " val_check_interval=2,\n", - " num_sanity_val_steps=0,\n", - " )\n", - "\n", - "\n", - "def logger() -> nl.NeMoLogger:\n", - " ckpt = nl.ModelCheckpoint(\n", - " save_last=True,\n", - " every_n_train_steps=10,\n", - " monitor=\"reduced_train_loss\",\n", - " save_top_k=1,\n", - " save_on_train_epoch_end=True,\n", - " save_optim_on_train_end=True,\n", - " )\n", - "\n", - " return nl.NeMoLogger(\n", - " name=\"nemo2_sft\",\n", - " log_dir=\"./results\",\n", - " use_datetime_version=False,\n", - " ckpt=ckpt,\n", - " wandb=None\n", - " )\n", - "\n", - "\n", - "def adam_with_cosine_annealing() -> nl.OptimizerModule:\n", - " return nl.MegatronOptimizerModule(\n", - " config=OptimizerConfig(\n", - " optimizer=\"adam\",\n", - " lr=0.0001,\n", - " adam_beta2=0.98,\n", - " use_distributed_optimizer=True,\n", - " clip_grad=1.0,\n", - " bf16=True,\n", - " ),\n", + "def configure_finetuning_recipe():\n", + " return run.Partial(\n", + " llm.finetune,\n", + " model=llama3_8b(),\n", + " trainer=trainer(),\n", + " data=dolly(),\n", + " log=logger(),\n", + " optim=adam_with_cosine_annealing(),\n", + " resume=resume(),\n", " )\n", "\n", - "def dolly() -> pl.LightningDataModule:\n", - " return llm.DollyDataModule(seq_length=2048, micro_batch_size=2, global_batch_size=8, num_workers=0)\n", - "\n", "\n", + "def local_executor_torchrun(nodes: int = 1, devices: int = 2) -> run.LocalExecutor:\n", + " # Env vars for jobs are configured here\n", + " env_vars = {\n", + " \"TORCH_NCCL_AVOID_RECORD_STREAMS\": \"1\",\n", + " 
\"NCCL_NVLS_ENABLE\": \"0\",\n", + " \"NVTE_DP_AMAX_REDUCE_INTERVAL\": \"0\",\n", + " \"NVTE_ASYNC_AMAX_REDUCTION\": \"1\",\n", + " \"NVTE_FUSED_ATTN\": \"0\",\n", + " }\n", "\n", - "def llama3_8b() -> pl.LightningModule:\n", - " from transformers import AutoTokenizer\n", - " tokenizer = AutoTokenizer.from_pretrained(\"meta-llama/Meta-Llama-3-8B\")\n", - " return llm.LlamaModel(llm.Llama3Config8B(), tokenizer=tokenizer)\n", + " executor = run.LocalExecutor(ntasks_per_node=devices, launcher=\"torchrun\", env_vars=env_vars)\n", "\n", - "def resume() -> nl.AutoResume:\n", - " return nl.AutoResume(\n", - " restore_config=nl.RestoreConfig(\n", - " path=\"hf://meta-llama/Meta-Llama-3-8B\"\n", - " ),\n", - " resume_if_exists=True,\n", - " )\n", + " return executor\n", "\n", "if __name__ == '__main__':\n", - " output_path=\"llama3-8b-nemo2\"\n", - " llm.import_ckpt(model=llama3_8b(), source=\"hf://meta-llama/Meta-Llama-3-8B\",output_path=Path(output_path))\n", - " llm.finetune(\n", - " model=llama3_8b(),\n", - " data=dolly(),\n", - " trainer=trainer(),\n", - " log=logger(),\n", - " optim=adam_with_cosine_annealing(),\n", - " resume=resume(),\n", - " )" + " run.run(configure_finetuning_recipe(), executor=local_executor_torchrun())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Step 4 Evaluation ##TODO: depending on NeMo 2.0 llm generation API" + "## Step 4 Evaluation\n", + "\n", + "We use the `llm.generate` API in NeMo 2.0 to generate results from the trained SFT checkpoint. Find your last saved checkpoint from your experiment dir: `results/nemo2_sft/checkpoints`. 
"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "We will load SFT checkpoint from: results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.5063-epoch=0-last\n"
     ]
    }
   ],
   "source": [
    "sft_ckpt_path=str(next((d for d in Path(\"./results/nemo2_sft/checkpoints/\").iterdir() if d.is_dir() and d.name.endswith(\"-last\")), None))\n",
    "print(\"We will load SFT checkpoint from:\", sft_ckpt_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When using the `llm.generate` API, you can pass a data module such as dolly: `input_dataset=dolly()`. This will use the test set from the specified data module to generate predictions. In the following example, the generated predictions are saved to the `sft_prediction.jsonl` file. Note that while fine-tuning required `tensor_model_parallel_size=2` (a minimum of 2 GPUs), generating predictions only requires `tensor_model_parallel_size=1`. 
However, using multiple GPUs can speed up the inference process." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
─── Entering Experiment nemo.collections.llm.api.generate with id: nemo.collections.llm.api.generate_1731693822 ───\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m─── \u001b[0m\u001b[1;35mEntering Experiment nemo.collections.llm.api.generate with id: nemo.collections.llm.api.generate_1731693822\u001b[0m\u001b[92m ───\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731693822/nemo.collections.llm.api.generate\n" + ] + }, + { + "data": { + "text/html": [ + "
[10:03:42] Launching job nemo.collections.llm.api.generate for experiment                         experiment.py:660\n",
+       "           nemo.collections.llm.api.generate                                                                       \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[2;36m[10:03:42]\u001b[0m\u001b[2;36m \u001b[0m\u001b[1;36mLaunching job nemo.collections.llm.api.generate for experiment \u001b[0m \u001b]8;id=991202;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py\u001b\\\u001b[2mexperiment.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=921254;file:///opt/NeMo-Run/src/nemo_run/run/experiment.py#660\u001b\\\u001b[2m660\u001b[0m\u001b]8;;\u001b\\\n", + "\u001b[2;36m \u001b[0m\u001b[1;36mnemo.collections.llm.api.generate\u001b[0m \u001b[2m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731693822/nemo.collections.llm.api.generate\n", + "Launched app: local_persistent://nemo_run/nemo.collections.llm.api.generate-lzdnjbxr7thbv\n", + "AppStatus:\n", + " State: RUNNING\n", + " Num Restarts: 0\n", + " Roles: \n", + " Msg: \n", + " Structured Error Msg: \n", + " UI URL: file:///root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731693822/nemo.collections.llm.api.generate/nemo_run/nemo.collections.llm.api.generate-lzdnjbxr7thbv\n", + " \n" + ] + }, + { + "data": { + "text/html": [ + "
────────────────── Waiting for Experiment nemo.collections.llm.api.generate_1731693822 to finish ──────────────────\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[92m────────────────── \u001b[0m\u001b[1;35mWaiting for Experiment nemo.collections.llm.api.generate_1731693822 to finish\u001b[0m\u001b[92m ──────────────────\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
Experiment Status for nemo.collections.llm.api.generate_1731693822\n",
+       "
\n" + ], + "text/plain": [ + "\u001b[1;32mExperiment Status for\u001b[0m \u001b[1;38;5;214mnemo.collections.llm.api.generate_1731693822\u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "Task 0: nemo.collections.llm.api.generate\n",
+       "- Status: RUNNING\n",
+       "- Executor: LocalExecutor\n",
+       "- Job id: nemo.collections.llm.api.generate-lzdnjbxr7thbv\n",
+       "- Local Directory: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731693822/nemo.collections.llm.api.generate\n",
+       "
\n" + ], + "text/plain": [ + "\n", + "\u001b[1;32mTask 0\u001b[0m: \u001b[1;38;5;214mnemo.collections.llm.api.generate\u001b[0m\n", + "- \u001b[1;32mStatus\u001b[0m: RUNNING\n", + "- \u001b[1;32mExecutor\u001b[0m: LocalExecutor\n", + "- \u001b[1;32mJob id\u001b[0m: nemo.collections.llm.api.generate-lzdnjbxr7thbv\n", + "- \u001b[1;32mLocal Directory\u001b[0m: /root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731693822/nemo.collections.llm.api.generate\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Waiting for job nemo.collections.llm.api.generate-lzdnjbxr7thbv to finish [log=True]...\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] Starting elastic_operator with launch configs:\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] entrypoint : nemo_run.core.runners.fdl_runner\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] min_nodes : 1\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] max_nodes : 1\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] nproc_per_node : 1\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] run_id : 159\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_backend : c10d\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_endpoint : localhost:0\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] rdzv_configs : {'timeout': 900}\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] max_restarts : 0\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] monitor_interval : 0.1\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] log_dir : 
/root/.nemo_run/experiments/nemo.collections.llm.api.generate/nemo.collections.llm.api.generate_1731693822/nemo.collections.llm.api.generate/nemo_run/nemo.collections.llm.api.generate-lzdnjbxr7thbv/torchelastic/nemo.collections.llm.api.generate\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] metrics_cfg : {}\n", + "i.generate/0 I1115 10:03:43.883000 140737350272832 torch/distributed/launcher/api.py:188] \n", + "i.generate/0 I1115 10:03:43.886000 140737350272832 torch/distributed/elastic/agent/server/api.py:825] [default] starting workers for entrypoint: python\n", + "i.generate/0 I1115 10:03:43.886000 140737350272832 torch/distributed/elastic/agent/server/api.py:646] [default] Rendezvous'ing worker group\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] [default] Rendezvous complete for workers. Result:\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] restart_count=0\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] master_addr=eos0346.eos.clusters.nvidia.com\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] master_port=51515\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] group_rank=0\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] group_world_size=1\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] local_ranks=[0]\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] role_ranks=[0]\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] global_ranks=[0]\n", + "i.generate/0 I1115 10:03:44.045000 
140737350272832 torch/distributed/elastic/agent/server/api.py:512] role_world_sizes=[1]\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] global_world_sizes=[1]\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:512] \n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/api.py:654] [default] Starting worker group\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:184] Environment variable 'TORCHELASTIC_ENABLE_FILE_TIMER' not found. Do not start FileTimerServer.\n", + "i.generate/0 I1115 10:03:44.045000 140737350272832 torch/distributed/elastic/agent/server/local_elastic_agent.py:216] Environment variable 'TORCHELASTIC_HEALTH_CHECK_PORT' not found. Do not start health check.\n", + "i.generate/0 [default0]:/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n", + "i.generate/0 [default0]: warnings.warn(\n", + "i.generate/0 [default0]:[NeMo W 2024-11-15 10:03:50 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. 
Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n", + "i.generate/0 [default0]: cm = get_cmap(\"Set1\")\n", + "i.generate/0 [default0]: \n", + "i.generate/0 [default0]:GPU available: True (cuda), used: True\n", + "i.generate/0 [default0]:TPU available: False, using: 0 TPU cores\n", + "i.generate/0 [default0]:HPU available: False, using: 0 HPUs\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:396] Rank 0 has data parallel group : [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:402] Rank 0 has combined group of data parallel and context parallel : [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:407] All data parallel group ranks with context parallel combined: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:410] Ranks 0 has data parallel rank: 0\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:418] Rank 0 has context parallel group: [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:421] All context parallel group ranks: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:422] Ranks 0 has context parallel rank: 0\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:429] Rank 0 has model parallel group: [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:430] All model parallel group ranks: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:439] Rank 0 has tensor model parallel group: [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:443] All tensor model parallel group ranks: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:444] Rank 0 has tensor model parallel rank: 0\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:464] Rank 0 has pipeline model parallel group: [0]\n", 
+ "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:476] Rank 0 has embedding group: [0]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:482] All pipeline model parallel group ranks: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:483] Rank 0 has pipeline model parallel rank 0\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:484] All embedding group ranks: [[0]]\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 megatron_init:485] Rank 0 has embedding rank: 0\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:52 base:44] Padded vocab_size: 128256, original vocab_size: 128256, dummy tokens: 0.\n", + "i.generate/0 [default0]:Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1\n", + "i.generate/0 [default0]:----------------------------------------------------------------------------------------------------\n", + "i.generate/0 [default0]:distributed_backend=nccl\n", + "i.generate/0 [default0]:All distributed processes registered. 
Starting with 1 processes\n", + "i.generate/0 [default0]:----------------------------------------------------------------------------------------------------\n", + "i.generate/0 [default0]:\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:53 megatron_parallel:550] > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 8030261248\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:03:53 megatron_strategy:745] Doing selective restore from RestoreConfig(path='results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.5063-epoch=0-last', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:04:04 megatron_strategy:750] Restoring model weights from RestoreConfig(path='results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.5063-epoch=0-last', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True)\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:04:04 megatron_strategy:757] Finished restoring from RestoreConfig(path='results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.5063-epoch=0-last', adapter_path=None, load_model_state=True, load_optim_state=False, load_artifacts=True), cleaning up.\n", + "i.generate/0 [default0]:[NeMo W 2024-11-15 10:04:04 mixin:742] Could not find .model.model_transform for in results/nemo2_sft/checkpoints/nemo2_sft--reduced_train_loss=1.5063-epoch=0-last/context/io.json\n", + "i.generate/0 [default0]:GPU available: True (cuda), used: True\n", + "i.generate/0 [default0]:TPU available: False, using: 0 TPU cores\n", + "i.generate/0 [default0]:HPU available: False, using: 0 HPUs\n", + "i.generate/0 [default0]:[NeMo I 2024-11-15 10:30:50 api:699] Predictions written to sft_prediction.jsonl\n", + "i.generate/0 I1115 10:30:54.035000 140737350272832 torch/distributed/elastic/agent/server/api.py:844] [default] worker group successfully finished. 
Waiting 300 seconds for other agents to finish.\n", + "i.generate/0 I1115 10:30:54.036000 140737350272832 torch/distributed/elastic/agent/server/api.py:889] Local worker group finished (WorkerState.SUCCEEDED). Waiting 300 seconds for other agents to finish\n", + "i.generate/0 I1115 10:30:54.036000 140737350272832 torch/distributed/elastic/agent/server/api.py:902] Done waiting for other agents. Elapsed: 0.00029087066650390625 seconds\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Job nemo.collections.llm.api.generate-lzdnjbxr7thbv finished: SUCCEEDED\n" + ] + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# The experiment was run with the following tasks: ['nemo.collections.llm.api.generate']                           \n",
+       "# You can inspect and reconstruct this experiment at a later point in time using:                                  \n",
+       "experiment = run.Experiment.from_id(\"nemo.collections.llm.api.generate_1731693822\")                                \n",
+       "experiment.status() # Gets the overall status                                                                      \n",
+       "experiment.logs(\"nemo.collections.llm.api.generate\") # Gets the log for the provided task                          \n",
+       "experiment.cancel(\"nemo.collections.llm.api.generate\") # Cancels the provided task if still running                \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# The experiment was run with the following tasks: ['nemo.collections.llm.api.generate']\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect and reconstruct this experiment at a later point in time using:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mrun\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mExperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mfrom_id\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.generate_1731693822\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the overall status\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + 
"\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.generate\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Gets the log for the provided task\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m.\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mnemo.collections.llm.api.generate\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;149;144;119;48;2;39;40;34m# Cancels the provided task if still running\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
                                                                                                                   \n",
+       "# You can inspect this experiment at a later point in time using the CLI as well:                                  \n",
+       "nemo experiment status nemo.collections.llm.api.generate_1731693822                                                \n",
+       "nemo experiment logs nemo.collections.llm.api.generate_1731693822 0                                                \n",
+       "nemo experiment cancel nemo.collections.llm.api.generate_1731693822 0                                              \n",
+       "                                                                                                                   \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;149;144;119;48;2;39;40;34m# You can inspect this experiment at a later point in time using the CLI as well:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mstatus\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.generate_1731693822\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mlogs\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.generate_1731693822\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[38;2;248;248;242;48;2;39;40;34mnemo\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mexperiment\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcancel\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mnemo.collections.llm.api.generate_1731693822\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;174;129;255;48;2;39;40;34m0\u001b[0m\u001b[48;2;39;40;34m \u001b[0m\n", + "\u001b[48;2;39;40;34m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from megatron.core.inference.common_inference_params import CommonInferenceParams\n", + "\n", + "\n", + "def trainer() -> 
run.Config[nl.Trainer]:\n", + " strategy = run.Config(\n", + " nl.MegatronStrategy,\n", + " tensor_model_parallel_size=1,\n", + " pipeline_model_parallel_size=1,\n", + " context_parallel_size=1,\n", + " sequence_parallel=False,\n", + " setup_optimizers=False,\n", + " store_optimizer_states=False,\n", + " )\n", + " trainer = run.Config(\n", + " nl.Trainer,\n", + " accelerator=\"gpu\",\n", + " devices=1,\n", + " num_nodes=1,\n", + " strategy=strategy,\n", + " plugins=bf16_mixed(),\n", + " )\n", + " return trainer\n", + "\n", + "def configure_inference():\n", + " return run.Partial(\n", + " llm.generate,\n", + " path=str(sft_ckpt_path),\n", + " trainer=trainer(),\n", + " input_dataset=dolly(),\n", + " inference_params=CommonInferenceParams(num_tokens_to_generate=20, top_k=1),\n", + " output_path=\"sft_prediction.jsonl\",\n", + " )\n", + "\n", "\n", - "Recipes are designed to be modular and extensible, allowing users to easily customize settings for their specific use cases.\n", + "def local_executor_torchrun(nodes: int = 1, devices: int = 1) -> run.LocalExecutor:\n", + " # Env vars for jobs are configured here\n", + " env_vars = {\n", + " \"TORCH_NCCL_AVOID_RECORD_STREAMS\": \"1\",\n", + " \"NCCL_NVLS_ENABLE\": \"0\",\n", + " \"NVTE_DP_AMAX_REDUCE_INTERVAL\": \"0\",\n", + " \"NVTE_ASYNC_AMAX_REDUCTION\": \"1\",\n", + " \"NVTE_FUSED_ATTN\": \"0\",\n", + " }\n", "\n", + " executor = run.LocalExecutor(ntasks_per_node=devices, launcher=\"torchrun\", env_vars=env_vars)\n", "\n", - "NeMo-Run is a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across various computing environments. NeMo-Run is responsible for experiment configuration, execution and management. Here is an example for launch a recipe using NeMo-Run using local executor. 
##TODO: the default finetuning recipe include LoRA" + " return executor\n", + "\n", + "if __name__ == '__main__':\n", + " run.run(configure_inference(), executor=local_executor_torchrun())\n" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "## TODO: Pretrain with tp1pp1cp2 doesn't work. Pretrain with tp4pp1cp2 works. Finetuning recipe doesn't work\n", - "import nemo_run as run\n", - "from nemo.collections import llm\n", - "\n", - "recipe = llm.llama3_8b.finetune_recipe(name=\"llama3-8b-pretrain\", dir=\"exp/nemorun_ft\", num_nodes=1, num_gpus_per_node=2)\n", - "env_vars = {\n", - " \"TORCH_NCCL_AVOID_RECORD_STREAMS\": \"1\",\n", - " \"NCCL_NVLS_ENABLE\": \"0\",\n", - " \"NVTE_DP_AMAX_REDUCE_INTERVAL\": \"0\",\n", - " \"NVTE_ASYNC_AMAX_REDUCTION\": \"1\",\n", - " \"NVTE_FUSED_ATTN\": \"0\",\n", - "}\n", - "local_executor = run.LocalExecutor(ntasks_per_node=8, launcher=\"torchrun\", env_vars=env_vars)\n", - "run.run(recipe, executor=local_executor)" + "After the inference is complete, you will see results similar to the following:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\"input\": \"What is best creator's platform\", \"category\": \"brainstorming\", \"label\": \"Youtube. Youtube should be best creator platform\", \"prediction\": \" for video content creators. YouTube is best creator's platform for video content creators.\"}\n", + "{\"input\": \"When was the last time the Raiders won the Super Bowl?\", \"category\": \"open_qa\", \"label\": \"The Raiders have won three Super Bowl championships (1977, 1981, and 1984), one American Football League (AFL) championship (1967), and four American Football Conference (AFC) titles. 
The most recent Super Bowl ring was won in 1984 against the Washington Redskins of the NFC.\", \"prediction\": \" 2003\"}\n", + "{\"input\": \"Muckle Water is a long, narrow fresh water loch on Ward Hill on Rousay, Orkney, Scotland. It is the biggest loch on the island and is popular for fishing. It can be reached by a track from the roadside. The Suso Burn on the north eastern shore drains the loch into the Sound of Rousay.\\n\\nWhere is Muckle Water?\", \"category\": \"closed_qa\", \"label\": \"Muckle water is located in Rousay, Orkney, Scotland.\", \"prediction\": \" Muckle Water is a long, narrow fresh water loch on Ward Hill on Rousay,\"}\n" + ] + } + ], + "source": [ + "%%bash\n", + "head -n 3 sft_prediction.jsonl" ] } ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -613,9 +1823,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.13" + "version": "3.10.12" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 }
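As a quick sanity check after inference, the JSON Lines records shown by `head` above can be loaded back into Python for programmatic inspection. The sketch below is illustrative only: the field names (`input`, `category`, `label`, `prediction`) come from the sample output above, while the `demo_prediction.jsonl` file and the `load_predictions` helper are invented for the demo (on a real run you would point it at `sft_prediction.jsonl`).

```python
import json


def load_predictions(path):
    """Parse a JSON Lines predictions file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write one synthetic record mirroring the fields shown in the output above;
# the real sft_prediction.jsonl is produced by llm.generate.
sample = {
    "input": "When was the last time the Raiders won the Super Bowl?",
    "category": "open_qa",
    "label": "The most recent Super Bowl ring was won in 1984.",
    "prediction": " 2003",
}
with open("demo_prediction.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(sample) + "\n")

records = load_predictions("demo_prediction.jsonl")
print(records[0]["prediction"])  # prints " 2003"
```

Keeping the label and the prediction side by side in each record makes it straightforward to compute downstream metrics (e.g., exact match or ROUGE) over the test set.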