diff --git a/examples/llm/slimpajama/pretraining.ipynb b/examples/llm/slimpajama/pretraining.ipynb
index 9c16ac371e6f1..50484ee63c1a3 100644
--- a/examples/llm/slimpajama/pretraining.ipynb
+++ b/examples/llm/slimpajama/pretraining.ipynb
@@ -1,22 +1,19 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Pretraining using SlimPajama\n",
+    "\n",
+    "Let's see how to use the data generated by the [data pipeline notebook](./data_pipeline.ipynb) to pretrain a model. All we need to do is define a data module based on the generated data and use it to replace the mock data module provided by default in the [NeMo LLM recipes](../../../nemo/collections/llm/recipes/__init__.py)."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/usr/local/lib/python3.10/dist-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
-      "  from .autonotebook import tqdm as notebook_tqdm\n",
-      "[NeMo W 2024-10-28 22:56:08 nemo_logging:361] /usr/local/lib/python3.10/dist-packages/pyannote/core/notebook.py:134: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed in 3.11. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap()`` or ``pyplot.get_cmap()`` instead.\n",
-      "  cm = get_cmap(\"Set1\")\n",
-      "  \n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "import nemo_run as run\n",
     "from typing import Optional\n",
@@ -25,6 +22,14 @@
     "from nemo.collections.common.tokenizers import SentencePieceTokenizer"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Defining the data module\n",
+    "To define the data module, use `llm.PreTrainingDataModule` and pass in the data paths and tokenizer. If you don't have either of these yet, refer to the [data pipeline notebook](./data_pipeline.ipynb). You can also check the data module's definition for the other supported parameters, such as `split`, `num_workers`, and `index_mapping_dir`."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -44,12 +49,21 @@
     "        global_batch_size=gbs,\n",
     "        micro_batch_size=mbs,\n",
     "        tokenizer=run.Config(SentencePieceTokenizer, model_path=\"/data/tokenizer/tokenizer.model\"),\n",
-    "        split=\"99990,8,2\",\n",
+    "        split=\"99,8,2\",\n",
     "        num_workers=2,\n",
     "        index_mapping_dir=\"/data/index_mapping\",\n",
     "    )"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Configuring the recipe and launching pretraining\n",
+    "Once the data module is defined, you can take an existing recipe and swap in the data module as shown below.\n",
+    "To learn more about recipes, refer to the [quickstart](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/quickstart.html)."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 3,
@@ -108,6 +122,14 @@
     "    run.run(recipe, executor=executor)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Run pretraining\n",
+    "Now simply call the `run_pretraining` function to start pretraining on your local machine using torchrun."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
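
For reviewers who want the full picture without opening the notebook, here is a minimal sketch of how the added cells compose end to end, written as a single Python script. It assumes the notebook builds on one of the stock NeMo 2.0 pretrain recipes; the `llama3_8b` recipe choice, the dataset path, and the checkpoint directory below are illustrative stand-ins rather than values taken from this diff:

```python
import nemo_run as run

from nemo.collections import llm
from nemo.collections.common.tokenizers import SentencePieceTokenizer


def slimpajama(gbs: int = 16, mbs: int = 1, seq_length: int = 8192):
    """Build a PreTrainingDataModule config over the preprocessed SlimPajama shards."""
    return run.Config(
        llm.PreTrainingDataModule,
        # Illustrative path to the Megatron-format output of the data pipeline notebook.
        paths=["/data/slimpajama/concatenated_chunk1_text_document"],
        seq_length=seq_length,
        global_batch_size=gbs,
        micro_batch_size=mbs,
        tokenizer=run.Config(SentencePieceTokenizer, model_path="/data/tokenizer/tokenizer.model"),
        split="99,8,2",  # train/validation/test weights, matching the fix in this diff
        num_workers=2,
        index_mapping_dir="/data/index_mapping",
    )


def run_pretraining():
    # Assumption: any pretrain recipe would work here; llama3_8b is only an example.
    recipe = llm.llama3_8b.pretrain_recipe(
        name="slimpajama_pretraining",
        dir="/checkpoints",  # illustrative checkpoint directory
        num_nodes=1,
        num_gpus_per_node=8,
    )
    # Swap the recipe's default mock data module for the real SlimPajama one.
    recipe.data = slimpajama(gbs=16, mbs=1, seq_length=8192)
    # Launch locally, one torchrun task per GPU.
    executor = run.LocalExecutor(ntasks_per_node=8, launcher="torchrun")
    run.run(recipe, executor=executor)


if __name__ == "__main__":
    run_pretraining()
```

Replacing only `recipe.data` leaves the rest of the recipe (model, trainer, optimizer, logging) untouched, which is exactly the substitution the new markdown cells describe.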