diff --git a/tutorials/multimodal/Multimodal Data Preparation.ipynb b/tutorials/multimodal/Multimodal Data Preparation.ipynb
index 9f9e65b8a83d..473c3aa2e5ba 100644
--- a/tutorials/multimodal/Multimodal Data Preparation.ipynb
+++ b/tutorials/multimodal/Multimodal Data Preparation.ipynb
@@ -2,27 +2,19 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "metadata": {},
    "source": [
     "# Multimodal Dataset Preparation\n",
     "\n",
-    "First step of pre-training any deep learning model is data preparation. This notebook will walk you through 5 stages of data preparation for training a multimodal model: \n",
+    "The first step of pre-training any deep learning model is data preparation. This notebook will walk you through the 5 stages of data preparation for training a multimodal model:\n",
     "1. Download your Data\n",
     "2. Extract Images and Text\n",
     "3. Re-organize to ensure uniform text-image pairs\n",
     "4. Precache Encodings\n",
-    "5. Generate Metadata required for training\n",
-    "\n",
-    "This notebook will show you how to prepare an image-text dataset into the [WebDataset](https://github.com/webdataset/webdataset) format. The Webdataset format is required to train all multimodal models in NeMo, such as Stable Diffusion and Imagen. \n",
-    "\n",
-    "This notebook is designed to demonstrate the different stages of multimodal dataset preparation. It is not meant to be used to process large-scale datasets since many stages are too time-consuming to run without parallelism. For large workloads, we recommend running the multimodal dataset preparation pipeline with the NeMo-Megatron-Launcher on multiple processors/GPUs. NeMo-Megatron-Launcher packs the same 5 scripts in this notebook into one runnable command and one config file to enable a smooth and a streamlined workflow.\n",
-    "\n",
-    "Depending on your use case, not all 5 stages need to be run. Please go to (TODO doc link) for an overview of the 5 stages.\n",
-    " \n",
-    "We will use a [dummy dataset](https://huggingface.co/datasets/cuichenx/dummy-image-text-dataset) as the dataset example throughout this notebook. This dataset is formatted as a table with one column storing the text captions, and one column storing the URL link to download the corresponding image. This is the same format as most common text-image datasets. The use of this dummy dataset is for demonstration purposes only. **Each user is responsible for checking the content of the dataset and the applicable licenses to determine if it is suitable for the intended use.**\n",
-    "\n",
-    "Let's first set up some paths."
-   ]
+    "5. Generate Metadata required for training\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
   },
   {
    "cell_type": "code",
@@ -58,13 +50,12 @@
    "id": "c06f3527",
    "metadata": {},
    "source": [
-    "# Multimodal Dataset Preparation\n",
     "\n",
     "This notebook will show you how to prepare an image-text dataset into the [WebDataset](https://github.com/webdataset/webdataset) format. The Webdataset format is required to train all multimodal models in NeMo, such as Stable Diffusion and Imagen. \n",
     "\n",
     "This notebook is designed to demonstrate the different stages of multimodal dataset preparation. It is not meant to be used to process large-scale datasets since many stages are too time-consuming to run without parallelism. For large workloads, we recommend running the multimodal dataset preparation pipeline with the NeMo-Megatron-Launcher on multiple processors/GPUs. NeMo-Megatron-Launcher packs the same 5 scripts in this notebook into one runnable command and one config file to enable a smooth and a streamlined workflow.\n",
     "\n",
-    "Depending on your use case, not all 5 stages need to be run. Please go to (TODO doc link) for an overview of the 5 stages.\n",
+    "Depending on your use case, not all 5 stages need to be run. Please go to [NeMo Multimodal Documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/multimodal/text2img/datasets.html) for an overview of the 5 stages.\n",
     " \n",
     "We will use a [dummy dataset](https://huggingface.co/datasets/cuichenx/dummy-image-text-dataset) as the dataset example throughout this notebook. This dataset is formatted as a table with one column storing the text captions, and one column storing the URL link to download the corresponding image. This is the same format as most common text-image datasets. The use of this dummy dataset is for demonstration purposes only. **Each user is responsible for checking the content of the dataset and the applicable licenses to determine if it is suitable for the intended use.**\n",
     "\n",
@@ -413,7 +404,7 @@
    "id": "27b26036",
    "metadata": {},
    "source": [
-    "Let's download an example precaching config file ## TODO modify this path"
+    "Let's download an example precaching config file"
    ]
   },
   {
@@ -425,7 +416,7 @@
    },
    "outputs": [],
    "source": [
-    "! wget TODO_github_link/precache_sd.yaml -P $CONF_DIR/"
+    "! wget https://raw.githubusercontent.com/NVIDIA/NeMo-Megatron-Launcher/master/launcher_scripts/conf/data_preparation/multimodal/precache_sd.yaml -P $CONF_DIR/"
    ]
   },
   {
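
Background for reviewers: the notebook text above repeatedly references the WebDataset tar layout without showing it. Below is a minimal, hypothetical sketch of that layout — packing uniform text-image pairs into one tar shard. The helper name `write_webdataset_shard` and the dummy byte strings are illustrative assumptions, not code from NeMo or this PR; the only thing taken from the WebDataset format is its basename-grouping convention, where `00000.jpg` and `00000.txt` together form one training sample.

```python
import io
import tarfile


def write_webdataset_shard(path, samples):
    """Pack (key, image_bytes, caption) triples into a WebDataset-style tar shard.

    WebDataset groups the files of one sample by a shared basename:
    <key>.jpg holds the image and <key>.txt holds its caption.
    """
    with tarfile.open(path, "w") as tar:
        for key, image_bytes, caption in samples:
            for suffix, payload in ((".jpg", image_bytes), (".txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(name=key + suffix)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


# Dummy placeholder bytes stand in for real JPEG data.
write_webdataset_shard(
    "shard-00000.tar",
    [
        ("00000", b"<jpeg bytes>", "a photo of a cat"),
        ("00001", b"<jpeg bytes>", "a photo of a dog"),
    ],
)
```

Stage 3 of the notebook ("Re-organize to ensure uniform text-image pairs") exists precisely so that every key contributes both members of its pair before shards like this are written.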