diff --git a/1_t5_small_single_gpu/1. T5-Small on Single GPU.ipynb b/1_t5_small_single_gpu/1. T5-Small on Single GPU.ipynb index 5485e59..fb0be2c 100644 --- a/1_t5_small_single_gpu/1. T5-Small on Single GPU.ipynb +++ b/1_t5_small_single_gpu/1. T5-Small on Single GPU.ipynb @@ -29,7 +29,7 @@ "\n", "The [t5 (text-to-text transfer transformer) family of models](https://blog.research.google/2020/02/exploring-transfer-learning-with-t5.html) was developed by Google Research. It was presented as an advancement over BERT-style models which could output only a class label or a span of the input. t5 allows the same model, loss, and hyperparameters to be used for *any* nlp task. t5 differs from GPT models because it is an encoder-decoder model, while GPT models are decoder-only models.\n", "\n", - "t5-small is a 60 million parameter model. A an oft-cited heuristic for model training is that you need GPU memory (VRAM) in Gigabytes greater than or equal to the number of parameters in billions times 16. So a 1 Billion parameter model would require approximately 16GB of VRAM for training. t5-small is a 0.06B parameter model and thus requires only around 0.96GB of VRAM for training. Again--we're starting *very small*.\n", + "t5-small is a 60 million parameter model. This is *small*: the smallest version of GPT2 has more than twice as many parameters (124M); llama2-7b, one of the most commonly-used models at the time of writing, has more than 116 times as many parameters (7B, hence the name). What does this mean for us? Parameter count strongly impacts the amount of memory required to train a model. Eleuther's [Transformer Math blog post](https://blog.eleuther.ai/transformer-math/#training) has a great overview of the memory costs associated with training models of different sizes. 
We'll get into this in more detail in a later notebook.\n", "\n", "## A few things to keep in mind\n", "Check out the [Readme](README.md) if you haven't already, as it provides important context for this whole project. If you're looking for a set of absolute best practices for how to train particular models, this isn't the place to find them (though I will link them when I come across them, and will try to make improvements where I can, as long as they don't come at the cost of extra complexity!). The goal is to develop a high-level understanding and intuition on model training and fine-tuning, so you can fairly quickly get to something that *works* and then iterate to make it work *better*.\n", @@ -43,7 +43,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -58,27 +58,121 @@ }, "outputs": [ { - "output_type": "stream", "name": "stdout", "output_type": "stream", "text": [ - "\u001B[43mNote: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.\u001B[0m\nRequirement already satisfied: transformers in /databricks/python3/lib/python3.10/site-packages (4.34.0)\nCollecting transformers\n Downloading transformers-4.35.2-py3-none-any.whl (7.9 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.9/7.9 MB 21.9 MB/s eta 0:00:00\nRequirement already satisfied: torch in /databricks/python3/lib/python3.10/site-packages (2.0.1+cu118)\nCollecting torch\n Downloading torch-2.1.1-cp310-cp310-manylinux1_x86_64.whl (670.2 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 670.2/670.2 MB 1.9 MB/s eta 0:00:00\nRequirement already satisfied: accelerate in /databricks/python3/lib/python3.10/site-packages (0.23.0)\nCollecting accelerate\n Downloading accelerate-0.25.0-py3-none-any.whl (265 kB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 265.7/265.7 kB 31.9 MB/s eta 0:00:00\nRequirement already satisfied: packaging>=20.0 in 
/databricks/python3/lib/python3.10/site-packages (from transformers) (22.0)\nRequirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /databricks/python3/lib/python3.10/site-packages (from transformers) (0.16.4)\nRequirement already satisfied: filelock in /databricks/python3/lib/python3.10/site-packages (from transformers) (3.9.0)\nRequirement already satisfied: requests in /databricks/python3/lib/python3.10/site-packages (from transformers) (2.28.1)\nRequirement already satisfied: regex!=2019.12.17 in /databricks/python3/lib/python3.10/site-packages (from transformers) (2022.7.9)\nRequirement already satisfied: tqdm>=4.27 in /databricks/python3/lib/python3.10/site-packages (from transformers) (4.64.1)\nRequirement already satisfied: tokenizers<0.19,>=0.14 in /databricks/python3/lib/python3.10/site-packages (from transformers) (0.14.0)\nRequirement already satisfied: pyyaml>=5.1 in /databricks/python3/lib/python3.10/site-packages (from transformers) (6.0)\nRequirement already satisfied: safetensors>=0.3.1 in /databricks/python3/lib/python3.10/site-packages (from transformers) (0.4.0)\nRequirement already satisfied: numpy>=1.17 in /databricks/python3/lib/python3.10/site-packages (from transformers) (1.23.5)\nRequirement already satisfied: networkx in /databricks/python3/lib/python3.10/site-packages (from torch) (2.8.4)\nCollecting nvidia-nccl-cu12==2.18.1\n Downloading nvidia_nccl_cu12-2.18.1-py3-none-manylinux1_x86_64.whl (209.8 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.8/209.8 MB 3.4 MB/s eta 0:00:00\nCollecting nvidia-curand-cu12==10.3.2.106\n Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 10.8 MB/s eta 0:00:00\nRequirement already satisfied: jinja2 in /databricks/python3/lib/python3.10/site-packages (from torch) (3.1.2)\nCollecting nvidia-cuda-nvrtc-cu12==12.1.105\n Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)\n 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 21.3 MB/s eta 0:00:00\nRequirement already satisfied: fsspec in /databricks/python3/lib/python3.10/site-packages (from torch) (2023.6.0)\nCollecting nvidia-cudnn-cu12==8.9.2.26\n Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)\n" + "\u001b[43mNote: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.\u001b[0m\n", + "Requirement already satisfied: transformers in /databricks/python3/lib/python3.10/site-packages (4.34.0)\n", + "Collecting transformers\n", + " Downloading transformers-4.35.2-py3-none-any.whl (7.9 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.9/7.9 MB 21.9 MB/s eta 0:00:00\n", + "Requirement already satisfied: torch in /databricks/python3/lib/python3.10/site-packages (2.0.1+cu118)\n", + "Collecting torch\n", + " Downloading torch-2.1.1-cp310-cp310-manylinux1_x86_64.whl (670.2 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 670.2/670.2 MB 1.9 MB/s eta 0:00:00\n", + "Requirement already satisfied: accelerate in /databricks/python3/lib/python3.10/site-packages (0.23.0)\n", + "Collecting accelerate\n", + " Downloading accelerate-0.25.0-py3-none-any.whl (265 kB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 265.7/265.7 kB 31.9 MB/s eta 0:00:00\n", + "Requirement already satisfied: packaging>=20.0 in /databricks/python3/lib/python3.10/site-packages (from transformers) (22.0)\n", + "Requirement already satisfied: huggingface-hub<1.0,>=0.16.4 in /databricks/python3/lib/python3.10/site-packages (from transformers) (0.16.4)\n", + "Requirement already satisfied: filelock in /databricks/python3/lib/python3.10/site-packages (from transformers) (3.9.0)\n", + "Requirement already satisfied: requests in /databricks/python3/lib/python3.10/site-packages (from transformers) (2.28.1)\n", + "Requirement already satisfied: regex!=2019.12.17 in /databricks/python3/lib/python3.10/site-packages (from transformers) (2022.7.9)\n", + 
"Requirement already satisfied: tqdm>=4.27 in /databricks/python3/lib/python3.10/site-packages (from transformers) (4.64.1)\n", + "Requirement already satisfied: tokenizers<0.19,>=0.14 in /databricks/python3/lib/python3.10/site-packages (from transformers) (0.14.0)\n", + "Requirement already satisfied: pyyaml>=5.1 in /databricks/python3/lib/python3.10/site-packages (from transformers) (6.0)\n", + "Requirement already satisfied: safetensors>=0.3.1 in /databricks/python3/lib/python3.10/site-packages (from transformers) (0.4.0)\n", + "Requirement already satisfied: numpy>=1.17 in /databricks/python3/lib/python3.10/site-packages (from transformers) (1.23.5)\n", + "Requirement already satisfied: networkx in /databricks/python3/lib/python3.10/site-packages (from torch) (2.8.4)\n", + "Collecting nvidia-nccl-cu12==2.18.1\n", + " Downloading nvidia_nccl_cu12-2.18.1-py3-none-manylinux1_x86_64.whl (209.8 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.8/209.8 MB 3.4 MB/s eta 0:00:00\n", + "Collecting nvidia-curand-cu12==10.3.2.106\n", + " Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 10.8 MB/s eta 0:00:00\n", + "Requirement already satisfied: jinja2 in /databricks/python3/lib/python3.10/site-packages (from torch) (3.1.2)\n", + "Collecting nvidia-cuda-nvrtc-cu12==12.1.105\n", + " Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 21.3 MB/s eta 0:00:00\n", + "Requirement already satisfied: fsspec in /databricks/python3/lib/python3.10/site-packages (from torch) (2023.6.0)\n", + "Collecting nvidia-cudnn-cu12==8.9.2.26\n", + " Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)\n" ] }, { - "output_type": "stream", "name": "stderr", "output_type": "stream", "text": [ - "2023-12-04 13:13:00.531046: E 
tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n2023-12-04 13:13:00.531105: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n2023-12-04 13:13:00.531127: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n2023-12-04 13:13:00.538527: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\nTo enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" + "2023-12-04 13:13:00.531046: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", + "2023-12-04 13:13:00.531105: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", + "2023-12-04 13:13:00.531127: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2023-12-04 13:13:00.538527: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", + "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] }, { - "output_type": "stream", "name": "stdout", "output_type": "stream", "text": [ - 
" ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 1.8 MB/s eta 0:00:00\nCollecting nvidia-cusolver-cu12==11.4.5.107\n Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 19.2 MB/s eta 0:00:00\nCollecting triton==2.1.0\n Downloading triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89.2 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.2/89.2 MB 26.7 MB/s eta 0:00:00\nRequirement already satisfied: typing-extensions in /databricks/python3/lib/python3.10/site-packages (from torch) (4.4.0)\nCollecting nvidia-nvtx-cu12==12.1.105\n Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 22.2 MB/s eta 0:00:00\nCollecting nvidia-cuda-cupti-cu12==12.1.105\n Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 91.7 MB/s eta 0:00:00\nRequirement already satisfied: sympy in /databricks/python3/lib/python3.10/site-packages (from torch) (1.11.1)\nCollecting nvidia-cufft-cu12==11.0.2.54\n Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 20.4 MB/s eta 0:00:00\nCollecting nvidia-cusparse-cu12==12.1.0.106\n Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 11.9 MB/s eta 0:00:00\nCollecting nvidia-cublas-cu12==12.1.3.1\n Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 3.6 MB/s eta 0:00:00\nCollecting nvidia-cuda-runtime-cu12==12.1.105\n Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 62.9 MB/s eta 0:00:00\nCollecting nvidia-nvjitlink-cu12\n 
Downloading nvidia_nvjitlink_cu12-12.3.101-py3-none-manylinux1_x86_64.whl (20.5 MB)\n ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.5/20.5 MB 83.3 MB/s eta 0:00:00\nRequirement already satisfied: psutil in /databricks/python3/lib/python3.10/site-packages (from accelerate) (5.9.0)\nRequirement already satisfied: MarkupSafe>=2.0 in /databricks/python3/lib/python3.10/site-packages (from jinja2->torch) (2.1.1)\nRequirement already satisfied: certifi>=2017.4.17 in /databricks/python3/lib/python3.10/site-packages (from requests->transformers) (2022.12.7)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /databricks/python3/lib/python3.10/site-packages (from requests->transformers) (1.26.14)\nRequirement already satisfied: charset-normalizer<3,>=2 in /databricks/python3/lib/python3.10/site-packages (from requests->transformers) (2.0.4)\nRequirement already satisfied: idna<4,>=2.5 in /databricks/python3/lib/python3.10/site-packages (from requests->transformers) (3.4)\nRequirement already satisfied: mpmath>=0.19 in /databricks/python3/lib/python3.10/site-packages (from sympy->torch) (1.2.1)\nInstalling collected packages: triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, nvidia-cusparse-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, transformers, torch, accelerate\n Attempting uninstall: triton\n Found existing installation: triton 2.0.0\n Not uninstalling triton at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-c21882be-297c-4406-a663-9b859108fc42\n Can't uninstall 'triton'. 
No files were found to uninstall.\n Attempting uninstall: transformers\n Found existing installation: transformers 4.34.0\n Not uninstalling transformers at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-c21882be-297c-4406-a663-9b859108fc42\n Can't uninstall 'transformers'. No files were found to uninstall.\n Attempting uninstall: torch\n Found existing installation: torch 2.0.1+cu118\n Not uninstalling torch at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-c21882be-297c-4406-a663-9b859108fc42\n Can't uninstall 'torch'. No files were found to uninstall.\n Attempting uninstall: accelerate\n Found existing installation: accelerate 0.23.0\n Not uninstalling accelerate at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-c21882be-297c-4406-a663-9b859108fc42\n Can't uninstall 'accelerate'. No files were found to uninstall.\nERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\ntorchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 2.1.1 which is incompatible.\nSuccessfully installed accelerate-0.25.0 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.3.101 nvidia-nvtx-cu12-12.1.105 torch-2.1.1 transformers-4.35.2 triton-2.1.0\n\u001B[43mNote: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.\u001B[0m\n" + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 1.8 MB/s eta 0:00:00\n", + "Collecting nvidia-cusolver-cu12==11.4.5.107\n", + " Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 19.2 MB/s eta 0:00:00\n", + "Collecting triton==2.1.0\n", + " Downloading triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89.2 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.2/89.2 MB 26.7 MB/s eta 0:00:00\n", + "Requirement already satisfied: typing-extensions in /databricks/python3/lib/python3.10/site-packages (from torch) (4.4.0)\n", + "Collecting nvidia-nvtx-cu12==12.1.105\n", + " Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 22.2 MB/s eta 0:00:00\n", + "Collecting nvidia-cuda-cupti-cu12==12.1.105\n", + " Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 91.7 MB/s eta 0:00:00\n", + "Requirement already satisfied: sympy in /databricks/python3/lib/python3.10/site-packages (from torch) (1.11.1)\n", + "Collecting 
nvidia-cufft-cu12==11.0.2.54\n", + " Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 20.4 MB/s eta 0:00:00\n", + "Collecting nvidia-cusparse-cu12==12.1.0.106\n", + " Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 11.9 MB/s eta 0:00:00\n", + "Collecting nvidia-cublas-cu12==12.1.3.1\n", + " Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 3.6 MB/s eta 0:00:00\n", + "Collecting nvidia-cuda-runtime-cu12==12.1.105\n", + " Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 62.9 MB/s eta 0:00:00\n", + "Collecting nvidia-nvjitlink-cu12\n", + " Downloading nvidia_nvjitlink_cu12-12.3.101-py3-none-manylinux1_x86_64.whl (20.5 MB)\n", + " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.5/20.5 MB 83.3 MB/s eta 0:00:00\n", + "Requirement already satisfied: psutil in /databricks/python3/lib/python3.10/site-packages (from accelerate) (5.9.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /databricks/python3/lib/python3.10/site-packages (from jinja2->torch) (2.1.1)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /databricks/python3/lib/python3.10/site-packages (from requests->transformers) (2022.12.7)\n", + "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /databricks/python3/lib/python3.10/site-packages (from requests->transformers) (1.26.14)\n", + "Requirement already satisfied: charset-normalizer<3,>=2 in /databricks/python3/lib/python3.10/site-packages (from requests->transformers) (2.0.4)\n", + "Requirement already satisfied: idna<4,>=2.5 in /databricks/python3/lib/python3.10/site-packages (from requests->transformers) (3.4)\n", + "Requirement already satisfied: 
mpmath>=0.19 in /databricks/python3/lib/python3.10/site-packages (from sympy->torch) (1.2.1)\n", + "Installing collected packages: triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, nvidia-cusparse-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, transformers, torch, accelerate\n", + " Attempting uninstall: triton\n", + " Found existing installation: triton 2.0.0\n", + " Not uninstalling triton at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-c21882be-297c-4406-a663-9b859108fc42\n", + " Can't uninstall 'triton'. No files were found to uninstall.\n", + " Attempting uninstall: transformers\n", + " Found existing installation: transformers 4.34.0\n", + " Not uninstalling transformers at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-c21882be-297c-4406-a663-9b859108fc42\n", + " Can't uninstall 'transformers'. No files were found to uninstall.\n", + " Attempting uninstall: torch\n", + " Found existing installation: torch 2.0.1+cu118\n", + " Not uninstalling torch at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-c21882be-297c-4406-a663-9b859108fc42\n", + " Can't uninstall 'torch'. No files were found to uninstall.\n", + " Attempting uninstall: accelerate\n", + " Found existing installation: accelerate 0.23.0\n", + " Not uninstalling accelerate at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-c21882be-297c-4406-a663-9b859108fc42\n", + " Can't uninstall 'accelerate'. No files were found to uninstall.\n", + "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", + "torchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 2.1.1 which is incompatible.\n", + "Successfully installed accelerate-0.25.0 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.3.101 nvidia-nvtx-cu12-12.1.105 torch-2.1.1 transformers-4.35.2 triton-2.1.0\n", + "\u001b[43mNote: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.\u001b[0m\n" ] } ], @@ -89,7 +183,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -104,15 +198,17 @@ }, "outputs": [ { - "output_type": "stream", "name": "stderr", "output_type": "stream", "text": [ - "2023-12-03 18:33:15.973761: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n2023-12-03 18:33:15.973826: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n2023-12-03 18:33:15.973849: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n2023-12-03 18:33:15.981082: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\nTo enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler 
flags.\n" + "2023-12-03 18:33:15.973761: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", + "2023-12-03 18:33:15.973826: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", + "2023-12-03 18:33:15.973849: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2023-12-03 18:33:15.981082: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", + "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] }, { - "output_type": "stream", "name": "stdout", "output_type": "stream", "text": [ @@ -178,7 +274,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -193,7 +289,6 @@ }, "outputs": [ { - "output_type": "stream", "name": "stdout", "output_type": "stream", "text": [ @@ -239,7 +334,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -254,7 +349,6 @@ }, "outputs": [ { - "output_type": "display_data", "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "850085a74f5b4cb0bd2db5bff0be35e0", @@ -278,7 +372,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -293,23 +387,36 @@ }, "outputs": [ { - "output_type": "stream", "name": "stderr", "output_type": "stream", "text": 
[ - "/databricks/python_shell/dbruntime/huggingface_patches/datasets.py:27: UserWarning: This dataset can not be stored in DBFS because either `cache_dir` or the environment variable `HF_DATASETS_CACHE` is set to a non-DBFS path. If this cluster restarts, all saved dataset information will be lost.\n warnings.warn(\n/databricks/python_shell/dbruntime/huggingface_patches/datasets.py:13: UserWarning: During large dataset downloads, there could be multiple progress bar widgets that can cause performance issues for your notebook or browser. To avoid these issues, use `datasets.utils.logging.disable_progress_bar()` to turn off the progress bars.\n warnings.warn(\n" + "/databricks/python_shell/dbruntime/huggingface_patches/datasets.py:27: UserWarning: This dataset can not be stored in DBFS because either `cache_dir` or the environment variable `HF_DATASETS_CACHE` is set to a non-DBFS path. If this cluster restarts, all saved dataset information will be lost.\n", + " warnings.warn(\n", + "/databricks/python_shell/dbruntime/huggingface_patches/datasets.py:13: UserWarning: During large dataset downloads, there could be multiple progress bar widgets that can cause performance issues for your notebook or browser. 
To avoid these issues, use `datasets.utils.logging.disable_progress_bar()` to turn off the progress bars.\n", + " warnings.warn(\n" ] }, { - "output_type": "stream", "name": "stdout", "output_type": "stream", "text": [ - "Rust 136961\nRuby 136824\nJavaScript 131014\nJulia 129402\nPython 129063\nTypeScript 128653\nGo 126016\nC# 125478\nJava 123994\nBash 122804\nC++ 120813\nNeo4j database and Cypher 117589\nrelation database and SQL 103698\nName: programming_language, dtype: int64\n" + "Rust 136961\n", + "Ruby 136824\n", + "JavaScript 131014\n", + "Julia 129402\n", + "Python 129063\n", + "TypeScript 128653\n", + "Go 126016\n", + "C# 125478\n", + "Java 123994\n", + "Bash 122804\n", + "C++ 120813\n", + "Neo4j database and Cypher 117589\n", + "relation database and SQL 103698\n", + "Name: programming_language, dtype: int64\n" ] }, { - "output_type": "display_data", "data": { "image/png": "", "text/plain": [ @@ -361,7 +468,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -376,7 +483,6 @@ }, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "{'prompt': 'Develop a C# program snippet to Update Low Online Shopping: Product Availability for Analysis for Experts. Incorporate if/else or switch/case statements to handle various cases related to the Privacy. Dry-run, ensure your control flow logic is clear and well-commented.',\n", @@ -421,7 +527,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -451,7 +557,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -466,7 +572,6 @@ }, "outputs": [ { - "output_type": "execute_result", "data": { "text/html": [ "
\n", @@ -623,7 +728,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -638,7 +743,6 @@ }, "outputs": [ { - "output_type": "execute_result", "data": { "text/html": [ "
\n", @@ -777,7 +881,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -836,7 +940,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -851,7 +955,6 @@ }, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "DatasetDict({\n", @@ -882,7 +985,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -897,7 +1000,6 @@ }, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "Rust 91672\n", @@ -942,7 +1044,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -957,15 +1059,17 @@ }, "outputs": [ { - "output_type": "stream", "name": "stderr", "output_type": "stream", "text": [ - "2023-12-03 18:48:53.085806: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n2023-12-03 18:48:53.085867: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n2023-12-03 18:48:53.085905: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n2023-12-03 18:48:53.092029: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\nTo enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" + 
"2023-12-03 18:48:53.085806: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", + "2023-12-03 18:48:53.085867: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", + "2023-12-03 18:48:53.085905: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "2023-12-03 18:48:53.092029: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", + "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] }, { - "output_type": "display_data", "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "03c00924cb974e1eabe0b431f1094e8f", @@ -980,7 +1084,6 @@ "output_type": "display_data" }, { - "output_type": "display_data", "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "282ce9542d914d07a9e860dd0bc1ca6f", @@ -995,7 +1098,6 @@ "output_type": "display_data" }, { - "output_type": "display_data", "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5da61a3e71f541e5ab829a7f995c5a45", @@ -1047,7 +1149,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -1089,7 +1191,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -1104,7 +1206,6 @@ }, "outputs": [ { - "output_type": "stream", "name": "stdout", "output_type": "stream", "text": [ @@ -1112,7 +1213,6 @@ ] 
}, { - "output_type": "display_data", "data": { "text/html": [ "\n", @@ -1210,7 +1310,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -1225,7 +1325,6 @@ }, "outputs": [ { - "output_type": "execute_result", "data": { "text/plain": [ "7274" @@ -1267,7 +1366,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -1282,11 +1381,27 @@ }, "outputs": [ { - "output_type": "stream", "name": "stdout", "output_type": "stream", "text": [ - "Input: question: what programming language is this? code: def add_a_b(a, b): return a + b\nOutput: Python\n\nInput: question: what programming language is this? code: public class HelloWorld { public static void Main() { Console.WriteLine(\"Hello, World!\"); } }\nOutput: C#\n\nInput: question: what programming language is this? code: #include int main() { std::cout << \"Hello, World!\" << std::endl; return 0; }\nOutput: Rust\n\nInput: question: what programming language is this? code: println(\"Hello, World!\")\nOutput: Julia\n\nInput: question: what programming language is this? code: echo \"Hello, World!\"\nOutput: Bash\n\nInput: question: what programming language is this? code: fn main() { println!(\"Hello, World!\"); }\nOutput: Rust\n\n" + "Input: question: what programming language is this? code: def add_a_b(a, b): return a + b\n", + "Output: Python\n", + "\n", + "Input: question: what programming language is this? code: public class HelloWorld { public static void Main() { Console.WriteLine(\"Hello, World!\"); } }\n", + "Output: C#\n", + "\n", + "Input: question: what programming language is this? code: #include int main() { std::cout << \"Hello, World!\" << std::endl; return 0; }\n", + "Output: Rust\n", + "\n", + "Input: question: what programming language is this? 
code: println(\"Hello, World!\")\n", + "Output: Julia\n", + "\n", + "Input: question: what programming language is this? code: echo \"Hello, World!\"\n", + "Output: Bash\n", + "\n", + "Input: question: what programming language is this? code: fn main() { println!(\"Hello, World!\"); }\n", + "Output: Rust\n", + "\n" ] } ], @@ -1341,7 +1456,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": { @@ -1356,7 +1471,6 @@ }, "outputs": [ { - "output_type": "stream", "name": "stdout", "output_type": "stream", "text": [ @@ -1436,7 +1550,7 @@ }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, @@ -1461,9 +1575,9 @@ "widgets": {} }, "kernelspec": { - "display_name": "env", + "display_name": "Python (mlops)", "language": "python", - "name": "python3" + "name": "mlops" }, "language_info": { "codemirror_mode": { @@ -1475,7 +1589,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.6" + "version": "3.11.4" } }, "nbformat": 4, diff --git a/2_gpt2_single_gpu/2. GPT2 on a single GPU.ipynb b/2_gpt2_single_gpu/2. GPT2 on a single GPU.ipynb index 538ab76..702f94a 100644 --- a/2_gpt2_single_gpu/2. GPT2 on a single GPU.ipynb +++ b/2_gpt2_single_gpu/2. GPT2 on a single GPU.ipynb @@ -20,12 +20,12 @@ "- Input: `The boy hid behind`, Target: `the`\n", "- Input: `The boy hid behind the`, Target: `tree`.\n", "\n", - "This requires us to preprocess our data and pass it along to the model somewhat differently, which will be the subject of this notebook. We will still limit this example to training on a single GPU (an a10 with 24GB VRAM). We will use the [gpt2](https://huggingface.co/gpt2) model with 124M parameters. 
Recalling the heuristic that training requires VRAM in GB ≥ 16 times the number of parameters in billions, we require .124 * 16 ≅ 2GB VRAM, so we should not be VRAM constrained." + "This requires us to preprocess our data and pass it along to the model somewhat differently, which will be the subject of this notebook. We will still limit this example to training on a single GPU (an a10 with 24GB VRAM). We will use the [gpt2](https://huggingface.co/gpt2) model with 124M parameters. Later, we will work through Eleuther's [Transformer Math blog post](https://blog.eleuther.ai/transformer-math/#training) to understand the memory costs associated with training this model under different conditions and verify that it matches our experience." ] }, { "cell_type": "code", - "execution_count": 0, + "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "cellMetadata": {}, @@ -48,6 +48,15 @@ }, "notebookName": "2. GPT2 on a single GPU", "widgets": {} + }, + "kernelspec": { + "display_name": "myenv", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.11" } }, "nbformat": 4,