Running the Llama 3.2 3B 4-bit quantized model (2.04 GB) on a free Google Colab T4 GPU
- Purpose: A lightweight (~2 GB quantized) model suited to Google Colab or resource-constrained local environments.
The 3B model outperforms comparable small SOTA models (Gemma 2 2B, Phi 3.5 Mini, and the Qwen 2.5 1B and 3B models) when tested via Hugging Face serverless inference.
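The loading step might look like the following minimal sketch, assuming `transformers`, `bitsandbytes`, and `accelerate` are installed; the model ID is an assumption, not the notebook's exact code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"  # assumed Hugging Face repo

# 4-bit (NF4) quantization config; float16 compute because the T4 lacks bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # places the quantized layers on the T4 GPU
)
```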
- Automatic Setup: Detects the environment, downloads the model from Hugging Face if needed, and saves it locally or to Google Drive (sketched after this list).
- Interactive Prompting: Streams responses, so output appears word by word as it is generated (see the streaming sketch after this list).
- Model Management: Automatically saves the tokenizer and model for future reuse to avoid redundant downloads (see the save/reuse sketch after this list).
- Usage: Run all cells, input a query when prompted, and view the response directly in the console.
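A sketch of the environment detection and save-location logic described under Automatic Setup; the directory paths are assumptions:

```python
import importlib.util
import os

# Colab exposes the google.colab package; use that to detect the environment
IN_COLAB = importlib.util.find_spec("google.colab") is not None

if IN_COLAB:
    from google.colab import drive
    drive.mount("/content/drive")
    SAVE_DIR = "/content/drive/MyDrive/llama-3.2-3b-4bit"  # assumed Drive path
else:
    SAVE_DIR = os.path.expanduser("~/models/llama-3.2-3b-4bit")  # assumed local path
```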
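Word-by-word streaming can be done with the `TextStreamer` built into `transformers`, as in this sketch (reusing `model` and `tokenizer` from the loading sketch; `max_new_tokens` is an arbitrary choice):

```python
from transformers import TextStreamer

# Prints each decoded token to the console as soon as it is generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = input("Enter your query: ")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
model.generate(**inputs, streamer=streamer, max_new_tokens=256)
```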
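And a save-once / reuse sketch for the Model Management step (uses `SAVE_DIR` from the setup sketch; assumes a `transformers`/`bitsandbytes` version recent enough to serialize 4-bit weights):

```python
import os
from transformers import AutoModelForCausalLM, AutoTokenizer

if os.path.isdir(SAVE_DIR):
    # Later runs: load the saved copy and skip the Hugging Face download
    tokenizer = AutoTokenizer.from_pretrained(SAVE_DIR)
    model = AutoModelForCausalLM.from_pretrained(SAVE_DIR, device_map="auto")
else:
    # First run: persist the freshly downloaded model and tokenizer
    tokenizer.save_pretrained(SAVE_DIR)
    model.save_pretrained(SAVE_DIR)
```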