Retrieval-Augmented Generation (RAG) has emerged as a robust technique for extending the knowledge available to Large Language Models (LLMs). By retrieving documents from local or external sources, RAG enables models to provide relevant, up-to-date, and grounded responses through in-context learning.
This repository showcases how to build a fully local RAG system from scratch using Ollama and LangChain. Our focus is on privacy and performance, as all components—language models, embeddings, and retrieval—are processed locally, eliminating reliance on external cloud services.
- **Local Language Model:** Using Ollama, we run llama models directly on your machine. This setup avoids API calls and cloud services, keeping your data private while providing high-quality language generation.
- **Local Embeddings:** Document embeddings are generated locally with Hugging Face's `sentence-transformers`, ensuring that your data never leaves your environment.
- **Local Retrieval (FAISS):** Relevant documents are retrieved with Facebook AI Similarity Search (FAISS), a high-performance vector store for efficient similarity search. The retrieved documents ground the llama model's responses, as shown in the sketch below.
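The sketch below ties these three components together in LangChain. It is a minimal, illustrative pipeline rather than the exact code from the notebooks: it assumes `langchain-community`, `langchain-text-splitters`, `faiss-cpu`, and `sentence-transformers` are installed, that an Ollama server is running locally with a llama model already pulled (for example via `ollama pull llama3`), and that `docs/notes.txt` is a placeholder path for your own documents.

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load local documents and split them into retrievable chunks.
docs = TextLoader("docs/notes.txt").load()  # placeholder path
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)

# 2. Embed the chunks locally with a sentence-transformers model and index them in FAISS.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

# 3. Retrieve the most relevant chunks and use them to ground the local llama model.
llm = Ollama(model="llama3")  # served by the local Ollama instance
question = "What does the document say about X?"
context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))
answer = llm.invoke(
    f"Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(answer)
```

Because every step (embedding, indexing, retrieval, and generation) runs in-process or against the local Ollama server, no document or query ever leaves your machine.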
We offer a series of Google Colab tutorials that walk you through building a local RAG system with LangChain and Ollama. The tutorials cover both CPU and GPU setups, so you can make the most of the resources available on your machine.
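For GPU setups, one common adjustment is moving the embedding model onto CUDA while the rest of the pipeline stays unchanged. The toggle below is a hypothetical example, assuming a PyTorch build with CUDA support (and `faiss-gpu` in place of `faiss-cpu` if you also want GPU-accelerated indexing):

```python
import torch
from langchain_community.embeddings import HuggingFaceEmbeddings

# Pick the GPU when available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": device},  # forwarded to the sentence-transformers model
)
```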
We have shared the folder containing the Google Colab notebooks here: