Merge pull request #95 from arjbingly/sanchitvj-patch-1

updated main readme
arjbingly · Apr 23, 2024 · 663eaf3
2 parents f44c0ba + 73947cb
commit 663eaf3
Showing 1 changed file with 7 additions and 227 deletions.
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
-# GRAG (note: specify the abbreviation)
+# GRAG - Good RAG
 
 ![GitHub License](https://img.shields.io/github/license/arjbingly/Capstone_5)
-![Linting](https://img.shields.io/github/actions/workflow/status/arjbingly/Capstone_5/sphinx-gitpg.yml?label=Docs&labelColor=yellow)
+![Linting](https://img.shields.io/github/actions/workflow/status/arjbingly/Capstone_5/sphinx-gitpg.yml?label=Docs)
 ![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/arjbingly/Capstone_5/build_linting.yml?label=Linting)
 ![Static Badge](https://img.shields.io/badge/Tests-failing-red)
 ![Static Badge](https://img.shields.io/badge/docstring%20style-google-yellow)
@@ -10,7 +10,10 @@
 ![Static Badge](https://img.shields.io/badge/codestyle-pyflake-purple?labelColor=white)
 ![GitHub Issues or Pull Requests](https://img.shields.io/github/issues-pr/arjbingly/Capstone_5)
 
-(note: add overview on what the purpose of this project is here. Talk briefly about RAG. Maybe copy from the proposal)
+
+[GRAG](https://arjbingly.github.io/Capstone_5/) is a simple Python package that provides an easy end-to-end solution for implementing Retrieval Augmented Generation (RAG).
+
+The package offers an easy way to run various LLMs locally, thanks to LlamaCpp, and supports vector stores like Chroma and DeepLake. It also makes it easy to integrate support for other vector stores.
 
 <figure>
     <img src="documentation/basic_RAG_pipeline.png" alt="Diagram of a basic RAG pipeline">
@@ -24,19 +27,7 @@
 - [Getting Started](#getting-started)
   - [Requirements](#requirements)
   - [LLM Models](#llm-models)
-  - [Data](#data)
   - [Supported Vector Databases](#supported-vector-databases)
-    - [Embeddings](#embeddings)
-  - [Data Ingestion](#data-ingestion)
-- [Main Features](#main-features)
-  - [1. PDF Parser](#1-pdf-parser)
-  - [2. Multi-Vector Retriever](#2-multi-vector-retriever)
-  - [3. BasicRAG](#3-basicrag)
-- [GUI](#gui)
-  - [1. Retriever GUI](#1-retriever-gui)
-  - [2. BasicRAG GUI](#2-basicrag-gui)
-- [Demo](#demo)
-- [Repo Structure](#repo-structure)
 
 ## Project Overview
 
@@ -61,8 +52,6 @@ Further customization can be made on the config file, `src/config.ini`.
 
 Required packages to install includes (_refer to [pyproject.toml](pyproject.toml)_):
 
-Required packages to install includes (_refer to [pyproject.toml](pyproject.toml)_):
-
 - PyTorch
 - LangChain
 - Chroma
@@ -86,18 +75,6 @@ For more details, go to [.\llm_quantize\readme.md](.\llm_quantize\readme.md)
 
 Refer to [llama.cpp](https://github.com/ggerganov/llama.cpp) Supported Models (under Description) for list of compatible models.
 
-### Data
-
-Any PDF can be used for this project. We personally tested the project using ArXiv papers. Refer [ArXiv Bulk Data](https://info.arxiv.org/help/bulk_data/index.html) for
-details on how to download.
-
-```
-├── data
-│   ├── pdf
-```
-
-**Make sure to specify `data_path` under `data` in `src/config.ini`**
-
 ### Supported Vector Databases
 
 **1. [Chroma](https://www.trychroma.com)**
@@ -109,202 +86,5 @@ Since Chroma is a server-client based vector database, make sure to run the serv
 
 **2. [Deeplake](https://www.deeplake.ai/)**
 
-#### Embeddings
-
-- By default, the embedding model is `instructor-xl`. Can be changed by changing `embedding_type` and `embedding_model`
-  in `src/config.ini'. Any huggingface embeddings can be used.
-
-### Data Ingestion
-
-For ingesting data to the vector db:
-
-```
-client = DeepLakeClient() # Any vectordb client
-retriever = Retriever(vectordb=client)
-
-
-dir_path = Path(__file__).parents[2] # path to folder containing pdf files
-
-
-retriever.ingest(dir_path)
-```
-
-Refer to ['cookbook/basicRAG/BasicRAG_ingest'](./cookbook/basicRAG/BasicRAG_ingest)
-
----
-
-## Main Features
-
-### 1. PDF Parser
-
-(note: need to rewrite this. Under contruction: test suites and documentation for every iteration)
 
-- The pdf parser is implemented using [Unstructured.io](https://unstructured.io).
-- It effectively parses any pdf including OCR documents and categorises all elements including tables and images.
-- Enables contextual text parsing: it ensures that the chunking process does not separate items like list items, and keeps titles together with text.
-- Tables are not chunked.
-
-### 2. Multi-Vector Retriever
-
-- It easily retrieves not only the most similar chunks (to a query) but also the source document of the chunks.
-
-### 3. BasicRAG
-
-Refer to [BasicRAG/README.md](./cookbook/Basic-RAG/README.md)
-(note: fix the RAGPipeline.md link)
-
----
-
-## GUI
-
-### 1. Retriever GUI
-
-A simple GUI for retrieving documents and viewing config of the vector database.
-
-To run: `streamlit run projects/retriver_app.py -server.port=8888`
-
-### 2. BasicRAG GUI
-
-Under development.
-
----
-
-## Demo
-
-(to be added)
-![Watch the video](../Sample_Capstone/demo/fig/demo.gif)
-
-## Repo Structure
-
----
-
-```
-.
-├── LICENSE
-├── README.md
-├── ci
-│   ├── Jenkinsfile
-│   ├── env_test.py
-│   ├── modify_config.py
-│   └── unlock_deeplake.py
-├── cookbook
-│   ├── Basic-RAG
-│   │   ├── BasicRAG_CustomPrompt.py
-│   │   ├── BasicRAG_FewShotPrompt.py
-│   │   ├── BasicRAG_ingest.py
-│   │   ├── BasicRAG_refine.py
-│   │   ├── BasicRAG_stuff.py
-│   │   ├── RAG-PIPELINES.md
-│   │   └── README.md
-│   └── Retriver-GUI
-│       └── retriever_app.py
-├── demo
-│   ├── Readme.md
-│   └── fig
-│       ├── demo.gif
-│       └── video.mp4
-├── documentation
-│   ├── AWS_Setup_Nvidia_Driver_Install.md
-│   ├── AWS_Setup_Python_Env.md
-│   ├── Building an effective RAG app.md
-│   ├── Data Sources.md
-│   ├── basic_RAG_pipeline.drawio.svg
-│   └── challenges.md
-├── full_report
-│   ├── Latex_report
-│   │   ├── File_Setup.tex
-│   │   ├── Sample_Report.pdf
-│   │   ├── Sample_Report.tex
-│   │   ├── fig
-│   │   │   ├── GW_logo-eps-converted-to.pdf
-│   │   │   ├── GW_logo.eps
-│   │   │   ├── ascent-archi.pdf
-│   │   │   ├── certificates-log-archi.pdf
-│   │   │   ├── nyush-logo.jpeg
-│   │   │   └── perf-plot-1.pdf
-│   │   └── references.bib
-│   ├── Markdown_Report
-│   ├── Readme.md
-│   └── Word_Report
-│       ├── Sample_Report.docx
-│       └── Sample_Report.pdf
-├── llm_quantize
-│   └── README.md
-├── presentation
-│   └── Readme.md
-├── proposal
-│   └── proposal.md
-├── pyproject.toml
-├── requirements.yml
-├── research_paper
-│   ├── Latex
-│   │   ├── Fig
-│   │   │   ├── narxnet1-eps-converted-to.pdf
-│   │   │   └── narxnet1.eps
-│   │   ├── Paper_Temp.pdf
-│   │   ├── Paper_Temp.tex
-│   │   └── mybib.bib
-│   ├── Readme.md
-│   └── Word
-│       └── Conference-template-A4.doc
-└── src
-    ├── __init__.py
-    ├── config.ini
-    ├── grag
-    │   ├── __about__.py
-    │   ├── __init__.py
-    │   ├── components
-    │   │   ├── __init__.py
-    │   │   ├── embedding.py
-    │   │   ├── llm.py
-    │   │   ├── multivec_retriever.py
-    │   │   ├── parse_pdf.py
-    │   │   ├── prompt.py
-    │   │   ├── text_splitter.py
-    │   │   ├── utils.py
-    │   │   └── vectordb
-    │   │       ├── __init__.py
-    │   │       ├── base.py
-    │   │       ├── chroma_client.py
-    │   │       └── deeplake_client.py
-    │   ├── prompts
-    │   │   ├── Llama-2_QA-refine_1.json
-    │   │   ├── Llama-2_QA_1.json
-    │   │   ├── Mixtral_QA_1.json
-    │   │   ├── __init__.py
-    │   │   └── matcher.json
-    │   ├── quantize
-    │   │   ├── __init__.py
-    │   │   ├── quantize.py
-    │   │   └── utils.py
-    │   └── rag
-    │       ├── __init__.py
-    │       └── basic_rag.py
-    ├── scripts
-    │   ├── reset_chroma.sh
-    │   ├── reset_store.sh
-    │   └── run_chroma.sh
-    └── tests
-        ├── README.md
-        ├── __init__.py
-        ├── components
-        │   ├── __init__.py
-        │   ├── embedding_test.py
-        │   ├── llm_test.py
-        │   ├── multivec_retriever_test.py
-        │   ├── parse_pdf_test.py
-        │   ├── prompt_test.py
-        │   ├── utils_test.py
-        │   └── vectordb
-        │       ├── __init__.py
-        │       ├── chroma_client_test.py
-        │       └── deeplake_client_test.py
-        ├── quantize
-        │   ├── __init__.py
-        │   └── quantize_test.py
-        └── rag
-            ├── __init__.py
-            └── basic_rag_test.py
-```
-
----
+For more information refer to [Documentation](https://arjbingly.github.io/Capstone_5/).