Merge from main #52

Merged: 73 commits, Mar 25, 2024
fbd2576
Merge pull request #30 from arjbingly/project-BasicRAG
arjbingly Mar 16, 2024
014e697
Update prompts and embedding test
arjbingly Mar 17, 2024
fb1511c
chroma test
arjbingly Mar 18, 2024
169b2e9
Ruff config
arjbingly Mar 18, 2024
0ed01ea
LLM test
sanchitvj Mar 18, 2024
4b3a281
Ruff format
sanchitvj Mar 18, 2024
fdc6325
Create ruff_linting workflow
arjbingly Mar 19, 2024
44571be
debug chroma tests
arjbingly Mar 19, 2024
8e51732
ruff lint docstrings
arjbingly Mar 19, 2024
75b479e
separate pipes in llm test
arjbingly Mar 19, 2024
9e32fe5
add readme with valid tests
arjbingly Mar 19, 2024
31c3daf
Ruff docstring changes
arjbingly Mar 19, 2024
e040108
Merge branch 'main' into update-docs
arjbingly Mar 19, 2024
b05818f
Modified LLM tests
sanchitvj Mar 19, 2024
255112c
all test cases passed
sanchitvj Mar 19, 2024
c8ebb94
Restructure tests
arjbingly Mar 19, 2024
2e09499
Rename chroma_test to chroma_client_test
arjbingly Mar 19, 2024
43dd7b8
Remove not relevant tests
arjbingly Mar 19, 2024
7552ae8
All tests passed for basic RAG
sanchitvj Mar 19, 2024
0d9db06
Merge pull request #32 from arjbingly/project-BasicRAG
arjbingly Mar 19, 2024
02cb36f
resolved PR conflicts
sanchitvj Mar 19, 2024
d7059c6
Merge pull request #34 from arjbingly/tests
arjbingly Mar 19, 2024
7ab4c3f
Rename ruff_linting to ruff_linting.yml
sanchitvj Mar 19, 2024
a3e74de
Merge pull request #35 from arjbingly/runner
arjbingly Mar 19, 2024
dccd77b
Update ruff_linting.yml
sanchitvj Mar 19, 2024
d116ea2
Merge pull request #36 from arjbingly/sanchitvj-patch-1
arjbingly Mar 19, 2024
2d7c879
Create ruff_commit.yml
arjbingly Mar 19, 2024
4194cdd
Rename main.yml to ruff_commit.yml
arjbingly Mar 19, 2024
fb61922
Update ruff_commit.yml
sanchitvj Mar 19, 2024
e91ea84
Merge pull request #38 from arjbingly/arjbingly-patch-1
arjbingly Mar 19, 2024
0559d19
Update ruff_commit.yml
sanchitvj Mar 19, 2024
80bcc68
Update ruff_commit.yml
sanchitvj Mar 20, 2024
d749618
Update ruff_commit.yml
sanchitvj Mar 20, 2024
b378f56
style fixes by ruff
sanchitvj Mar 20, 2024
c07329c
Merge pull request #39 from arjbingly/sanchitvj-patch-2
arjbingly Mar 20, 2024
8b6ab3f
Update ruff_linting.yml
sanchitvj Mar 20, 2024
7fd62c9
Merge pull request #40 from arjbingly/sanchitvj-patch-1
arjbingly Mar 20, 2024
f4719fa
ruff format
arjbingly Mar 20, 2024
da5b5f3
Merge pull request #42 from arjbingly/ruff-reformat
arjbingly Mar 21, 2024
c151a50
Refactor attributes from multivec_retriever for consistency.
arjbingly Mar 21, 2024
5e72ad9
DeepLake client, vectordb
arjbingly Mar 21, 2024
379f21c
Merge pull request #43 from arjbingly/main
arjbingly Mar 21, 2024
820702f
Remove old chroma_client
arjbingly Mar 21, 2024
7729d32
Bug fix: top_k
arjbingly Mar 21, 2024
2f05d98
Update chroma_client_test
arjbingly Mar 21, 2024
41b2bcf
Deeplake tests, typing
arjbingly Mar 22, 2024
428c634
quantization
sanchitvj Mar 22, 2024
698efbd
Update to remove ruff errors
arjbingly Mar 22, 2024
acd4ba2
Ruff bugs
arjbingly Mar 22, 2024
0160117
style fixes by ruff
arjbingly Mar 22, 2024
aec7377
style fixes by ruff
sanchitvj Mar 22, 2024
2df28d1
style fixes by ruff
arjbingly Mar 22, 2024
cf0992a
Merge pull request #44 from arjbingly/update-docs
sanchitvj Mar 22, 2024
00e2d6b
Update embedding docstring
arjbingly Mar 23, 2024
26235f5
Update doc strings.
arjbingly Mar 23, 2024
58be4d8
Merge pull request #45 from arjbingly/update-docs
sanchitvj Mar 23, 2024
8e78f75
quantize file
sanchitvj Mar 23, 2024
79ebf3a
Merge branch 'quantize' of https://github.com/arjbingly/Capstone_5 in…
sanchitvj Mar 23, 2024
11697c0
Revert "Merge branch 'quantize' of https://github.com/arjbingly/Capst…
sanchitvj Mar 23, 2024
1bb1216
rectified quantization, issue with llama.cpp
sanchitvj Mar 24, 2024
a7354ee
issue in llama.cpp
sanchitvj Mar 24, 2024
caebf0a
Config changes for deeplake
arjbingly Mar 24, 2024
66c06d0
modifications and corrections after testing
sanchitvj Mar 24, 2024
f94114e
Retriever update
arjbingly Mar 24, 2024
b90a882
quantizations all tests passed
sanchitvj Mar 24, 2024
7a7d5a7
Merge branch 'main' into vectordb
arjbingly Mar 24, 2024
14ca30d
style fixes by ruff
sanchitvj Mar 24, 2024
454bb5d
style fixes by ruff
arjbingly Mar 24, 2024
56f16cf
Merge pull request #47 from arjbingly/vectordb
sanchitvj Mar 24, 2024
d161553
Merge branch 'main' into quantize
sanchitvj Mar 24, 2024
fd0e374
Merge pull request #48 from arjbingly/quantize
sanchitvj Mar 24, 2024
685403f
Create LICENSE
arjbingly Mar 25, 2024
046ed0e
Merge pull request #51 from arjbingly/LICENSE
arjbingly Mar 25, 2024
15 changes: 15 additions & 0 deletions .github/workflows/ruff_commit.yml
@@ -0,0 +1,15 @@
+name: Ruff and commit
+on: push
+
+jobs:
+  lint:
+    runs-on: self-hosted
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-python@v2
+      - run: pip install ruff
+      # - run: ruff check src/
+      - run: ruff format src/
+      - uses: stefanzweifel/git-auto-commit-action@v4
+        with:
+          commit_message: 'style fixes by ruff'
25 changes: 25 additions & 0 deletions .github/workflows/ruff_linting.yml
@@ -0,0 +1,25 @@
+name: Ruff Linting
+on:
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  adopt-ruff:
+    runs-on: self-hosted
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v4
+
+      - name: Set up python
+        id: setup-python
+        uses: actions/setup-python@v5
+        with:
+          python-version: 3.x
+
+      - name: Install ruff
+        run: pip install ruff
+
+      - name: Run the adopt-ruff action
+        uses: chartboost/ruff-action@v1
+
661 changes: 661 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions llm_quantize/quantize.py
@@ -1,6 +1,6 @@
+import os
 import subprocess
 import sys
-import os
 
 
 def execute_commands(model_dir_path, quantization=None):
@@ -13,7 +13,7 @@ def execute_commands(model_dir_path, quantization=None):
         if quantization:
             model_file = f"llama.cpp/models/{model_dir_path}/ggml-model-f16.gguf"
             quantized_model_file = f"llama.cpp/models/{model_dir_path.split('/')[-1]}/ggml-model-{quantization}.gguf"
-            subprocess.run(["llama.cpp/llm_quantize", model_file, quantized_model_file, quantization], check=True)
+            subprocess.run(["llama.cpp/quantize", model_file, quantized_model_file, quantization], check=True)
 
     else:
         print("llama.cpp doesn't exist, check readme how to clone.")
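The second hunk above only changes the binary path from `llama.cpp/llm_quantize` to `llama.cpp/quantize`. A minimal sketch of how that argument list is assembled, kept separate from execution so it can be inspected first. The helper name `build_quantize_command` is hypothetical and not part of the PR; the path templates are copied from the hunk.

```python
def build_quantize_command(model_dir_path: str, quantization: str) -> list[str]:
    """Return the argv list for the quantize step without executing it."""
    # Input path uses the full model_dir_path, as in the diff.
    model_file = f"llama.cpp/models/{model_dir_path}/ggml-model-f16.gguf"
    # Output path uses only the last path component, as in the diff.
    model_name = model_dir_path.split("/")[-1]
    quantized_model_file = (
        f"llama.cpp/models/{model_name}/ggml-model-{quantization}.gguf"
    )
    return ["llama.cpp/quantize", model_file, quantized_model_file, quantization]


cmd = build_quantize_command("Llama-2-7b-chat", "Q5_K_M")
print(cmd)
```

Passing such a list to `subprocess.run(cmd, check=True)` raises `subprocess.CalledProcessError` when the binary exits with a non-zero status, which is why a wrong binary name fails loudly rather than silently.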
8 changes: 6 additions & 2 deletions projects/Basic-RAG/BasicRAG_stuff.py
@@ -1,6 +1,10 @@
-from grag.grag.rag import BasicRAG
+from grag.components.multivec_retriever import Retriever
+from grag.components.vectordb.deeplake_client import DeepLakeClient
+from grag.rag.basic_rag import BasicRAG
 
-rag = BasicRAG(doc_chain="stuff")
+client = DeepLakeClient(collection_name="test")
+retriever = Retriever(vectordb=client)
+rag = BasicRAG(doc_chain="stuff", retriever=retriever)
 
 if __name__ == "__main__":
     while True:
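The change above passes a vector-store client into `Retriever`, and the retriever into `BasicRAG`, instead of letting each object build its own dependency. A minimal sketch of that injection pattern follows. Every class and method name in it (`VectorDB`, `InMemoryClient`, `ToyRetriever`, `get_chunk`) is a stand-in invented for illustration and is not the grag API.

```python
from typing import Protocol


class VectorDB(Protocol):
    """Minimal interface a retriever needs from a vector store (assumed)."""

    def get_chunk(self, query: str, top_k: int) -> list[str]: ...


class InMemoryClient:
    """Toy client that ranks stored texts by words shared with the query."""

    def __init__(self, texts: list[str]):
        self.texts = texts

    def get_chunk(self, query: str, top_k: int) -> list[str]:
        words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


class ToyRetriever:
    """Retriever that receives its vector store instead of creating one."""

    def __init__(self, vectordb: VectorDB, top_k: int = 1):
        self.vectordb = vectordb
        self.top_k = top_k

    def get_chunk(self, query: str) -> list[str]:
        return self.vectordb.get_chunk(query, self.top_k)


client = InMemoryClient(["ruff formats code", "deeplake stores vectors"])
retriever = ToyRetriever(vectordb=client)
hits = retriever.get_chunk("where are vectors stored")
```

Because the retriever depends only on the interface, swapping one backend for another changes a single constructor call at the call site.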
14 changes: 7 additions & 7 deletions projects/Retriver-GUI/retriever_app.py
@@ -46,7 +46,7 @@ def render_search_results(self):
             st.write(result.metadata)
 
     def check_connection(self):
-        response = self.app.retriever.client.test_connection()
+        response = self.app.retriever.vectordb.test_connection()
         if response:
             return True
         else:
@@ -55,14 +55,14 @@ def check_connection(self):
     def render_stats(self):
         st.write(f'''
         **Chroma Client Details:** \n
-        Host Address : {self.app.retriever.client.host}:{self.app.retriever.client.port} \n
-        Collection Name : {self.app.retriever.client.collection_name} \n
-        Embeddings Type : {self.app.retriever.client.embedding_type} \n
-        Embeddings Model: {self.app.retriever.client.embedding_model} \n
-        Number of docs : {self.app.retriever.client.collection.count()} \n
+        Host Address : {self.app.retriever.vectordb.host}:{self.app.retriever.vectordb.port} \n
+        Collection Name : {self.app.retriever.vectordb.collection_name} \n
+        Embeddings Type : {self.app.retriever.vectordb.embedding_type} \n
+        Embeddings Model: {self.app.retriever.vectordb.embedding_model} \n
+        Number of docs : {self.app.retriever.vectordb.collection.count()} \n
         ''')
         if st.button('Check Connection'):
-            response = self.app.retriever.client.test_connection()
+            response = self.app.retriever.vectordb.test_connection()
             if response:
                 st.write(':green[Connection Active]')
             else:
20 changes: 20 additions & 0 deletions pyproject.toml
@@ -42,6 +42,7 @@ dependencies = [
   "huggingface_hub>=0.20.2",
   "pydantic>=2.5.0",
   "rouge-score>=0.1.2",
+  "deeplake>=3.8.27"
 ]
 
 [project.urls]
@@ -97,3 +98,22 @@ exclude_lines = [
   "if __name__ == .__main__.:",
   "if TYPE_CHECKING:",
 ]
+
+[tool.ruff]
+line-length = 88
+indent-width = 4
+extend-exclude = ["tests", "others"]
+
+[tool.ruff.lint]
+select = ["E4", "E7", "E9", "F", "I", "D"]
+ignore = ["D104"]
+exclude = ["__about__.py"]
+
+
+[tool.ruff.format]
+quote-style = "double"
+indent-style = "space"
+docstring-code-format = true
+
+[tool.ruff.lint.pydocstyle]
+convention = "google"
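The Ruff settings above select the `D` (pydocstyle) rules with the Google convention, double quotes, and an 88-character line length. As a rough illustration of the docstring shape those settings ask for, here is a small module with a Google-style docstring. The module and the function `count_chunks` are hypothetical and exist only to carry the docstring; the default values echo the `[text_splitter]` numbers in `src/config.ini`, but the arithmetic is not taken from the project.

```python
"""Chunk arithmetic helpers (hypothetical, for illustration only)."""


def count_chunks(
    text_length: int, chunk_size: int = 5000, chunk_overlap: int = 400
) -> int:
    """Estimate how many overlapping chunks cover a text.

    Args:
        text_length: Number of characters in the text.
        chunk_size: Maximum characters per chunk.
        chunk_overlap: Characters shared by consecutive chunks.

    Returns:
        The number of chunks needed; zero for empty text.

    Raises:
        ValueError: If chunk_overlap is not smaller than chunk_size.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    # Ceiling division over the part of the text beyond the first chunk.
    return 1 + -(-(text_length - chunk_size) // step)
```

The summary line sits on the same line as the opening quotes and is followed by a blank line, which is the same shape the `embedding.py` docstrings take in this PR.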
Empty file added src/__init__.py
Empty file.
24 changes: 21 additions & 3 deletions src/config.ini
@@ -1,18 +1,25 @@
[llm]
model_name : Llama-2-13b-chat
model_name : Llama-2-7b-chat
# meta-llama/Llama-2-70b-chat-hf Mixtral-8x7B-Instruct-v0.1
quantization : Q5_K_M
pipeline : llama_cpp
device_map : auto
task : text-generation
max_new_tokens : 1024
temperature : 0.1
n_batch_gpu_cpp : 1024
n_ctx_cpp : 6000
n_gpu_layers_cpp : 18
n_gpu_layers_cpp : -1
# The number of layers to put on the GPU. Mixtral-18
std_out : True
base_dir : ${root:root_path}/models

[deeplake]
collection_name : arxiv
embedding_type : instructor-embedding
embedding_model : hkunlp/instructor-xl
store_path : ${data:data_path}/vectordb

[chroma]
host : localhost
port : 8000
@@ -24,6 +31,14 @@ embedding_model : hkunlp/instructor-xl
store_path : ${data:data_path}/vectordb
allow_reset : True

[deeplake]
collection_name : arxiv
# embedding_type : sentence-transformers
# embedding_model : "all-mpnet-base-v2"
embedding_type : instructor-embedding
embedding_model : hkunlp/instructor-xl
store_path : ${data:data_path}/vectordb

[text_splitter]
chunk_size : 5000
chunk_overlap : 400
@@ -50,4 +65,7 @@ table_as_html : True
data_path : ${root:root_path}/data

[root]
root_path : /home/ubuntu/volume_2k/Capstone_5
root_path : /home/ubuntu/volume_2k/Capstone_5

[quantize]
llama_cpp_path : ${root:root_path}
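The `${section:key}` references in this file match the syntax of `configparser.ExtendedInterpolation` in the Python standard library. The sketch below assumes the project loads `config.ini` that way, which this diff does not confirm, and shows how a nested reference resolves. It also shows that a parser with the default `strict=True` rejects a repeated section header, which is relevant because `[deeplake]` appears in two hunks of this diff. The sample text is a trimmed stand-in and the section name `data` is inferred from the `${data:data_path}` reference.

```python
import configparser

# Trimmed stand-in for src/config.ini; only the keys needed for the demo.
SAMPLE = """
[root]
root_path : /home/ubuntu/volume_2k/Capstone_5

[data]
data_path : ${root:root_path}/data

[deeplake]
collection_name : arxiv
store_path : ${data:data_path}/vectordb
"""


def load(text: str, strict: bool = True) -> configparser.ConfigParser:
    """Parse INI text with ${section:key} interpolation enabled."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation(), strict=strict
    )
    parser.read_string(text)
    return parser


cfg = load(SAMPLE)
store_path = cfg["deeplake"]["store_path"]  # resolved through [data] and [root]

# Appending a second [deeplake] header reproduces the duplicated section.
duplicated = SAMPLE + "\n[deeplake]\ncollection_name : arxiv\n"
try:
    load(duplicated)
    duplicate_rejected = False
except configparser.DuplicateSectionError:
    duplicate_rejected = True
```

With `strict=False` the parser merges the repeated section instead of raising, so whether the duplicate matters depends on how the project constructs its parser.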
124 changes: 0 additions & 124 deletions src/grag/components/chroma_client.py

This file was deleted.

31 changes: 22 additions & 9 deletions src/grag/components/embedding.py
@@ -1,10 +1,18 @@
+"""Class for embedding.
+
+This module provides:
+- Embedding
+"""
+
 from langchain_community.embeddings import HuggingFaceInstructEmbeddings
-from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
+from langchain_community.embeddings.sentence_transformer import (
+    SentenceTransformerEmbeddings,
+)
 
 
 class Embedding:
-    """
-    A class for vector embeddings.
+    """A class for vector embeddings.
+
     Supports:
         huggingface sentence transformers -> model_type = 'sentence-transformers'
         huggingface instructor embeddings -> model_type = 'instructor-embedding'
@@ -16,14 +24,19 @@ class Embedding:
     """
 
     def __init__(self, embedding_type: str, embedding_model: str):
+        """Initialize the embedding with embedding_type and embedding_model."""
         self.embedding_type = embedding_type
         self.embedding_model = embedding_model
         match self.embedding_type:
-            case 'sentence-transformers':
-                self.embedding_function = SentenceTransformerEmbeddings(model_name=self.embedding_model)
-            case 'instructor-embedding':
-                self.embedding_instruction = 'Represent the document for retrival'
-                self.embedding_function = HuggingFaceInstructEmbeddings(model_name=self.embedding_model)
+            case "sentence-transformers":
+                self.embedding_function = SentenceTransformerEmbeddings(
+                    model_name=self.embedding_model
+                )
+            case "instructor-embedding":
+                self.embedding_instruction = "Represent the document for retrival"
+                self.embedding_function = HuggingFaceInstructEmbeddings(
+                    model_name=self.embedding_model
+                )
                 self.embedding_function.embed_instruction = self.embedding_instruction
             case _:
-                raise Exception('embedding_type is invalid')
+                raise Exception("embedding_type is invalid")