diff --git a/DOCUMENT.md b/DOCUMENT.md
index c99c2dd..76f5978 100644
--- a/DOCUMENT.md
+++ b/DOCUMENT.md
@@ -171,6 +171,41 @@ from llmlingua import PromptCompressor
 
 llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})
 ```
+### Integration with LangChain
+
+Thanks to the contributions of Ayo Ayibiowu (@thehapyone), (Long)LLMLingua can be seamlessly integrated into LangChain. Here's an example of how to initialize (Long)LLMLingua within LangChain:
+
+```python
+from langchain.retrievers import ContextualCompressionRetriever
+from langchain_community.retrievers.document_compressors import LLMLinguaCompressor
+from langchain_openai import ChatOpenAI
+
+llm = ChatOpenAI(temperature=0)
+
+# `retriever` is any existing LangChain retriever; `pretty_print_docs` is a small display helper.
+compressor = LLMLinguaCompressor(model_name="openai-community/gpt2", device_map="cpu")
+compression_retriever = ContextualCompressionRetriever(
+    base_compressor=compressor, base_retriever=retriever
+)
+
+compressed_docs = compression_retriever.get_relevant_documents(
+    "What did the president say about Ketanji Brown Jackson"
+)
+pretty_print_docs(compressed_docs)
+```
+
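+The compressed documents can then feed any downstream chain. As a minimal sketch (reusing the `llm` defined above and the same assumed `retriever`), the compression retriever plugs into a standard `RetrievalQA` chain:
+
+```python
+from langchain.chains import RetrievalQA
+
+# `llm` and `compression_retriever` come from the snippet above.
+chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
+chain.invoke({"query": "What did the president say about Ketanji Brown Jackson"})
+```
+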
+For a more detailed guide, please refer to [Notebook](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/retrievers/llmlingua.ipynb).
+
 ### Integration with LlamaIndex
 
 Thanks to the contributions of Jerry Liu (@jerryjliu), (Long)LLMLingua can be seamlessly integrated into LlamaIndex. Here's an example of how to initialize (Long)LLMLingua within LlamaIndex:
diff --git a/README.md b/README.md
index 656013f..c886e80 100644
--- a/README.md
+++ b/README.md
@@ -18,12 +18,12 @@ https://github.com/microsoft/LLMLingua/assets/30883354/eb0ea70d-6d4c-4aa7-8977-6
 
 ## News
 
+- 👾 LLMLingua has been integrated into [LangChain](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/retrievers/llmlingua.ipynb) and [LlamaIndex](https://github.com/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/LongLLMLingua.ipynb), two widely-used RAG frameworks.
 - 🤳 Talk slides are available in [AI Time Jan, 24](https://drive.google.com/file/d/1fzK3wOvy2boF7XzaYuq2bQ3jFeP1WMk3/view?usp=sharing).
 - 🖥 EMNLP'23 slides are available in [Session 5](https://drive.google.com/file/d/1GxQLAEN8bBB2yiEdQdW4UKoJzZc0es9t/view) and [BoF-6](https://drive.google.com/file/d/1LJBUfJrKxbpdkwo13SgPOqugk-UjLVIF/view).
 - 📚 Check out our new [blog post](https://medium.com/@iofu728/longllmlingua-bye-bye-to-middle-loss-and-save-on-your-rag-costs-via-prompt-compression-54b559b9ddf7) discussing RAG benefits and cost savings through prompt compression. See the script example [here](https://github.com/microsoft/LLMLingua/blob/main/examples/Retrieval.ipynb).
 - 🎈 Visit our [project page](https://llmlingua.com/) for real-world case studies in RAG, Online Meetings, CoT, and Code.
 - 👨‍🦯 Explore our ['./examples'](./examples) directory for practical applications, including [RAG](./examples/RAG.ipynb), [Online Meeting](./examples/OnlineMeeting.ipynb), [CoT](./examples/CoT.ipynb), [Code](./examples/Code.ipynb), and [RAG using LlamaIndex](./examples/RAGLlamaIndex.ipynb).
-- 👾 LongLLMLingua is now part of the [LlamaIndex pipeline](https://github.com/run-llama/llama_index/blob/main/llama_index/postprocessor/longllmlingua.py), a widely-used RAG framework.
 
 ## TL;DR
diff --git a/Transparency_FAQ.md b/Transparency_FAQ.md
index b497f55..61f8f3b 100644
--- a/Transparency_FAQ.md
+++ b/Transparency_FAQ.md
@@ -127,13 +127,13 @@ We release the parameter in the [issue1](https://github.com/microsoft/LLMLingua/
 **LLMLingua**:
 
 ```python
-prompt = compressor.compress_prompt(
-    context=xxx,
-    instruction=xxx,
-    question=xxx,
-    ratio=0.75,
-    iterative_size=100,
-    context_budget="*2",
+prompt = compressor.compress_prompt(
+    context=xxx,
+    instruction=xxx,
+    question=xxx,
+    ratio=0.75,
+    iterative_size=100,
+    context_budget="*2",
 )
 ```
 
@@ -141,18 +141,81 @@ prompt = compressor.compress_prompt(
 
 ```python
 compressed_prompt = llm_lingua.compress_prompt(
-    demonstration.split("\n"),
-    instruction,
-    question,
-    0.55,
-    use_sentence_level_filter=False,
-    condition_in_question="after_condition",
-    reorder_context="sort",
+    demonstration.split("\n"),
+    instruction,
+    question,
+    0.55,
+    use_sentence_level_filter=False,
+    condition_in_question="after_condition",
+    reorder_context="sort",
     dynamic_context_compression_ratio=0.3,  # or 0.4
-    condition_compare=True,
-    context_budget="+100",
+    condition_compare=True,
+    context_budget="+100",
     rank_method="longllmlingua",
 )
 ```
 
-Experiments in LLMLingua and most experiments in LongLLMLingua were conducted in completion mode, whereas chat mode tends to be more sensitive to token-level compression. However, OpenAI has currently disabled GPT-3.5-turbo's completion; you can use GPT-3.5-turbo-instruction or Azure OpenAI service instead.
\ No newline at end of file
+Experiments in LLMLingua and most experiments in LongLLMLingua were conducted in completion mode, whereas chat mode tends to be more sensitive to token-level compression. However, OpenAI has disabled the completions endpoint for GPT-3.5-Turbo; you can use gpt-3.5-turbo-instruct or the Azure OpenAI service instead.
+
+
+## How to use LLMLingua in LangChain and LlamaIndex?
+
+### Integration with LangChain
+
+Thanks to the contributions of Ayo Ayibiowu (@thehapyone), (Long)LLMLingua can be seamlessly integrated into LangChain. Here's an example of how to initialize (Long)LLMLingua within LangChain:
+
+```python
+from langchain.retrievers import ContextualCompressionRetriever
+from langchain_community.retrievers.document_compressors import LLMLinguaCompressor
+from langchain_openai import ChatOpenAI
+
+llm = ChatOpenAI(temperature=0)
+
+# `retriever` is any existing LangChain retriever; `pretty_print_docs` is a small display helper.
+compressor = LLMLinguaCompressor(model_name="openai-community/gpt2", device_map="cpu")
+compression_retriever = ContextualCompressionRetriever(
+    base_compressor=compressor, base_retriever=retriever
+)
+
+compressed_docs = compression_retriever.get_relevant_documents(
+    "What did the president say about Ketanji Brown Jackson"
+)
+pretty_print_docs(compressed_docs)
+```
+
+For a more detailed guide, please refer to [Notebook](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/retrievers/llmlingua.ipynb).
+
+### Integration with LlamaIndex
+
+Thanks to the contributions of Jerry Liu (@jerryjliu), (Long)LLMLingua can be seamlessly integrated into LlamaIndex. Here's an example of how to initialize (Long)LLMLingua within LlamaIndex:
+
+```python
+from llama_index.query_engine import RetrieverQueryEngine
+from llama_index.response_synthesizers import CompactAndRefine
+from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor
+
+node_postprocessor = LongLLMLinguaPostprocessor(
+    instruction_str="Given the context, please answer the final question",
+    target_token=300,
+    rank_method="longllmlingua",
+    additional_compress_kwargs={
+        "condition_compare": True,
+        "condition_in_question": "after",
+        "context_budget": "+100",
+        "reorder_context": "sort",  # Enables document reordering
+        "dynamic_context_compression_ratio": 0.4,  # Enables dynamic compression ratio
+    },
+)
+```
+
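+To run queries with compression applied, attach the postprocessor to a query engine via the already-imported `RetrieverQueryEngine` (the imported `CompactAndRefine` is the response synthesizer that `from_args` builds by default). A minimal sketch, assuming a `retriever` from an existing index and a `question` string:
+
+```python
+# `node_postprocessor` comes from the snippet above; `retriever` and
+# `question` are assumed to be defined elsewhere.
+retriever_query_engine = RetrieverQueryEngine.from_args(
+    retriever, node_postprocessors=[node_postprocessor]
+)
+response = retriever_query_engine.query(question)
+```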
+For a more detailed guide, please refer to [RAGLlamaIndex Example](https://github.com/microsoft/LLMLingua/blob/main/examples/RAGLlamaIndex.ipynb).
diff --git a/examples/RAGLlamaIndex.ipynb b/examples/RAGLlamaIndex.ipynb
index 56c56d4..f697003 100644
--- a/examples/RAGLlamaIndex.ipynb
+++ b/examples/RAGLlamaIndex.ipynb
@@ -31,7 +31,7 @@
    "id": "a6137de2-0e3f-4962-860c-680da4df2eae",
    "metadata": {},
    "source": [
-    "More specifically, [**LongLLMLinguaPostprocessor**](https://github.com/run-llama/llama_index/blob/main/llama_index/postprocessor/longllmlingua.py#L16) can be used as a **Postprocessor** in **LlamaIndex** by invoking it, with arguments consistent with those in the [**PromptCompressor**](https://github.com/microsoft/LLMLingua/blob/main/llmlingua/prompt_compressor.py) of [**LLMLingua**](https://github.com/microsoft/LLMLingua).\n",
+    "More specifically, [**LongLLMLinguaPostprocessor**](https://github.com/run-llama/llama_index/blob/main/llama-index-legacy/llama_index/legacy/postprocessor/longllmlingua.py#L16) can be used as a **Postprocessor** in **LlamaIndex** by invoking it, with arguments consistent with those in the [**PromptCompressor**](https://github.com/microsoft/LLMLingua/blob/main/llmlingua/prompt_compressor.py) of [**LLMLingua**](https://github.com/microsoft/LLMLingua).\n",
     "You can call the corresponding compression algorithms in LLMLingua and the question-aware prompt compression method in LongLLMLingua."
    ]
   },
diff --git a/tests/test_llmlingua.py b/tests/test_llmlingua.py
index 3673525..59302a7 100644
--- a/tests/test_llmlingua.py
+++ b/tests/test_llmlingua.py
@@ -56,9 +56,10 @@ def __init__(self, *args, **kwargs):
         super(LLMLinguaTester, self).__init__(*args, **kwargs)
         try:
             import nltk
-            nltk.download('punkt')
+
+            nltk.download("punkt")
         except:
-            print('nltk_data exits.')
+            print("nltk_data exists.")
         self.llmlingua = PromptCompressor("lgaalves/gpt2-dolly", device_map="cpu")
 
     def test_general_compress_prompt(self):
diff --git a/tests/test_longllmlingua.py b/tests/test_longllmlingua.py
index 27b6005..9b5fc5b 100644
--- a/tests/test_longllmlingua.py
+++ b/tests/test_longllmlingua.py
@@ -60,9 +60,10 @@ def __init__(self, *args, **kwargs):
         super(LongLLMLinguaTester, self).__init__(*args, **kwargs)
         try:
             import nltk
-            nltk.download('punkt')
+
+            nltk.download("punkt")
         except:
-            print('nltk_data exits.')
+            print("nltk_data exists.")
         self.llmlingua = PromptCompressor("lgaalves/gpt2-dolly", device_map="cpu")
 
     def test_general_compress_prompt(self):