Feature(LLMLingua-2): add LLMLingua-2 (microsoft#111)
* Feature(LLMLingua-2): add LLMLingua-2

Co-authored-by: Qianhui Wu <[email protected]>
Co-authored-by: panzs <[email protected]>
Co-authored-by: Xufang Luo <[email protected]>
Co-authored-by: Yuqing Yang <[email protected]>
5 people authored Mar 19, 2024
1 parent 00d5271 commit c2164c0
Showing 40 changed files with 5,251 additions and 179 deletions.
23 changes: 4 additions & 19 deletions .pre-commit-config.yaml
@@ -22,30 +22,15 @@ repos:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: no-commit-to-branch
- repo: https://github.com/pycqa/isort
rev: 5.13.2
hooks:
- id: isort
- repo: https://github.com/psf/black
rev: 23.3.0
hooks:
- id: black
# - repo: https://github.com/charliermarsh/ruff-pre-commit
# rev: v0.0.261
# hooks:
# - id: ruff
# args: ["--fix"]
# - repo: https://github.com/codespell-project/codespell
# rev: v2.2.6
# hooks:
# - id: codespell
# args: ["-L", "ans,linar,nam,"]
# exclude: |
# (?x)^(
# pyproject.toml |
# website/static/img/ag.svg |
# website/yarn.lock |
# notebook/.*
# )$
- repo: https://github.com/nbQA-dev/nbQA
rev: 1.7.1
hooks:
# - id: nbqa-ruff
# args: ["--fix"]
- id: nbqa-black
376 changes: 266 additions & 110 deletions DOCUMENT.md

Large diffs are not rendered by default.

92 changes: 77 additions & 15 deletions README.md
@@ -3,21 +3,24 @@
<img src="images/LLMLingua_logo.png" alt="LLMLingua" width="100" align="left">
</div>
<div style="flex-grow: 1;" align="center">
<h2 align="center">(Long)LLMLingua: Enhancing Large Language Model Inference via Prompt Compression</h2>
<h2 align="center">LLMLingua Series | Effectively Deliver Information to LLMs via Prompt Compression</h2>
</div>
</div>

<p align="center">
| <a href="https://llmlingua.com/"><b>Project Page</b></a> |
<a href="https://arxiv.org/abs/2310.05736"><b>LLMLingua Paper</b></a> |
<a href="https://arxiv.org/abs/2310.06839"><b>LongLLMLingua Paper</b></a> |
<a href="https://huggingface.co/spaces/microsoft/LLMLingua"><b>HF Space Demo</b></a> |
<a href="https://aclanthology.org/2023.emnlp-main.825/"><b>LLMLingua</b></a> |
<a href="https://arxiv.org/abs/2310.06839"><b>LongLLMLingua</b></a> |
<a href="https://arxiv.org/abs/2403."><b>LLMLingua-2</b></a> |
<a href="https://huggingface.co/spaces/microsoft/LLMLingua"><b>LLMLingua Demo</b></a> |
<a href="https://huggingface.co/spaces/microsoft/LLMLingua-2"><b>LLMLingua-2 Demo</b></a> |
</p>

https://github.com/microsoft/LLMLingua/assets/30883354/eb0ea70d-6d4c-4aa7-8977-61f94bb87438

## News

- 🦚 We're excited to announce the release of **LLMLingua-2**, boasting a 3x-6x speed improvement over LLMLingua! For more information, check out our [paper](https://arxiv.org/abs/2403.), visit the [project page](https://llmlingua.com/llmlingua-2.html), and explore our [demo](https://huggingface.co/spaces/microsoft/LLMLingua-2).
- 👾 LLMLingua has been integrated into [LangChain](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/retrievers/llmlingua.ipynb) and [LlamaIndex](https://github.com/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/LongLLMLingua.ipynb), two widely-used RAG frameworks.
- 🤳 Talk slides are available in [AI Time Jan, 24](https://drive.google.com/file/d/1fzK3wOvy2boF7XzaYuq2bQ3jFeP1WMk3/view?usp=sharing).
- 🖥 EMNLP'23 slides are available in [Session 5](https://drive.google.com/file/d/1GxQLAEN8bBB2yiEdQdW4UKoJzZc0es9t/view) and [BoF-6](https://drive.google.com/file/d/1LJBUfJrKxbpdkwo13SgPOqugk-UjLVIF/view).
@@ -28,13 +31,19 @@ https://github.com/microsoft/LLMLingua/assets/30883354/eb0ea70d-6d4c-4aa7-8977-6
## TL;DR

LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.

- [LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models](https://aclanthology.org/2023.emnlp-main.825/) (EMNLP 2023)<br>
  _Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
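
As a toy illustration of this idea only (not LLMLingua's actual algorithm): a small causal LM such as GPT-2 can score how predictable each token is, and the most predictable tokens can be dropped. The model choice, keep ratio, and top-k selection below are illustrative assumptions.

```python
# Toy sketch of perplexity-based pruning: drop the tokens a small LM finds most predictable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def toy_compress(text: str, keep_ratio: float = 0.5) -> str:
    ids = tokenizer(text, return_tensors="pt").input_ids          # (1, seq_len)
    with torch.no_grad():
        logits = model(ids).logits                                # (1, seq_len, vocab)
    # Negative log-likelihood of each token given its prefix; the first token has no prefix and is always kept.
    nll = torch.nn.functional.cross_entropy(logits[0, :-1], ids[0, 1:], reduction="none")
    k = max(1, int(keep_ratio * nll.numel()))
    keep = torch.zeros_like(nll, dtype=torch.bool)
    keep[nll.topk(k).indices] = True                              # keep the most surprising (informative) tokens
    kept_ids = torch.cat([ids[0, :1], ids[0, 1:][keep]])          # original token order is preserved
    return tokenizer.decode(kept_ids)

print(toy_compress("Question: after buying 15 apples and giving away 4, John ate 2 more; how many are left?"))
```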

LongLLMLingua mitigates the 'lost in the middle' issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency with prompt compression, improving RAG performance by up to 21.4% using only 1/4 of the tokens.

- [LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression](https://arxiv.org/abs/2310.06839) (ICLR ME-FoMo 2024)<br>
  _Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_

LLMLingua-2, a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in task-agnostic compression. It surpasses LLMLingua in handling out-of-domain data, offering 3x-6x faster performance.

- [LLMLingua-2: Context-Aware Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://arxiv.org/abs/2403.) (Under Review)<br>
_Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang_
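
As a rough sketch of how such a classifier-based compressor works (this is not the library's implementation; the assumption that label index 1 marks tokens to preserve, and the top-k selection, are illustrative):

```python
# Sketch: the released encoder scores each token, and the highest-scoring fraction is kept.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

name = "microsoft/llmlingua-2-xlm-roberta-large-meetingbank"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name)
model.eval()

def sketch_compress(text: str, rate: float = 0.33) -> str:
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits[0]              # (seq_len, num_labels)
    keep_prob = logits.softmax(dim=-1)[:, 1]         # assumption: index 1 is the "preserve" class
    ids = enc["input_ids"][0]
    k = max(1, int(rate * ids.numel()))
    keep = torch.zeros_like(keep_prob, dtype=torch.bool)
    keep[keep_prob.topk(k).indices] = True           # top-scoring tokens, original order kept
    return tokenizer.decode(ids[keep], skip_special_tokens=True)

print(sketch_compress("Item 15, report from City Manager: recommendation to adopt the resolution approving the budget."))
```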

## 🎥 Overview

@@ -48,11 +57,11 @@ While Large Language Models like ChatGPT and GPT-4 excel in generalization and r

![Motivation for LLMLingua](./images/motivation.png)

Now you can use **LLMLingua** & **LongLLMLingua**!
Now you can use **LLMLingua**, **LongLLMLingua**, and **LLMLingua-2**!

These tools offer an efficient solution to compress prompts by up to **20x**, enhancing the utility of LLMs.

- 💰 **Cost Savings**: Reduces both prompt and generation lengths.
- 💰 **Cost Savings**: Reduces both prompt and generation lengths with minimal overhead.
- 📝 **Extended Context Support**: Enhances support for longer contexts, mitigates the "lost in the middle" issue, and boosts overall performance.
- ⚖️ **Robustness**: No additional training needed for LLMs.
- 🕵️ **Knowledge Retention**: Maintains original prompt information like ICL and reasoning.
@@ -63,7 +72,7 @@ These tools offer an efficient solution to compress prompts by up to **20x**, en

![Framework of LongLLMLingua](./images/LongLLMLingua.png)

![Demo of LLMLingua](./images/LLMLingua_demo.png)
![Framework of LLMLingua-2](./images/LLMLingua-2.png)

PS: This demo is based on the [alt-gpt](https://github.com/feedox/alt-gpt) project. Special thanks to @Livshitz for their valuable contribution.

@@ -82,6 +91,7 @@ If you find this repo helpful, please cite the following papers:
pages = "13358--13376",
}
```

```bibtex
@article{jiang-etal-2023-longllmlingua,
title = "{L}ong{LLML}ingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression",
@@ -93,19 +103,30 @@ If you find this repo helpful, please cite the following papers:
}
```

```bibtex
@article{wu2024llmlingua2,
title = "{LLML}ingua-2: Context-Aware Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression",
author = "Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang",
url = "https://arxiv.org/abs/2403.",
journal = "ArXiv preprint",
volume = "abs/2403.",
year = "2024",
}
```

## 🎯 Quick Start

#### 1. **Installing (Long)LLMLingua:**
#### 1. **Installing LLMLingua:**

To get started with (Long)LLMLingua, simply install it using pip:
To get started with LLMLingua, simply install it using pip:

```bash
pip install llmlingua
```

#### 2. **Using (Long)LLMLingua for Prompt Compression:**
#### 2. **Using LLMLingua Series Methods for Prompt Compression:**

With (Long)LLMLingua, you can easily compress your prompts. Here’s how you can do it:
With **LLMLingua**, you can easily compress your prompts. Here’s how you can do it:

```python
from llmlingua import PromptCompressor
@@ -120,14 +141,51 @@ compressed_prompt = llm_lingua.compress_prompt(prompt, instruction="", question=
# 'saving': ', Saving $0.1 in GPT-4.'}

## Or use the phi-2 model,
## Before that, you need to update transformers to the GitHub version, e.g. pip install -U git+https://github.com/huggingface/transformers.git
llm_lingua = PromptCompressor("microsoft/phi-2")

## Or use a quantized model, like TheBloke/Llama-2-7b-Chat-GPTQ, which needs only <8GB of GPU memory.
## Before that, you need to run: pip install optimum auto-gptq
llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})
```

To try **LongLLMLingua** in your scenarios, you can use:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()
compressed_prompt = llm_lingua.compress_prompt(
prompt_list,
question=question,
ratio=0.55,
# Set the special parameter for LongLLMLingua
condition_in_question="after_condition",
reorder_context="sort",
dynamic_context_compression_ratio=0.3, # or 0.4
condition_compare=True,
context_budget="+100",
rank_method="longllmlingua",
)
```

To try **LLMLingua-2** in your scenarios, you can use:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
use_llmlingua2=True, # Whether to use llmlingua-2
)
compressed_prompt = llm_lingua.compress_prompt(prompt, rate=0.33, force_tokens=['\n', '?'])

## Or use LLMLingua-2-small model
llm_lingua = PromptCompressor(
model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
use_llmlingua2=True, # Whether to use llmlingua-2
)
```

#### 3. **Advanced usage - Structured Prompt Compression:**

Split the text into sections and decide whether to compress each one and at what rate. Use `<llmlingua></llmlingua>` tags for context segmentation, with optional `rate` and `compress` parameters.
@@ -148,13 +206,17 @@ print(compressed_prompt['compressed_prompt'])
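
The full code example for this subsection is collapsed in the diff above. A minimal sketch of the idea, assuming a `structured_compress_prompt` method and the `<llmlingua>` tag syntax described here (names and defaults may differ in your installed version):

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()

# Sections wrapped in <llmlingua> tags can carry their own rate or opt out of
# compression entirely with compress=False; everything else follows the global rate.
structured_prompt = (
    "<llmlingua, compress=False>Question: How many apples does John have left?</llmlingua>"
    "<llmlingua, rate=0.4>John bought fifteen apples at the market on Monday morning, "
    "gave four of them to his neighbour, and ate two with his lunch.</llmlingua>"
)

compressed = llm_lingua.structured_compress_prompt(structured_prompt, rate=0.5)
print(compressed["compressed_prompt"])
```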

To understand how to apply LLMLingua and LongLLMLingua in real-world scenarios like RAG, Online Meetings, CoT, and Code, please refer to our [**examples**](./examples). For detailed guidance, the [**documentation**](./DOCUMENT.md) provides extensive recommendations on effectively utilizing LLMLingua.

#### 5. **Data collection and model training of LLMLingua-2:**

To train the compressor on your custom data, please refer to our [**data_collection**](./experiments/llmlingua2/data_collection) and [**model_training**](./experiments/llmlingua2/model_training).

## Frequently Asked Questions

For more insights and answers, visit our [FAQ section](./Transparency_FAQ.md).

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

21 changes: 20 additions & 1 deletion Transparency_FAQ.md
@@ -120,7 +120,7 @@ Out[3]:
}
```

## How to reproduce the result in LLMLingua & LongLLMLingua?
## How to reproduce the results of the LLMLingua Series methods?

We have released the parameters in [issue #76](https://github.com/microsoft/LLMLingua/issues/76) and [issue #86](https://github.com/microsoft/LLMLingua/issues/86).

@@ -157,6 +157,25 @@ compressed_prompt = llm_lingua.compress_prompt(

Experiments in LLMLingua and most experiments in LongLLMLingua were conducted in completion mode, whereas chat mode tends to be more sensitive to token-level compression. However, OpenAI has disabled GPT-3.5-turbo's completion endpoint; you can use GPT-3.5-turbo-instruct or the Azure OpenAI service instead.

**LLMLingua-2**:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
use_llmlingua2=True, # Whether to use llmlingua-2
)
compressed_prompt = llm_lingua.compress_prompt(prompt, rate=0.33, force_tokens=['\n', '?'])

## Or use LLMLingua-2-small model
llm_lingua = PromptCompressor(
model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
use_llmlingua2=True, # Whether to use llmlingua-2
)
```

And you can find the details of the LLMLingua-2 experiments at [experiments/llmlingua2](./examples/llmlingua2).

## How to use LLMLingua in LangChain and LlamaIndex?
