
Commit

Feature(LLMLingua-2): fix the title
iofu728 committed Mar 24, 2024
1 parent 5828ac3 commit 4794c6c
Showing 2 changed files with 4 additions and 6 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -42,7 +42,7 @@ LongLLMLingua mitigates the 'lost in the middle' issue in LLMs, enhancing long-c

LLMLingua-2, a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in task-agnostic compression. It surpasses LLMLingua in handling out-of-domain data, offering 3x-6x faster performance.

-- [LLMLingua-2: Context-Aware Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://arxiv.org/abs/2403.12968) (Under Review)<br>
+- [LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://arxiv.org/abs/2403.12968) (Under Review)<br>
_Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang_

## 🎥 Overview
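For context, a minimal sketch of how the LLMLingua-2 compressor described above is typically invoked; the `microsoft/llmlingua-2-xlm-roberta-large-meetingbank` checkpoint and the `compress_prompt` call are assumptions taken from the project's wider usage docs, not from this diff:

```python
# Sketch only: assumes the LLMLingua-2 checkpoint name and compress_prompt API
# from the project's usage docs; neither appears in this commit.
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",  # assumed LLMLingua-2 checkpoint
    use_llmlingua2=True,  # switch to the BERT-level token-classification compressor
)

long_prompt = "...your long context here..."
result = compressor.compress_prompt(long_prompt, rate=0.33)  # keep roughly a third of the tokens
print(result["compressed_prompt"])
```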
8 changes: 3 additions & 5 deletions llmlingua/prompt_compressor.py
@@ -38,19 +38,17 @@ class PromptCompressor:
The PromptCompressor class is versatile and can be adapted for various models and specific requirements in prompt processing.
Users can specify different model names and configurations as needed for their particular use case. The architecture is
based on the paper "LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models". Jiang, Huiqiang, Qianhui Wu,
-Chin-Yew Lin, Yuqing Yang, and Lili Qiu. "Llmlingua: Compressing prompts for accelerated inference of large language models."
-arXiv preprint arXiv:2310.05736 (2023).
+Chin-Yew Lin, Yuqing Yang, and Lili Qiu. arXiv preprint arXiv:2310.05736 (2023).
Args:
model_name (str, optional): The name of the language model to be loaded. Default is "NousResearch/Llama-2-7b-hf".
device_map (str, optional): The device to load the model onto, e.g., "cuda" for GPU. Default is "cuda".
model_config (dict, optional): A dictionary containing the configuration parameters for the model. Default is an empty dictionary.
open_api_config (dict, optional): A dictionary containing configuration for openai APIs that may be used in conjunction with the model. Default is an empty dictionary.
use_llmlingua2 (bool, optional): Whether to use llmlingua-2 compressor based on the paper
"LLMLingua-2: Context-Aware Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression".
"LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression".
Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang.
"LLMLingua-2: Context-Aware Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression". arXiv preprint arXiv:,
Default is True.
arXiv preprint arXiv:2403.2403.12968 (2024), Default is True.
llmlingua2_config (dict, optional): A dictionary containing the configuration parameters for llmlingua-2. Default is
{
"max_batch_size": 50,
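To make the documented constructor signature concrete, a hedged sketch that sticks to the parameters and defaults listed in the docstring above; the LLMLingua-2 checkpoint name is an assumption, and `llmlingua2_config` keys other than `max_batch_size` are omitted because the rest of the default dictionary is cut off in this hunk:

```python
from llmlingua import PromptCompressor

# Original LLMLingua/LongLLMLingua setup, spelling out the documented defaults.
lingua = PromptCompressor(
    model_name="NousResearch/Llama-2-7b-hf",  # documented default
    device_map="cuda",                        # documented default
    model_config={},                          # model configuration dict, empty by default
    open_api_config={},                       # OpenAI API configuration, empty by default
    use_llmlingua2=False,                     # use the original perplexity-based compressor
)

# LLMLingua-2 setup: enable use_llmlingua2 and pass the documented batch size.
lingua2 = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",  # assumed checkpoint, not in this diff
    use_llmlingua2=True,
    llmlingua2_config={"max_batch_size": 50},  # default value shown in the docstring
)
```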
