Commit
Feature (LLMLingua): support customized compression spec (#96)
* update the definition of compression ratio, structured prompt compress
* merge part of structured_compress_prompt into compress_prompt
* add no-compression retention to sentence and context level filter
* update global ratio and global target token
* fix segments_info error
* change input parameter from ratio to rate
* change parameter name
* Feature(LLMLingua): add template & test scripts
* Feature(LLMLingua): add pre commit check & unittest
* Feature(LLMLingua): update the release script
* Feature(LLMLingua): add HF_TOKEN
* Added unittest for structured_compress_prompt and fixed bugs (#95)
  - fix bugs and add unittests
  - make style
  - add nltk init
  - fix nltk file exist error
  - add unittest for different models

---------

Co-authored-by: siyunzhao <[email protected]>
Co-authored-by: Qianhui Wu <[email protected]>
Co-authored-by: Xufang Luo <[email protected]>
Co-authored-by: Yuqing Yang <[email protected]>
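The commit message mentions a new definition of the compression ratio and renames the input parameter from `ratio` to `rate`, but this page does not show the new definition. As a minimal sketch, assuming `rate` means compressed tokens divided by original tokens (so a smaller rate means more aggressive compression), a requested rate maps to a token budget as below. `target_token_budget` and `achieved_rate` are hypothetical helpers for illustration, not LLMLingua functions.

```python
import math


def target_token_budget(original_tokens: int, rate: float) -> int:
    """Hypothetical helper: turn a compression rate into a token budget.

    Assumes rate = compressed_tokens / original_tokens, so rate=0.5 keeps
    about half of the prompt and rate=1.0 keeps all of it.
    """
    if original_tokens < 0:
        raise ValueError("original_tokens must be non-negative")
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in the interval (0, 1]")
    # Round down so the budget never exceeds the requested rate.
    return math.floor(original_tokens * rate)


def achieved_rate(original_tokens: int, compressed_tokens: int) -> float:
    """Rate actually achieved, using the same compressed/original convention."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return compressed_tokens / original_tokens


if __name__ == "__main__":
    budget = target_token_budget(2000, 0.25)
    print(budget)  # 500
    print(achieved_rate(2000, budget))  # 0.25
```

Under this assumed convention, a "4x compression" quoted the other way round (original/compressed) corresponds to a rate of 0.25.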
1 parent 4afe5a4, commit 042bd0e
Showing 27 changed files with 1,326 additions and 180 deletions.
@@ -0,0 +1,47 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve LLMLingua
title: "[Bug]: "
labels: ["bug"]

body:
  - type: textarea
    id: description
    attributes:
      label: Describe the bug
      description: A clear and concise description of what the bug is.
      placeholder: What went wrong?
  - type: textarea
    id: reproduce
    attributes:
      label: Steps to reproduce
      description: |
        Steps to reproduce the behavior:
        1. Step 1
        2. Step 2
        3. ...
        4. See error
      placeholder: How can we replicate the issue?
  - type: textarea
    id: expected_behavior
    attributes:
      label: Expected Behavior
      description: A clear and concise description of what you expected to happen.
      placeholder: What should have happened?
  - type: textarea
    id: logs
    attributes:
      label: Logs
      description: If applicable, add logs or screenshots to help explain your problem.
      placeholder: Add logs here
  - type: textarea
    id: additional_information
    attributes:
      label: Additional Information
      description: |
        - LLMLingua Version: <!-- Specify the LLMLingua version (e.g., v0.1.0) -->
        - Operating System: <!-- Specify the OS (e.g., Windows 10, Ubuntu 20.04) -->
        - Python Version: <!-- Specify the Python version (e.g., 3.8) -->
        - Related Issues: <!-- Link to any related issues here (e.g., #1) -->
        - Any other relevant information.
      placeholder: Any additional details
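GitHub validates issue forms like the bug report above when it renders them. For a quick local check before pushing, a sketch along these lines can catch a few common mistakes in the parsed form (a plain dict here, so no YAML library is assumed). The rules it checks (required `name`, `description`, and `body` keys, unique element `id`s, a `label` on each `textarea`) are assumptions based on GitHub's issue-form syntax rather than a complete validator, and `check_issue_form` is a hypothetical name.

```python
from typing import Any


def check_issue_form(form: dict[str, Any]) -> list[str]:
    """Return the problems found in a parsed issue-form definition.

    Only a few rules are checked (assumed from GitHub's issue-form syntax):
    required top-level keys, unique element ids, and a label on each textarea.
    """
    problems: list[str] = []
    for key in ("name", "description", "body"):
        if key not in form:
            problems.append(f"missing top-level key: {key}")

    seen_ids: set[str] = set()
    for index, element in enumerate(form.get("body", [])):
        element_id = element.get("id")
        if element_id is not None:
            if element_id in seen_ids:
                problems.append(f"duplicate id: {element_id}")
            seen_ids.add(element_id)
        label = element.get("attributes", {}).get("label")
        if element.get("type") == "textarea" and not label:
            problems.append(f"body[{index}]: textarea without a label")
    return problems


if __name__ == "__main__":
    # Abbreviated dict mirroring the bug-report form.
    bug_report = {
        "name": "\U0001F41B Bug Report",
        "description": "Submit a bug report to help us improve LLMLingua",
        "body": [
            {"type": "textarea", "id": "description",
             "attributes": {"label": "Describe the bug"}},
            {"type": "textarea", "id": "reproduce",
             "attributes": {"label": "Steps to reproduce"}},
        ],
    }
    print(check_issue_form(bug_report))  # []
```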
@@ -0,0 +1 @@
blank_issues_enabled: true
@@ -0,0 +1,26 @@
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new LLMLingua feature
labels: ["feature request"]
title: "[Feature Request]: "

body:
  - type: textarea
    id: problem_description
    attributes:
      label: Is your feature request related to a problem? Please describe.
      description: A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
      placeholder: What problem are you trying to solve?

  - type: textarea
    id: solution_description
    attributes:
      label: Describe the solution you'd like
      description: A clear and concise description of what you want to happen.
      placeholder: How do you envision the solution?

  - type: textarea
    id: additional_context
    attributes:
      label: Additional context
      description: Add any other context or screenshots about the feature request here.
      placeholder: Any additional information
@@ -0,0 +1,12 @@
name: "\U0001F31F General Question"
description: File a general question
title: "[Question]: "
labels: ["question"]

body:
  - type: textarea
    id: description
    attributes:
      label: Describe the issue
      description: A clear and concise description of what the question is.
      placeholder: The details of your question.
@@ -0,0 +1,43 @@
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet though.
Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.
Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies that are required for this change.
Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one has reviewed your PR after a week, don't hesitate to post a new comment @-mentioning the same people; sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
LLMLingua/LongLLMLingua:
- general: @iofu728, @QianhuiWu, @XufangLuo, and @mydmdm
- new feature: @SiyunZhao
Documentation: @SiyunZhao
-->
@@ -0,0 +1,46 @@
# This workflow builds and uploads a Python package using Twine when a release is published
# The conda-forge bot will pick up the new PyPI version and automatically create a new version
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

name: release
run-name: Release LLMLingua by @${{ github.actor }}

on:
  release:
    types: [published]
permissions: {}

jobs:
  deploy:
    strategy:
      matrix:
        os: ['ubuntu-latest']
        python-version: ["3.10"]
    runs-on: ${{ matrix.os }}
    environment:
      name: pypi
      url: https://pypi.org/project/llmlingua/
    permissions:
      id-token: write
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install from source
        # This is required for the pre-commit tests
        shell: pwsh
        run: pip install .

      - name: Build
        shell: pwsh
        run: |
          pip install twine
          python setup.py sdist bdist_wheel
      - name: Publish package distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          print-hash: true
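In YAML, an unquoted `3.10` is a float, and floats do not keep trailing zeros, so `python-version: [3.10]` reaches `actions/setup-python` as `3.1`; quoting the value as `"3.10"` keeps it a string. The hypothetical `risky_python_versions` helper below is a sketch of a line-level check for this pitfall. It only understands a single-line flow list and uses Python's `float` as a stand-in for YAML's number parsing, so treat it as an approximation.

```python
import re

# Single-line flow list, e.g.  python-version: [3.9, "3.10", 3.11]
_FLOW_LIST = re.compile(r"python-version:\s*\[(?P<items>[^\]]*)\]")


def risky_python_versions(yaml_line: str) -> list[str]:
    """Return unquoted versions that would lose digits when read as floats.

    An unquoted 3.10 becomes the float 3.1, so setup-python would be asked
    for Python 3.1 instead of 3.10.
    """
    match = _FLOW_LIST.search(yaml_line)
    if match is None:
        return []
    risky: list[str] = []
    for item in (part.strip() for part in match.group("items").split(",")):
        if not item or item[0] in "'\"":
            continue  # empty entries and quoted strings are left alone
        if "." not in item:
            continue  # integers such as 3 are not affected
        try:
            as_float = float(item)  # stand-in for YAML's float resolution
        except ValueError:
            continue  # e.g. 3.10.0 is not a number, so YAML keeps it a string
        if str(as_float) != item:
            risky.append(item)
    return risky


if __name__ == "__main__":
    print(risky_python_versions("python-version: [3.10]"))  # ['3.10']
    print(risky_python_versions('python-version: ["3.10"]'))  # []
```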
@@ -0,0 +1,43 @@
name: Unit Test

# see: https://help.github.com/en/actions/reference/events-that-trigger-workflows
on: # Trigger the workflow on pull request or merge
  pull_request:
  merge_group:
    types: [checks_requested]

defaults:
  run:
    shell: bash
permissions: {}

jobs:
  UnitTest:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-2019]
        python-version: ["3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install packages and dependencies for all tests
        run: |
          python -m pip install --upgrade pip wheel
          pip install pytest pytest-xdist
      - name: Install packages
        run: |
          pip install -e .
      - name: Run core tests
        shell: bash
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          make test
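GitHub Actions runs one job for every combination of matrix values, so the Unit Test workflow fans out into 3 operating systems times 3 Python versions, or 9 jobs, and `fail-fast: false` keeps the remaining jobs running when one of them fails. The sketch below reproduces only that Cartesian-product expansion (it does not handle `include` or `exclude`); `expand_matrix` is a hypothetical helper, not part of any GitHub tooling.

```python
from itertools import product

# Values copied from the Unit Test workflow's strategy.matrix.
MATRIX = {
    "os": ["ubuntu-latest", "macos-latest", "windows-2019"],
    "python-version": ["3.9", "3.10", "3.11"],
}


def expand_matrix(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """One job per combination of values (no include/exclude support)."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


if __name__ == "__main__":
    jobs = expand_matrix(MATRIX)
    print(len(jobs))  # 9
    for job in jobs:
        print(f"{job['os']:<15} Python {job['python-version']}")
```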
@@ -400,4 +400,4 @@ FodyWeavers.xsd

# build
build/*
dist/*
@@ -0,0 +1,51 @@
default_language_version:
  python: python3
exclude: 'dotnet'
ci:
  autofix_prs: true
  autoupdate_commit_msg: '[pre-commit.ci] pre-commit suggestions'
  autoupdate_schedule: 'quarterly'

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-added-large-files
      - id: check-ast
      - id: check-yaml
      - id: check-toml
      - id: check-json
      - id: check-byte-order-marker
        exclude: .gitignore
      - id: check-merge-conflict
      - id: detect-private-key
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: no-commit-to-branch
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
  # - repo: https://github.com/charliermarsh/ruff-pre-commit
  #   rev: v0.0.261
  #   hooks:
  #     - id: ruff
  #       args: ["--fix"]
  # - repo: https://github.com/codespell-project/codespell
  #   rev: v2.2.6
  #   hooks:
  #     - id: codespell
  #       args: ["-L", "ans,linar,nam,"]
  #       exclude: |
  #         (?x)^(
  #             pyproject.toml |
  #             website/static/img/ag.svg |
  #             website/yarn.lock |
  #             notebook/.*
  #         )$
  - repo: https://github.com/nbQA-dev/nbQA
    rev: 1.7.1
    hooks:
      # - id: nbqa-ruff
      #   args: ["--fix"]
      - id: nbqa-black
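The `trailing-whitespace` and `end-of-file-fixer` hooks rewrite files so that no line ends in spaces and every non-empty file ends with exactly one newline; the `.gitignore` hunk that re-adds `dist/*` looks like this kind of end-of-file fix. The following is a rough approximation of the two hooks combined, not their actual source, and `fix_whitespace` is a hypothetical name. Unlike the real hooks, it also normalizes line endings to `\n`.

```python
def fix_whitespace(text: str) -> str:
    """Rough approximation of trailing-whitespace + end-of-file-fixer.

    - strips whitespace from the end of every line
    - drops blank lines at the very end of the file
    - makes a non-empty file end with exactly one newline
    Unlike the real hooks, this also rewrites CRLF line endings as LF.
    """
    lines = [line.rstrip() for line in text.splitlines()]
    body = "\n".join(lines).rstrip("\n")
    return body + "\n" if body else ""


if __name__ == "__main__":
    # A file whose last line lacks a newline gains one.
    print(repr(fix_whitespace("# build\nbuild/*\ndist/*")))
    # Trailing spaces and extra blank lines at the end are removed.
    print(repr(fix_whitespace("a  \nb\n\n\n")))  # 'a\nb\n'
```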
@@ -1,16 +1,16 @@
<div style="display: flex; align-items: center;">
<div style="width: 100px; margin-right: 10px; height:auto;" align="left">
<img src="images/LLMLingua_logo.png" alt="LLMLingua" width="100" align="left">
</div>
<div style="flex-grow: 1;" align="center">
<h2 align="center">(Long)LLMLingua: Enhancing Large Language Model Inference via Prompt Compression</h2>
</div>
</div>

<p align="center">
| <a href="https://llmlingua.com/"><b>Project Page</b></a> |
<a href="https://arxiv.org/abs/2310.05736"><b>LLMLingua Paper</b></a> |
<a href="https://arxiv.org/abs/2310.06839"><b>LongLLMLingua Paper</b></a> |
<a href="https://huggingface.co/spaces/microsoft/LLMLingua"><b>HF Space Demo</b></a> |
</p>

@@ -102,7 +102,7 @@ To get started with (Long)LLMLingua, simply install it using pip:
```bash
pip install llmlingua
```

#### 2. **Using (Long)LLMLingua for Prompt Compression:**

With (Long)LLMLingua, you can easily compress your prompts. Here’s how you can do it:
@@ -152,8 +152,8 @@ contact [[email protected]](mailto:[email protected]) with any additio

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.
@@ -14,7 +14,7 @@ Instead, please report them to the Microsoft Security Response Center (MSRC) at

If you prefer to submit without logging in, send email to [[email protected]](mailto:[email protected]). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
@@ -1,13 +1,13 @@
# Support

## How to file issues and get help

This project uses GitHub Issues to track bugs and feature requests. Please search the existing
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.

For help and questions about using this project, please refer to the [document](./DOCUMENT.md).

## Microsoft Support Policy

Support for this **PROJECT or PRODUCT** is limited to the resources listed above.