Paper | Installation | Eviction | Quantization
We provide three implementations: ThinK_eager contains the code for eager attention, ThinK_flash uses FlashAttention, and ThinK_kivi integrates with KV quantization (KIVI). Please note that the current implementations may not be fully optimized; we are actively working on improving their efficiency. We use LongBench to evaluate performance.
- Support More Models
- Support Multi-GPUs
- Optimize Efficiency
Step 1: Clone this repository
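A minimal sketch of this step, assuming the repository is hosted under the Salesforce AI Research organization on GitHub (substitute the actual URL and directory name if they differ):

git clone https://github.com/SalesforceAIResearch/ThinK.git
cd ThinK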
Step 2: Set up the environment
conda create -n think python=3.10
conda activate think
pip install -r requirements.txt
Evaluate on LongBench: first modify the hyperparameters in scripts/scripts_longBench/eval.sh (e.g., pruning_ratio; see the illustrative snippet after the commands below), then run:
cd ThinK_flash
sh ./scripts/scripts_longBench/eval.sh
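The snippet below illustrates the kind of variables eval.sh might expose; pruning_ratio is the argument named in this README, while the other names and values are placeholders that may not match the actual script:

# Illustrative only: placeholder names and values, except pruning_ratio
model_path=meta-llama/Llama-2-7b-chat-hf   # placeholder model identifier
pruning_ratio=0.4                          # pruning ratio used by ThinK (illustrative value)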
Results:
sh ./scripts/scripts_longBench/metrics.sh
cd ThinK_kivi
Set up the environment as per the instructions from KIVI, adding one additional argument, pruning_ratio. Currently, only LLaMA-2 is supported.
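For illustration only, a LongBench-style run in ThinK_kivi might add this argument on top of KIVI's usual quantization flags; the script name and every flag other than pruning_ratio are assumptions based on KIVI's examples and may differ:

# Hypothetical invocation; only --pruning_ratio is the ThinK-specific addition
python pred_long_bench.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --k_bits 2 --v_bits 2 --group_size 32 --residual_length 128 \
    --pruning_ratio 0.4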
Users need to make their own assessment regarding any obligations or responsibilities under the corresponding licenses or terms and conditions pertaining to the original datasets and data. This repository is being released for research purposes only.
@article{xu2024think,
title={ThinK: Thinner Key Cache by Query-Driven Pruning},
author={Xu, Yuhui and Jie, Zhanming and Dong, Hanze and Wang, Lei and Lu, Xudong and Zhou, Aojun and Saha, Amrita and Xiong, Caiming and Sahoo, Doyen},
journal={arXiv preprint arXiv:2407.21018},
year={2024}
}