update readme
staoxiao committed Nov 23, 2023
1 parent 9cce219 commit 8926b54
Showing 5 changed files with 62 additions and 21 deletions.
9 changes: 5 additions & 4 deletions FlagEmbedding/baai_general_embedding/README.md
@@ -214,6 +214,11 @@ Please refer to [C_MTEB](https://github.com/FlagOpen/FlagEmbedding/blob/master/C



## Acknowledgement

Part of the code is developed based on [Dense](https://github.com/luyug/Dense).


## Citation

If you find this repository useful, please consider giving a star :star: and a citation
@@ -229,9 +234,5 @@ If you find this repository useful, please consider giving a star :star: and cit
}
```

## Acknowledgement

Part of the code is developed based on [Dense](https://github.com/luyug/Dense).



10 changes: 6 additions & 4 deletions FlagEmbedding/reranker/README.md
@@ -76,6 +76,12 @@ Currently, this model mainly supports Chinese and English, and may see performan
\* : T2RerankingZh2En and T2RerankingEn2Zh are cross-language retrieval tasks



## Acknowledgement

Part of the code is developed based on [Reranker](https://github.com/luyug/Reranker).


## Citation

If you find this repository useful, please consider giving a star :star: and a citation
@@ -90,7 +96,3 @@ If you find this repository useful, please consider giving a star :star: and cit
primaryClass={cs.CL}
}
```

## Acknowledgement

Part of the code is developed based on [Reranker](https://github.com/luyug/Reranker).
20 changes: 20 additions & 0 deletions LM_Cocktail/README.md
@@ -193,3 +193,23 @@ Use [MTEB script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB)
python eval_MTEB.py --model_name_or_path mixed_model --task_type Retrieval
```
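
For reference, a hedged sketch of the same evaluation done directly with the `mteb` and `sentence-transformers` packages instead of the repo's `eval_MTEB.py`; the `mixed_model` path comes from the command above, and the API shown is an assumption based on the 2023-era `mteb` interface:

```python
# Minimal sketch (not the repo's eval_MTEB.py): evaluate a merged checkpoint
# on MTEB retrieval tasks. "mixed_model" is the output path used above.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixed_model")       # load the merged checkpoint
evaluation = MTEB(task_types=["Retrieval"])      # restrict to retrieval tasks
evaluation.run(model, output_folder="results/mixed_model")
```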

## Acknowledgement

The Llama model is fine-tuned using the [FastChat](https://github.com/lm-sys/FastChat) scripts.
Fine-tuning datasets are from [sentence-transformers/embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data) and [intfloat/llm-retriever-tasks](https://huggingface.co/datasets/intfloat/llm-retriever-tasks).


## Citation

If you find this repository useful, please consider giving a star :star: and a citation

```
@misc{cocktail,
title={LM-Cocktail: Resilient Tuning of Language Models via Model Merging},
author={Shitao Xiao and Zheng Liu and Peitian Zhang and Xingrun Xing},
year={2023},
eprint={2311.13534},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
19 changes: 14 additions & 5 deletions README.md
@@ -32,7 +32,7 @@

FlagEmbedding focuses on retrieval-augmented LLMs and currently consists of the following projects:

- **Fine-tuning of LM** : [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/LM_Cocktail)
- **Fine-tuning of LM** : [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail)
- **Dense Retrieval**: [LLM Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), [BGE Embedding](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding), [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB)
- **Reranker Model**: [BGE Reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/reranker)

@@ -147,11 +147,11 @@ For more training details for bge see [baai_general_embedding](https://github.co
If you find this repository useful, please consider giving a star :star: and a citation

```
@misc{bge_embedding,
title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
@misc{cocktail,
title={LM-Cocktail: Resilient Tuning of Language Models via Model Merging},
author={Shitao Xiao and Zheng Liu and Peitian Zhang and Xingrun Xing},
year={2023},
eprint={2309.07597},
eprint={2311.13534},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@@ -164,6 +164,15 @@ If you find this repository useful, please consider giving a star :star: and cit
archivePrefix={arXiv},
primaryClass={cs.IR}
}
@misc{bge_embedding,
title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
year={2023},
eprint={2309.07597},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

## License
25 changes: 17 additions & 8 deletions README_zh.md
@@ -38,7 +38,7 @@ FlagEmbedding focuses on retrieval-augmented LLMs and currently includes the following projects:

## Updates

- 11/20/2023: Release [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/LM_Cocktail), a method that preserves the original model's general capabilities during fine-tuning. [Paper link](https://arxiv.org/abs/2311.13534) :fire:
- 11/23/2023: Release [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/LM_Cocktail), a method that preserves the original model's general capabilities during fine-tuning. [Paper link](https://arxiv.org/abs/2311.13534) :fire:
- 10/12/2023: Release [LLM-Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder), an English embedding model designed for **all kinds of retrieval-augmentation tasks for LLMs**. [Paper link](https://arxiv.org/pdf/2310.07554.pdf)
- 09/15/2023: Release the [paper](https://arxiv.org/pdf/2309.07597.pdf) and [dataset](https://data.baai.ac.cn/details/BAAI-MTP).
- 09/12/2023: Updates:
@@ -58,10 +58,10 @@

### [LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail)

Fine-tuning a pre-trained language model repeatedly can better support downstream tasks. However, this operation may cause performance degradation on general tasks outside the target domain.
To overcome this problem, we propose: LM-Cocktail. LM-Cocktail improves accuracy on the downstream target task while maintaining performance on other tasks.
Fine-tuning a pre-trained language model can better support downstream tasks. However, this operation may cause performance degradation on general tasks outside the target domain.
To overcome this problem, we propose LM-Cocktail. LM-Cocktail improves accuracy on the downstream target task while maintaining performance on other tasks.
It can also be used to generate a model for a new task without the resource and data requirements of fine-tuning.
For more details, please refer to our [paper](https://arxiv.org/abs/2311.13534) and [code](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail).
For more details, please refer to the [paper](https://arxiv.org/abs/2311.13534) and [code](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail).
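
A minimal illustration of the merging idea described above (a weighted average of model parameters), assuming two Hugging Face causal-LM checkpoints with the same architecture; the model names are placeholders, and this is not the LM_Cocktail package's actual API:

```python
# Illustrative sketch of LM-Cocktail-style merging: a weighted average of the
# fine-tuned and base model parameters. Model names below are placeholders.
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-llama-2-7b")        # placeholder
tuned = AutoModelForCausalLM.from_pretrained("finetuned-llama-2-7b")  # placeholder

alpha = 0.5  # merging weight: 1.0 keeps only the fine-tuned model, 0.0 only the base
base_state = base.state_dict()
merged_state = {
    name: alpha * param + (1.0 - alpha) * base_state[name]
    for name, param in tuned.state_dict().items()
}
tuned.load_state_dict(merged_state)   # reuse the fine-tuned model's architecture
tuned.save_pretrained("mixed_model")  # merged checkpoint, loadable like any HF model
```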


### [LLM Embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_embedder)
@@ -129,11 +129,11 @@ BGE Embedding is a general-purpose embedding model. We use [retromae](https://githu

If you find our work helpful, please consider giving it a star :star: and citing the following papers:
```
@misc{bge_embedding,
title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
@misc{cocktail,
title={LM-Cocktail: Resilient Tuning of Language Models via Model Merging},
author={Shitao Xiao and Zheng Liu and Peitian Zhang and Xingrun Xing},
year={2023},
eprint={2309.07597},
eprint={2311.13534},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@@ -146,6 +146,15 @@ BGE Embedding is a general-purpose embedding model. We use [retromae](https://githu
archivePrefix={arXiv},
primaryClass={cs.IR}
}
@misc{bge_embedding,
title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
year={2023},
eprint={2309.07597},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

## License
