Support selecting data by the value range of a specified field in RangeSpecifiedFieldSelector #432

Draft
wants to merge 3 commits into base: main

2108038773
Contributor

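The current RangeSpecifiedFieldSelector class only supports selection by percentile and rank, which is unintuitive. The simplest and most common approach is to select by the value range of a field, for example similarity above a threshold or PPL below a threshold. This PR adds that capability.
In addition, the original process function has a flaw in its condition checks (it does not handle some omitted arguments): lower_percentile and lower_rank cannot both be None, and upper_percentile and upper_rank cannot both be None, otherwise no selection is performed. This does not work when only an upper bound or only a lower bound is given. This PR fixes that logic.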
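
A minimal sketch of the value-range selection described above, written as a standalone function rather than the actual Data-Juicer code. The function name `select_by_value_range` and the parameters `lower_value` and `upper_value` are hypothetical names chosen for illustration; the PR's real parameter names may differ. A bound left as `None` is treated as unbounded on that side, which covers the upper-bound-only and lower-bound-only cases.

```python
from typing import Any, Dict, List, Optional


def select_by_value_range(
    samples: List[Dict[str, Any]],
    field_key: str,
    lower_value: Optional[float] = None,
    upper_value: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Keep samples whose `field_key` value lies in [lower_value, upper_value].

    Hypothetical helper for illustration only; not the Data-Juicer API.
    """
    if lower_value is None and upper_value is None:
        # No bound given at all: there is nothing to filter on.
        return list(samples)

    selected = []
    for sample in samples:
        value = sample.get(field_key)
        if value is None:
            # Assumption: samples missing the field are dropped.
            continue
        if lower_value is not None and value < lower_value:
            continue
        if upper_value is not None and value > upper_value:
            continue
        selected.append(sample)
    return selected


data = [
    {"text": "a", "ppl": 12.0},
    {"text": "b", "ppl": 85.5},
    {"text": "c", "ppl": 40.0},
]

# Only an upper bound: keep samples with PPL at most 50.
low_ppl = select_by_value_range(data, "ppl", upper_value=50.0)
print([s["text"] for s in low_ppl])  # ['a', 'c']
```

Both bounds are inclusive in this sketch; whether the PR uses inclusive or exclusive comparisons is not stated in the description.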

@HYLcool
Collaborator

HYLcool commented Sep 27, 2024

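Please refer to the unit test log, compare it against the failing cases, and try to fix the code or update the unit test cases: link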

@drcege drcege marked this pull request as draft November 12, 2024 12:44

3 participants