Generate confidence interval with bootstrap #20
I'm happy to welcome more quality-of-life improvements to Evalica. At the same time, I'm focused on achieving a clean design and maintaining 100% test coverage, as reproducibility is one of the core goals. Originally, Evalica was built to accelerate the computation of confidence intervals. However, its API currently lacks specialized utilities for this purpose. You can see an example of achieving this at https://github.com/VikhrModels/ru_llm_arena/blob/56d1edabb069945c81254969cdc9dd1df62c0d89/show_result.py. It works roughly like this:

```python
import evalica
import pandas as pd

BOOTSTRAP_ROUNDS = 1000  # example value: number of bootstrap resamples

# df: pd.DataFrame with "model_a", "model_b", and "winner" columns
*_, index = evalica.indexing(
    xs=df["model_a"],  # series with model A identifiers
    ys=df["model_b"],  # series with model B identifiers
)

bootstrap: list["pd.Series[float]"] = []

for r in range(BOOTSTRAP_ROUNDS):
    # resample the comparisons with replacement; random_state=r keeps each round reproducible
    df_sample = df.sample(frac=1.0, replace=True, random_state=r)

    result_sample = evalica.bradley_terry(
        xs=df_sample["model_a"],
        ys=df_sample["model_b"],
        winners=df_sample["winner"],
        index=index,  # to save time by not re-indexing the elements
    )

    bootstrap.append(result_sample.scores)

df_bootstrap = pd.DataFrame(bootstrap)
```
|
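The loop above stops at the raw bootstrap scores. As a hedged illustration of the final step, here is a minimal standard-library sketch that turns per-round scores (the same information as the rows of `df_bootstrap`) into percentile confidence intervals. The helper names `percentile` and `percentile_ci` are hypothetical and not part of Evalica. With pandas, `df_bootstrap.quantile([0.025, 0.975])` should give equivalent linearly interpolated bounds for a 95% interval.

```python
import math
from typing import Dict, List, Tuple


def percentile(sorted_values: List[float], q: float) -> float:
    """Linearly interpolated quantile of an already sorted list (q in [0, 1])."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def percentile_ci(
    rounds: List[Dict[str, float]], confidence_level: float = 0.95
) -> Dict[str, Tuple[float, float]]:
    """Percentile bootstrap interval for every element's score.

    `rounds` holds one {element: score} mapping per bootstrap round,
    i.e., one row of `df_bootstrap` per entry.
    """
    alpha = 1.0 - confidence_level
    intervals = {}
    for element in rounds[0]:
        # Collect this element's score across all rounds and sort once.
        values = sorted(r[element] for r in rounds)
        intervals[element] = (
            percentile(values, alpha / 2),
            percentile(values, 1 - alpha / 2),
        )
    return intervals
```

Each bound is just an empirical quantile of the bootstrap distribution, so the width of the interval reflects how much a model's score moves when the comparisons are resampled.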
Can you assign this task to me? I'd like to implement it. |
Sure, why not. Please go ahead, but outline the API usage examples first so we're on the same page. |
I agree. Let's explore some examples to make it clear. For simplicity, let's call the core function `bootstrap_ci`. Firstly, the output of `bootstrap_ci` …

The simplest way to call it:

```python
import evalica
import pandas as pd

df = pd.read_csv(...)
df_bootstrap = evalica.bootstrap_ci(df, score_method='bradley-terry')
```

If the columns have different names, we can specify them:

```python
df_bootstrap = evalica.bootstrap_ci(
    df,
    score_method='bradley-terry',
    left_column='left_model',
    right_column='right_model',
    winner_column='winner_column',
    weight_column='weight_column',
)
```

We can also set the win and tie weights:

```python
df_bootstrap = evalica.bootstrap_ci(df, score_method='bradley-terry', win_weight=1.0, tie_weight=0.3)
```

If we are using Elo:

```python
df_bootstrap = evalica.bootstrap_ci(df, score_method='elo', initial=1200, base=10, ...)
```

And we can also control the bootstrap process settings:

```python
df_bootstrap = evalica.bootstrap_ci(df, score_method='bradley-terry', num_rounds=1000, sample_rate=0.99, with_replace=True)
```

Based on these usage examples, here are some reference designs for the API input: The input of …
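To pin down what `num_rounds`, `sample_rate`, and `with_replace` might mean, here is a minimal standard-library sketch of just the resampling step. The semantics are an assumption (each round draws `round(sample_rate * n)` rows, with or without replacement), and `resample_rounds` and its `seed` argument are hypothetical, not part of Evalica.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resample_rounds(
    rows: Sequence[T],
    num_rounds: int = 1000,
    sample_rate: float = 1.0,
    with_replace: bool = True,
    seed: int = 0,
) -> List[List[T]]:
    """Draw `num_rounds` resamples of `rows` (hypothetical parameter semantics)."""
    rng = random.Random(seed)  # one seeded generator keeps runs reproducible
    k = max(1, round(sample_rate * len(rows)))  # rows drawn per round
    rounds = []
    for _ in range(num_rounds):
        if with_replace:
            # Classic bootstrap: a row may appear several times in one round.
            rounds.append(rng.choices(rows, k=k))
        else:
            # Subsampling without replacement; raises ValueError if k > len(rows).
            rounds.append(rng.sample(list(rows), k))
    return rounds
```

Note that `with_replace=False` combined with `sample_rate=1.0` would return the same rows in every round (only reordered), so that combination only makes sense with `sample_rate < 1`.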
|
In this setting, we would replicate the inconvenient aspect of Crowd-Kit's design, which required developers to manually construct a data frame in the proper format. In practice, we found this process highly inconvenient. By contrast, Evalica's columnar approach is significantly more user-friendly. The currently proposed approach also introduces a requirement for a proxy to resolve the `score_method` string into the corresponding function. Instead, I suggest a lightweight wrapper that accepts the method itself:

```python
evalica.bootstrap(
    method=evalica.bradley_terry,
    xs=df['left'],
    ys=df['right'],
    winners=df['winner'],
    weights=df['weights'],  # this one is optional like in the rest of the library
    n_resamples=10000,
    confidence_level=0.95,
    **kwargs,  # for simplicity; these arguments are passed to the specified method
)
```

Also, for reference, consider `scipy.stats.bootstrap`. Do you think we could use it directly to avoid writing custom bootstrapping code? |
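As a rough sketch of the wrapper proposed in the previous comment, here is a standard-library version using the same argument names (`method`, `xs`, `ys`, `winners`, `weights`, `n_resamples`, `confidence_level`, `**kwargs`). It resamples comparisons with replacement, calls `method` on each resample, and reports percentile intervals. Assumptions: `method` returns a plain `{element: score}` dict, the `seed` argument is a hypothetical addition for reproducibility, and `win_rate` is a toy stand-in for an Evalica method. This is not Evalica's API.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Scores = Dict[str, float]


def _quantile(sorted_values: List[float], q: float) -> float:
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def bootstrap(
    method: Callable[..., Scores],
    xs: Sequence[str],
    ys: Sequence[str],
    winners: Sequence[str],
    weights: Optional[Sequence[float]] = None,
    n_resamples: int = 10000,
    confidence_level: float = 0.95,
    seed: int = 0,  # hypothetical: fixes the resampling for reproducibility
    **kwargs,
) -> Dict[str, Tuple[float, float]]:
    rng = random.Random(seed)
    n = len(xs)
    samples: Dict[str, List[float]] = defaultdict(list)
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample comparisons with replacement
        scores = method(
            xs=[xs[i] for i in idx],
            ys=[ys[i] for i in idx],
            winners=[winners[i] for i in idx],
            weights=None if weights is None else [weights[i] for i in idx],
            **kwargs,  # forwarded unchanged to the scoring method
        )
        for element, score in scores.items():
            samples[element].append(score)
    alpha = 1.0 - confidence_level
    intervals = {}
    for element, values in samples.items():
        values.sort()
        intervals[element] = (_quantile(values, alpha / 2), _quantile(values, 1 - alpha / 2))
    return intervals


def win_rate(xs, ys, winners, weights=None) -> Scores:
    """Toy scoring method: weighted share of games won, draws count as half."""
    wins: Dict[str, float] = defaultdict(float)
    games: Dict[str, float] = defaultdict(float)
    ws = weights if weights is not None else [1.0] * len(xs)
    for x, y, w, weight in zip(xs, ys, winners, ws):
        games[x] += weight
        games[y] += weight
        if w == "x":
            wins[x] += weight
        elif w == "y":
            wins[y] += weight
        else:  # draw
            wins[x] += weight / 2
            wins[y] += weight / 2
    return {e: wins[e] / games[e] for e in games}
```

Wrapping an Evalica method this way would need a small adapter that converts its result's `scores` into a dict. The percentile interval is the simplest choice here, whereas `scipy.stats.bootstrap` also offers other interval methods, such as BCa.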
Sorry for the late reply. Thanks for the suggestion; I agree this lightweight wrapper is more flexible, and I'll try to implement it this way (maybe in a few days, as I'm afraid I don't have much time at the moment). As for the … |
As shown in: https://lmsys.org/blog/2023-12-07-leaderboard/.
I think this function will be useful and convenient in practice, but I'm not sure whether it's appropriate to add it to Evalica. Evalica is very concise and focuses on the core computation of the algorithms, which is good enough for now.
If it's not appropriate, I'll try to build a simple Python package on top of Evalica that adds these functions, possibly with more visualization functions (as shown in the Jupyter notebooks) too.