PyChronoBench is a multiple-choice benchmark for evaluating how well large language models (LLMs) use the PyChrono API. It consists of 280 problems covering a wide range of PyChrono API usage.
- A collection of real-world API-related questions to assess LLM capabilities in understanding and applying the PyChrono library.
- Questions follow a uniform, machine-readable format, making model performance straightforward to score.
Each question in PyChronoBench follows a simple structure:
- Instruction: The question presented as a prompt, with multiple-choice options.
- Input: No additional input is required.
- Output: The correct answer(s) enclosed in double brackets, e.g. [[A]].
Here are some examples:
```json
{
  "instruction": "What method is used to set the friction coefficient for a contact material in PyChrono? 'A. material.SetFriction(value)', 'B. material.SetFrictionCoefficient(value)', 'C. material.SetFrictionValue(value)', 'D. material.SetFrictionFactor(value)'",
  "input": "",
  "output": "[[A]]"
},
{
  "instruction": "How do you add a body to the simulation in PyChrono? 'A. sys.AddBody(body)', 'B. sys.Add(body)', 'C. sys.Insert(body)', 'D. sys.AddObject(body)'",
  "input": "",
  "output": "[[B]]"
}
```
PyChronoBench is designed to be easy to integrate into an LLM evaluation pipeline, offering a standardized question format for measuring PyChrono API knowledge and decision-making.
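To show how the format plugs into an evaluation loop, here is a minimal scoring sketch. The file name `pychronobench.json` and the `query_model` callable are assumptions for illustration: substitute the actual benchmark file and your own model interface.

```python
import json
import re

# Hypothetical path; replace with the actual benchmark file.
BENCHMARK_PATH = "pychronobench.json"

def extract_choice(text):
    """Return the first single-letter answer wrapped in double brackets, e.g. [[A]]."""
    match = re.search(r"\[\[([A-D])\]\]", text)
    return match.group(1) if match else None

def evaluate(query_model):
    """Score a model on PyChronoBench.

    query_model is a user-supplied callable that takes a prompt string
    and returns the model's raw text response.
    """
    with open(BENCHMARK_PATH) as f:
        questions = json.load(f)  # assumed to be a JSON array of question objects

    correct = 0
    for q in questions:
        # Ask the model to reply in the same [[X]] convention as the ground truth,
        # so one regex parses both the reference output and the model's answer.
        prompt = q["instruction"] + "\nAnswer with the letter in double brackets, e.g. [[A]]."
        predicted = extract_choice(query_model(prompt))
        expected = extract_choice(q["output"])
        correct += predicted == expected
    return correct / len(questions)
```

Because every ground-truth `output` uses the same double-bracket convention, scoring reduces to a single regex comparison per question.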