PyChronoBench is a multiple-choice benchmark for evaluating how well large language models (LLMs) use the PyChrono API. It consists of 280 problems covering a wide range of PyChrono API usage.
- A collection of real-world API-related questions to assess LLM capabilities in understanding and applying the PyChrono library.
- Questions follow a uniform, machine-readable format, making model performance straightforward to score.
Each question in PyChronoBench follows a simple structure:
- Instruction: The question presented as a prompt, with multiple-choice options.
- Input: No additional input is required.
- Output: The correct answer(s) enclosed in double brackets, e.g. [[A]].
Here are some examples:
```json
{
  "instruction": "What method is used to set the friction coefficient for a contact material in PyChrono? 'A. material.SetFriction(value)', 'B. material.SetFrictionCoefficient(value)', 'C. material.SetFrictionValue(value)', 'D. material.SetFrictionFactor(value)'",
  "input": "",
  "output": "[[A]]"
},
{
  "instruction": "How do you add a body to the simulation in PyChrono? 'A. sys.AddBody(body)', 'B. sys.Add(body)', 'C. sys.Insert(body)', 'D. sys.AddObject(body)'",
  "input": "",
  "output": "[[B]]"
}
```
PyChronoBench is designed to be easy to integrate into an LLM evaluation pipeline, offering a standardized question format for measuring PyChrono API knowledge and decision-making.
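To show how the format plugs into an evaluation loop, here is a minimal scoring sketch. The file name `pychronobench.json` and the `query_model` callable are assumptions for illustration: substitute the actual benchmark file and your own model interface.

```python
import json
import re

# Hypothetical path; replace with the actual benchmark file.
BENCHMARK_PATH = "pychronobench.json"

def extract_choice(text):
    """Return the first single-letter answer wrapped in double brackets, e.g. [[A]]."""
    match = re.search(r"\[\[([A-D])\]\]", text)
    return match.group(1) if match else None

def evaluate(query_model):
    """Score a model on PyChronoBench.

    query_model is a user-supplied callable that takes a prompt string
    and returns the model's raw text response.
    """
    with open(BENCHMARK_PATH) as f:
        questions = json.load(f)  # assumed to be a JSON array of question objects

    correct = 0
    for q in questions:
        # Ask the model to reply in the same [[X]] convention as the ground truth,
        # so one regex parses both the reference output and the model's answer.
        prompt = q["instruction"] + "\nAnswer with the letter in double brackets, e.g. [[A]]."
        predicted = extract_choice(query_model(prompt))
        expected = extract_choice(q["output"])
        correct += predicted == expected
    return correct / len(questions)
```

Because every ground-truth `output` uses the same double-bracket convention, scoring reduces to a single regex comparison per question.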