Add BFCL Evaluation GitHub Action on Pull Requests #817

Open: wants to merge 16 commits into main

.github/workflows/bfcl_check-illegal-params.yml (new file: 65 additions, 0 deletions)
name: BFCL Illegal Parameter Check

on:
  pull_request:
    branches: [ main ]

jobs:
  check-illegal-params:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          cd berkeley-function-call-leaderboard
          pip install -e .

      - name: Check for illegal parameter names
        id: check_params
        working-directory: berkeley-function-call-leaderboard
        run: |
          # Capture the output of the script
          OUTPUT=$(python utils/check_illegal_python_param_name.py)
          echo "$OUTPUT"

          # If the output contains "Illegal parameter name", record the failure;
          # the last step of this job turns it into a failing check
          if echo "$OUTPUT" | grep -q "Illegal parameter name"; then
            echo "::error::Found illegal Python parameter names!"
            echo "ILLEGAL_PARAMS_FOUND=true" >> "$GITHUB_ENV"
            echo "$OUTPUT" > illegal_params.txt
          else
            echo "ILLEGAL_PARAMS_FOUND=false" >> "$GITHUB_ENV"
          fi

      - name: Comment on PR with results
        if: github.event_name == 'pull_request'
        uses: peter-evans/create-or-update-comment@v4
        with:
          issue-number: ${{ github.event.pull_request.number }}
          body: |
            ## BFCL Illegal Parameter Check Results

            ${{ env.ILLEGAL_PARAMS_FOUND == 'true' && '❌ Failed: Illegal Python parameter names detected!' || '✅ Passed: No illegal parameters found.' }}

            ${{ env.ILLEGAL_PARAMS_FOUND == 'true' && '### How to fix:
            1. Run this script locally to automatically fix the illegal parameters:
            ```bash
            cd berkeley-function-call-leaderboard
            python utils/check_illegal_python_param_name.py
            ```
            2. Commit and push the changes
            3. Update your pull request' || '' }}

      - name: Fail if illegal parameters found
        if: env.ILLEGAL_PARAMS_FOUND == 'true'
        # illegal_params.txt was written inside this directory by the check step
        working-directory: berkeley-function-call-leaderboard
        run: |
          cat illegal_params.txt
          exit 1
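
To reproduce the pass/fail decision of the "Check for illegal parameter names" step without opening a pull request, the grep-based gate can be factored into a small Bash function. This is a minimal sketch, assuming (as the workflow's `grep` does) that the checker prints the literal phrase `Illegal parameter name` when it finds an offending parameter; the function names `has_illegal_params` and `check_output` and the sample output strings are hypothetical, not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch of the CI gate in bfcl_check-illegal-params.yml.
# Assumption: the checker reports problems by printing "Illegal parameter name",
# which is the exact phrase the workflow greps for.

# has_illegal_params: read checker output on stdin; succeed (exit 0) if it
# reports at least one illegal parameter name.
has_illegal_params() {
  grep -q "Illegal parameter name"
}

# check_output: print PASS or FAIL for one block of checker output ($1).
check_output() {
  if printf '%s\n' "$1" | has_illegal_params; then
    echo "FAIL"
  else
    echo "PASS"
  fi
}

# Hypothetical checker output, used here instead of running the real script.
sample_bad='Illegal parameter name: class'
sample_good='All parameter names look valid.'
check_output "$sample_bad"    # prints FAIL
check_output "$sample_good"   # prints PASS
```

Against the real checker, the captured output would be passed in the same way the workflow does it, e.g. `check_output "$(cd berkeley-function-call-leaderboard && python utils/check_illegal_python_param_name.py)"`. Keying on the printed text mirrors the workflow as written; if the checker also signals failures through a non-zero exit code, testing that code would be the sturdier choice.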
.github/workflows/bfcl_evaluation.yml (new file: 73 additions, 0 deletions)
name: BFCL Evaluation Check

on:
  pull_request:
    branches: [ main ]

jobs:
  evaluate:
    runs-on: ubuntu-latest

    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      MIN_ACCEPTABLE_SCORE: '0.60'

    steps:
      - uses: actions/checkout@v3

      - name: Check for OPENAI_API_KEY
        run: |
          if [ -z "$OPENAI_API_KEY" ]; then
            echo "::error::OPENAI_API_KEY is not set"
            exit 1
          fi

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          cd berkeley-function-call-leaderboard
          pip install -e .

      # OPENAI_API_KEY is inherited from the job-level env block
      - name: Run BFCL generate
        working-directory: berkeley-function-call-leaderboard
        run: |
          bfcl generate \
            --model gpt-4o-2024-08-06-FC \
            --test-category live_parallel

      - name: Run BFCL evaluate and extract score
        working-directory: berkeley-function-call-leaderboard
        run: |
          bfcl evaluate \
            --model gpt-4o-2024-08-06-FC \
            --test-category live_parallel

          # Read the score from the JSON file: take only the first line and parse its accuracy
          score=$(head -n 1 score/gpt-4o-2024-08-06-FC/BFCL_v3_live_parallel_score.json | jq -r '.accuracy')
          echo "EVALUATION_SCORE=${score}" >> "$GITHUB_ENV"

          # Record the result instead of exiting here, so the PR comment is still posted on failure
          if (( $(echo "$score < $MIN_ACCEPTABLE_SCORE" | bc -l) )); then
            echo "Score ($score) is below minimum threshold ($MIN_ACCEPTABLE_SCORE)"
            echo "EVALUATION_PASSED=false" >> "$GITHUB_ENV"
          else
            echo "Score ($score) meets or exceeds minimum threshold ($MIN_ACCEPTABLE_SCORE)"
            echo "EVALUATION_PASSED=true" >> "$GITHUB_ENV"
          fi

      - name: Comment on PR with results
        if: github.event_name == 'pull_request'
        uses: peter-evans/create-or-update-comment@v4
        with:
          issue-number: ${{ github.event.pull_request.number }}
          body: |
            ## BFCL Evaluation Results

            - Score: ${{ env.EVALUATION_SCORE }}
            - Minimum Threshold: ${{ env.MIN_ACCEPTABLE_SCORE }}
            - Status: ${{ env.EVALUATION_PASSED == 'true' && '✅ Passed' || '❌ Failed' }}

      - name: Fail if score is below threshold
        if: env.EVALUATION_PASSED != 'true'
        run: exit 1
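
The threshold gate in "Run BFCL evaluate and extract score" compares two decimal strings with `bc -l`. The sketch below does the same comparison with `awk`, which is one alternative (not what this PR uses) and which also treats a missing or non-numeric score as a failure, for example the string `null` that `jq -r '.accuracy'` prints when the field is absent. The function name `score_status` is hypothetical; the `0.60` default mirrors `MIN_ACCEPTABLE_SCORE` in the workflow.

```bash
#!/usr/bin/env bash
# Sketch: pass/fail decision for a BFCL accuracy score against a minimum threshold,
# using awk for the floating-point comparison instead of bc (an alternative approach).

# Default mirrors MIN_ACCEPTABLE_SCORE in bfcl_evaluation.yml.
MIN_ACCEPTABLE_SCORE="${MIN_ACCEPTABLE_SCORE:-0.60}"

# score_status SCORE MIN: print "Passed" if SCORE >= MIN, otherwise "Failed".
# An empty or non-numeric SCORE (e.g. "null" from jq) counts as a failure.
score_status() {
  local score="$1" min="$2"
  if awk -v s="$score" -v m="$min" 'BEGIN {
        if (s !~ /^[0-9]*\.?[0-9]+$/) exit 1   # not a plain decimal number
        exit !(s + 0 >= m + 0)                 # exit 0 (success) when s >= m
      }'; then
    echo "Passed"
  else
    echo "Failed"
  fi
}

score_status 0.65 "$MIN_ACCEPTABLE_SCORE"   # prints Passed
score_status 0.55 "$MIN_ACCEPTABLE_SCORE"   # prints Failed
score_status null "$MIN_ACCEPTABLE_SCORE"   # prints Failed
```

Inside the run step, `score_status "$score" "$MIN_ACCEPTABLE_SCORE"` could stand in for the `bc` expression; awk is part of the base system on GitHub's Ubuntu runners, so no extra package is involved either way.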