We use the code spans identified as relevant to an issue to rerank the candidates.
Our experiments use the code spans identified by Moatless Tools.
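The committee's actual scoring lives in `swecomm.py`; purely for intuition, here is a hypothetical sketch of one way identified code spans can inform a candidate's score, by checking how many span-bearing files a candidate patch actually edits. The span format and function names below are illustrative assumptions, not the repository's API.

```python
# Illustrative only: a hypothetical span-overlap score for a candidate patch.
# The span format assumed here (file path -> list of (start, end) line
# tuples) and the function names are NOT swecomm.py's actual interface.
import re

def edited_files(patch: str) -> set[str]:
    """Collect the file paths touched by a unified diff."""
    return set(re.findall(r"^\+\+\+ b/(\S+)", patch, flags=re.MULTILINE))

def score_candidate(patch: str, spans: dict[str, list[tuple[int, int]]]) -> float:
    """Fraction of span-bearing files that the candidate patch edits."""
    if not spans:
        return 0.0
    touched = edited_files(patch)
    hits = sum(1 for path in spans if path in touched)
    return hits / len(spans)
```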
`resource/20240623_moatless_claude-3.5-sonnet` contains the Moatless Tools trajectories, from which we extract the identified code spans (see the directory description below).
Run `prepare_spans.py` to extract the code spans from the trajectories.
Note that you can always run Moatless Tools yourself to get the code spans. You can also use other agents of your choice to identify the relevant code spans.
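As a rough picture of what the extraction does, here is a minimal sketch, assuming each trajectory is a `trajectory.json` whose identify steps record spans as objects with `file_path` and `span_ids` fields. The exact schema depends on the Moatless Tools version, so treat the key names and directory layout here as assumptions; `prepare_spans.py` is the authoritative implementation.

```python
# Minimal sketch of span extraction from Moatless trajectories. The
# {"file_path": ..., "span_ids": [...]} shape and the per-instance
# directory layout are assumptions; adapt to the real trajectory schema.
import json
from pathlib import Path

def collect_spans(trajectory_dir: str) -> dict[str, list[dict]]:
    """Map each instance id to the code spans identified in its trajectory."""
    spans_by_instance = {}
    for traj_path in Path(trajectory_dir).glob("*/trajectory.json"):
        instance_id = traj_path.parent.name
        traj = json.loads(traj_path.read_text())
        spans = []

        def walk(node):
            # Keep anything that looks like an identified span.
            if isinstance(node, dict):
                if "file_path" in node and "span_ids" in node:
                    spans.append({"file_path": node["file_path"],
                                  "span_ids": node["span_ids"]})
                for value in node.values():
                    walk(value)
            elif isinstance(node, list):
                for item in node:
                    walk(item)

        walk(traj)
        spans_by_instance[instance_id] = spans
    return spans_by_instance
```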
Ensemble your own predictions by running the following command:

```bash
python swecomm.py \
    --output_dir committee_out_1 \
    --submission_preds pred1.jsonl,pred2.jsonl,... \
    --submission_results results1.json,results2.json,...
```
Here `pred1.jsonl`, `pred2.jsonl`, etc. are the prediction files you want to ensemble, and the reranked predictions are written to the directory given by `--output_dir`. The `--submission_preds` and `--submission_results` arguments take comma-separated lists of paths to the prediction and results files, respectively. Each results file needs to have a field named `resolved` or `resolved_ids` that contains the list of resolved instance ids.

The `--submission_results` argument is optional. When it is provided, SWE committee only scores and reranks candidates for instances that are solved by at least one agent, so that you can save some time and money. Note that this optional argument does not give the committee any unfair advantage: you can still run reranking for all instances, and the results would be the same. More importantly, it relies on the assumption that the evaluation results are correct, which is not always the case, as can be seen in the difference in resolve rates between the SWE-Bench leaderboard and the evaluations conducted by https://eval.moatless.ai/.
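For concreteness, here is a minimal sketch of reading these inputs, assuming the predictions follow the standard SWE-Bench format (one JSON object per line with `instance_id` and `model_patch` fields). The helper names are hypothetical, not part of `swecomm.py`.

```python
# Sketch of the input formats and the optional filtering described above.
# Helper names are illustrative; only the file formats follow the text.
import json

def load_predictions(path: str) -> dict[str, str]:
    """instance_id -> model_patch from a SWE-Bench prediction .jsonl file."""
    preds = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            preds[record["instance_id"]] = record["model_patch"]
    return preds

def load_resolved_ids(path: str) -> set[str]:
    """Resolved instance ids from a results file ('resolved' or 'resolved_ids')."""
    with open(path) as f:
        results = json.load(f)
    return set(results.get("resolved") or results.get("resolved_ids") or [])

def instances_to_rerank(result_paths: list[str]) -> set[str]:
    """With --submission_results given, only instances solved by at
    least one agent are scored and reranked."""
    return set().union(*(load_resolved_ids(p) for p in result_paths))
```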
- `moatless-tools` is the `323582b` commit of the Moatless Tools repository. We use some of the utils from this repo.
- `resource/` contains the candidate patches and the code spans you can use to reproduce the results in the paper. They can be obtained elsewhere, but we provide them here for convenience.
- `resource/experiments/` contains all the submissions on the SWE-Bench Lite leaderboard as of August 3, 2024. We also include CosineAI's submission, which was not merged into the leaderboard due to the new requirement to provide trajectories.
- `resource/20240623_moatless_claude-3.5-sonnet` contains the trajectories of Moatless Tools, from which we extract the identified code spans. Most of the trajectories are from Moatless Tools + Claude 3.5 Sonnet; for instances missing in that setting, we use the trajectories from Moatless Tools + GPT-4o. Note that we only use the identified code spans, not the trajectories themselves. To create these from scratch, we refer you to Moatless Tools' search loop.