Add swe-bench scoring #6

Open
rodion-m opened this issue Aug 13, 2024 · 1 comment

Comments

@rodion-m

Thanks for sharing a very interesting approach! Have you considered testing it on SWE-bench?
They have simplified the evaluation, so it is now easier to get an evaluation result.

@minhngh
Collaborator

minhngh commented Aug 13, 2024

Hi rodion-m,

Thank you for your interest.
The main reason SWE-bench is absent from our evaluation is simply that our work and SWE-bench focus on different problems. Our work targets the software development task: generating complete software from a user's task description (like ChatDev and MetaGPT). SWE-bench, by contrast, tests software evolution or maintenance (bug fixing or implementing new features), which is only one stage of the software development process. In addition, although SWE-bench is challenging, it does not fully align with Agile methodology, because in practice a bug fix or new feature is usually completed within a single sprint.
However, our method includes a bug-fixing phase in its pipeline, so it can incorporate advances from methods tailored to SWE-bench. We did try adapting some SWE-bench methods to our dataset, but the results were poor, since our work and those methods are designed for different problems.

We hope this answers your question. If you have any other questions, we are happy to help.
