Add swe-bench scoring #6

rodion-m · 2024-08-13T05:44:52Z

Thanks for sharing this very interesting approach! Would you consider testing its abilities on SWE-bench?
They have simplified the evaluation, so it's now easier to get an evaluation result.

minhngh · 2024-08-13T06:50:47Z

Hi rodion-m,

Thank you for your concern.
The main reason SWE-bench is missing from our evaluation is simply that our work and SWE-bench focus on different problems. In this work, we address the software development task: generating complete software from the user's task description (like ChatDev and MetaGPT). SWE-bench, by contrast, tests the ability to evolve or maintain software (fixing bugs or implementing new features), which is only one stage of the software development process. In addition, although SWE-bench is challenging, it does not fully align with the Agile methodology, because tasks such as bug fixing or implementing a new feature are usually completed within a single sprint in practice.
However, our method includes a bug-fixing phase in its pipeline, so it can benefit from advancements in other methods tailored to SWE-bench. We did try adapting some SWE-bench methods to our dataset, but the results were poor, since our work and the SWE-bench methods are designed for different problems.

We hope this answers your question. If you have any other concerns, we are happy to help.
