Interesting read from the Hugging Face team -- they just set the new SotA on the GAIA benchmark largely by using Code Actions. They beat Microsoft's Autogen, which is no small feat.
Sure, GAIA is focused on reasoning, and the idea of introducing an agent that writes code but not for the output seems a little turtles-all-the-way-down. Still, there could be something to gain in terms of planning.
If I understand the Code Action concept correctly, the essential idea is to let the LLM respond with code instead of a structured format like JSON. It could be interesting to try this out in the search-identify steps: instruct the LLM to write code that helps it find the right parts of the code base, give it a set of functions it can use, and define a clear way for it to request more information, e.g. by writing a print() statement to see the output, or by treating the exchange like a Jupyter notebook the way Code Interpreter / Open Interpreter do. A first step would be to just expose the current search functions, the identify function, and a way for it to signal that it found the relevant code (something like the sketch below). Then we can see if it gets creative and uses other ways to find or filter out the right code.
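To make the idea concrete, here is a very rough sketch of what such a loop could look like. Everything in it is hypothetical (search_code, identify and llm_complete are stand-ins, not this repo's actual functions); the point is just the shape: the model replies with Python, we execute it against a small set of exposed tools, and whatever it print()s becomes the observation it sees in the next turn.

```python
# Rough sketch of a Code Action loop over search/identify tools.
# All names here (search_code, identify, llm_complete) are hypothetical
# placeholders, not the actual API of this project.
import io
import contextlib


def search_code(query: str) -> list[str]:
    # Placeholder: would call the current search functions.
    return [f"stub result for {query!r}"]


def identify(file_path: str, reason: str) -> None:
    # Placeholder: would mark the span the agent believes is relevant.
    print(f"identified {file_path}: {reason}")


TOOLS = {"search_code": search_code, "identify": identify}

SYSTEM_PROMPT = """You are searching a code base.
Reply with Python code only. Available functions:
  search_code(query) -> list of matching snippets
  identify(file_path, reason) -> mark the relevant code
Use print() to inspect intermediate results; its output is fed back
to you in the next turn, like a Jupyter notebook cell."""


def run_code_action(code: str) -> str:
    """Execute the model's code with the tools in scope and capture stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, dict(TOOLS))  # a real version would need proper sandboxing
    return buffer.getvalue()


def agent_loop(task: str, llm_complete, max_turns: int = 5) -> list[tuple[str, str]]:
    """llm_complete(history) -> code string; returns the conversation history."""
    history = [("system", SYSTEM_PROMPT), ("user", task)]
    for _ in range(max_turns):
        code = llm_complete(history)         # model answers with a code block
        observation = run_code_action(code)  # run it, capture what it printed
        history.append(("assistant", code))
        history.append(("user", f"Execution output:\n{observation}"))
    return history
```

A real version would obviously need sandboxing and a stop condition, but even this shape would let us see whether the model starts combining the tools in ways we didn't anticipate.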
Not sure, only a thought.
Blog: https://huggingface.co/blog/beating-gaia
Code: https://github.com/aymeric-roucher/GAIA