v0.23.0
What's Changed
- Test: modify single model evaluation scores by @ashrafchowdury in #1981
- Test: upload testset as CSV by @ashrafchowdury in #1979
- fix(backend): fix JSON evaluator description by @mmabrouk in #1990
- Test: save single model testset by @ashrafchowdury in #1980
- fix(tool): AGE-612 fix dependencies with langchain_community by @mmabrouk in #1992
- AGE-486 - SDK v3 - Configuration management with Pydantic by @mmabrouk in #1961
- feat(backend): AGE-276 and AGE-471 Improves LLM-as-a-judge reliability by @mmabrouk in #1938
- Bump versions by @github-actions in #1996
Full Changelog: v0.22.0...v0.23.0