Releases: confident-ai/deepeval

Version v2.0

02 Dec 01:33

Here are the new features we're bringing to you in the latest release:
⚙️ Automated LLM red teaming, a.k.a. vulnerability and safety scanning. You can now scan for 40+ vulnerabilities using 10+ SOTA attack enhancement techniques in fewer than 10 lines of Python code.
🪄 Synthetic dataset generation with a highly customizable synthetic data generation pipeline to cover literally any use case.
🖼️ Multi-modal LLM evaluation - perfect for image editing or text-to-image use cases.
💬 Conversational evaluation - perfect for evaluating LLM chatbots.
💥 More LLM system metrics: Prompt Alignment (to determine whether your LLM is able to follow instructions specified in your prompt template), Tool Correctness (for agents), and JSON Correctness (to validate whether LLM outputs conform to your desired schema).
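The idea behind JSON Correctness - checking that an LLM's raw output parses as JSON and matches an expected shape - can be sketched in plain Python. This is a minimal illustration of the concept, not deepeval's API; the `conforms_to_schema` helper and the required-keys check are hypothetical simplifications.

```python
import json

def conforms_to_schema(llm_output: str, required_keys: set) -> bool:
    """Return True if llm_output parses as a JSON object containing every required key.

    A stand-in for schema validation: real schemas would also check value types.
    """
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data)

print(conforms_to_schema('{"name": "Ada", "age": 36}', {"name", "age"}))  # True
print(conforms_to_schema('not json at all', {"name"}))                    # False
```

In practice a metric like this would score the test case as failing whenever the output cannot be parsed or is missing fields, rather than raising an exception.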

Red teaming, safety testing, improved synthesizer, conversational metrics, and multi-modal metrics

31 Oct 23:01

In DeepEval 1.4.7, we're releasing:

Agentic Evaluation Metric, Custom Evaluation LLMs, and Async for Synthetic Data Generation

30 Jul 17:27

In DeepEval v0.21.74, we have:

Verbosity in Metrics, Hyperparameter Logging, Improved Synthetic Data Generation, Better Async Support

25 Jun 12:14

In DeepEval v0.21.62, we:

Synthetic Data, Caching, Benchmarks, and GEval improvement

31 Mar 18:30

In deepeval's latest release, v0.21.15:

Async Support for Prod

09 Mar 17:27

In deepeval v0.20.85:

Conversational Metrics and Synthetic Data Generation

04 Mar 18:04

In DeepEval's latest release, there is now:

Production Stability

25 Feb 11:18

With this release, deepeval is now stable for production use:

  • reduced package size
  • separated the functionality of pytest and the deepeval test run command
  • included a coverage score for summarization
  • fixed a contextual precision node error
  • released docs for better transparency into metric calculations
  • allowed users to configure RAGAS metrics with custom embedding models: https://docs.confident-ai.com/docs/metrics-ragas#example
  • fixed bugs with checking for package updates

Hugging Face and LlamaIndex integration

14 Feb 06:05

For the latest release, DeepEval:

LLM-Evals now support all LangChain chatmodels

16 Jan 11:22
  • LLM-Evals (LLM-evaluated metrics) now support all of LangChain's chat models.
  • LLMTestCase now has execution_time and cost, useful for those looking to evaluate on these parameters.
  • minimum_score has been renamed to threshold, meaning you can now create custom metrics that have either a "minimum" or a "maximum" threshold.
  • LLMEvalMetric is now GEval.
  • LlamaIndex tracing integration: https://docs.llamaindex.ai/en/stable/module_guides/observability/observability.html#deepeval
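The "minimum or maximum threshold" change can be sketched with a toy metric class. This is a hedged illustration of the idea only - the `CustomMetric` class, its `higher_is_better` flag, and the latency example are hypothetical, not deepeval's actual base class or parameters.

```python
class CustomMetric:
    """Toy metric illustrating a threshold that acts as either a floor or a cap."""

    def __init__(self, threshold: float, higher_is_better: bool = True):
        self.threshold = threshold
        self.higher_is_better = higher_is_better

    def is_successful(self, score: float) -> bool:
        # When higher is better, threshold is a minimum; otherwise it is a maximum.
        if self.higher_is_better:
            return score >= self.threshold
        return score <= self.threshold

# An accuracy-style metric passes at or above 0.5 ...
accuracy = CustomMetric(threshold=0.5)
print(accuracy.is_successful(0.7))  # True

# ... while a latency-style metric (e.g. built on execution_time) passes at or below a cap.
latency = CustomMetric(threshold=2.0, higher_is_better=False)
print(latency.is_successful(1.2))  # True
```

The same pattern would let fields like execution_time or cost be scored with a "maximum" threshold while quality scores keep a "minimum" one.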