Merge branch 'main' into prompt_caching
vblagoje authored Sep 17, 2024
2 parents 7c5e16b + b32f620 commit 68d410a
Showing 68 changed files with 2,382 additions and 886 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/google_vertex.yml
@@ -30,7 +30,7 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
python-version: ["3.9", "3.10"]
python-version: ["3.9", "3.10", "3.11", "3.12"]

steps:
- name: Support longpaths
1 change: 1 addition & 0 deletions .github/workflows/weaviate.yml
@@ -44,6 +44,7 @@ jobs:
run: pip install --upgrade hatch

- name: Lint
if: matrix.python-version == '3.9' && runner.os == 'Linux'
run: hatch run lint:all

- name: Run Weaviate container
2 changes: 1 addition & 1 deletion README.md
@@ -44,7 +44,7 @@ Please check out our [Contribution Guidelines](CONTRIBUTING.md) for all the deta
| [mistral-haystack](integrations/mistral/) | Embedder, Generator | [![PyPI - Version](https://img.shields.io/pypi/v/mistral-haystack.svg)](https://pypi.org/project/mistral-haystack) | [![Test / mistral](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/mistral.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/mistral.yml) |
| [mongodb-atlas-haystack](integrations/mongodb_atlas/) | Document Store | [![PyPI - Version](https://img.shields.io/pypi/v/mongodb-atlas-haystack.svg?color=orange)](https://pypi.org/project/mongodb-atlas-haystack) | [![Test / mongodb-atlas](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/mongodb_atlas.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/mongodb_atlas.yml) |
| [nvidia-haystack](integrations/nvidia/) | Generator | [![PyPI - Version](https://img.shields.io/pypi/v/nvidia-haystack.svg?color=orange)](https://pypi.org/project/nvidia-haystack) | [![Test / nvidia](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/nvidia.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/nvidia.yml) |
| [ollama-haystack](integrations/ollama/) | Generator | [![PyPI - Version](https://img.shields.io/pypi/v/ollama-haystack.svg?color=orange)](https://pypi.org/project/ollama-haystack) | [![Test / ollama](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/ollama.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/ollama.yml) |
| [ollama-haystack](integrations/ollama/) | Embedder, Generator | [![PyPI - Version](https://img.shields.io/pypi/v/ollama-haystack.svg?color=orange)](https://pypi.org/project/ollama-haystack) | [![Test / ollama](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/ollama.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/ollama.yml) |
| [opensearch-haystack](integrations/opensearch/) | Document Store | [![PyPI - Version](https://img.shields.io/pypi/v/opensearch-haystack.svg)](https://pypi.org/project/opensearch-haystack) | [![Test / opensearch](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/opensearch.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/opensearch.yml) |
| [optimum-haystack](integrations/optimum/) | Embedder | [![PyPI - Version](https://img.shields.io/pypi/v/optimum-haystack.svg)](https://pypi.org/project/optimum-haystack) | [![Test / optimum](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/optimum.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/optimum.yml) |
| [pinecone-haystack](integrations/pinecone/) | Document Store | [![PyPI - Version](https://img.shields.io/pypi/v/pinecone-haystack.svg?color=orange)](https://pypi.org/project/pinecone-haystack) | [![Test / pinecone](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/pinecone.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/pinecone.yml) |
11 changes: 11 additions & 0 deletions integrations/astra/CHANGELOG.md
@@ -1,5 +1,16 @@
# Changelog

## [integrations/astra-v0.9.3] - 2024-09-12

### 🐛 Bug Fixes

- Astra DB, improved warnings and guidance about indexing-related mismatches (#932)
- AstraDocumentStore filter by id (#1053)

### 🧪 Testing

- Do not retry tests in `hatch run test` command (#954)

## [integrations/astra-v0.9.2] - 2024-07-22

## [integrations/astra-v0.9.1] - 2024-07-15
@@ -30,8 +30,6 @@ def _convert_filters(filters: Optional[Dict[str, Any]] = None) -> Optional[Dict[
if key in {"$and", "$or"}:
filter_statements[key] = value
else:
if key == "id":
filter_statements[key] = {"_id": value}
if key != "$in" and isinstance(value, list):
filter_statements[key] = {"$in": value}
elif isinstance(value, pd.DataFrame):
@@ -45,6 +43,8 @@ def _convert_filters(filters: Optional[Dict[str, Any]] = None) -> Optional[Dict[
filter_statements[key] = converted
else:
filter_statements[key] = value
if key == "id":
filter_statements["_id"] = filter_statements.pop("id")

return filter_statements

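The relocated `id` handling means the value is normalized first and the key is only then renamed to Astra DB's `_id`, so filtering by document id works for scalar and list values alike. A minimal, self-contained sketch of the resulting behaviour (the helper name is illustrative and the DataFrame branch is omitted):

```python
from typing import Any, Dict

def _convert_id_filter(key: str, value: Any) -> Dict[str, Any]:
    # Simplified from the hunk above: normalize the value, then rename "id" to "_id".
    filter_statements: Dict[str, Any] = {}
    if key != "$in" and isinstance(value, list):
        filter_statements[key] = {"$in": value}
    else:
        filter_statements[key] = value
    if key == "id":
        filter_statements["_id"] = filter_statements.pop("id")
    return filter_statements

print(_convert_id_filter("id", "1"))         # {'_id': '1'}
print(_convert_id_filter("id", ["1", "2"]))  # {'_id': {'$in': ['1', '2']}}
```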
6 changes: 6 additions & 0 deletions integrations/astra/tests/test_document_store.py
@@ -200,6 +200,12 @@ def test_filter_documents_nested_filters(self, document_store, filterable_docs):
],
)

def test_filter_documents_by_id(self, document_store):
docs = [Document(id="1", content="test doc 1"), Document(id="2", content="test doc 2")]
document_store.write_documents(docs)
result = document_store.filter_documents(filters={"field": "id", "operator": "==", "value": "1"})
self.assert_documents_are_equal(result, [docs[0]])

@pytest.mark.skip(reason="Unsupported filter operator not.")
def test_not_operator(self, document_store, filterable_docs):
pass
@@ -6,7 +6,6 @@
from typing import Any, Dict, List, Literal, Optional, Tuple

import chromadb
import numpy as np
from chromadb.api.types import GetResult, QueryResult, validate_where, validate_where_document
from haystack import default_from_dict, default_to_dict
from haystack.dataclasses import Document
@@ -453,7 +452,7 @@ def _query_result_to_documents(result: QueryResult) -> List[List[Document]]:
for j in range(len(answers)):
document_dict: Dict[str, Any] = {
"id": result["ids"][i][j],
"content": documents[i][j],
"content": answers[j],
}

# prepare metadata
@@ -465,7 +464,7 @@ def _query_result_to_documents(result: QueryResult) -> List[List[Document]]:
pass

if embeddings := result.get("embeddings"):
document_dict["embedding"] = np.array(embeddings[i][j])
document_dict["embedding"] = embeddings[i][j]

if distances := result.get("distances"):
document_dict["score"] = distances[i][j]
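With `documents[i][j]` replaced by `answers[j]` and the `numpy` wrapping dropped, each converted hit keeps its content from the per-query answers and its embedding as a plain list of floats. A rough sketch of the mapping, using a hypothetical single-hit result dict rather than a real Chroma `QueryResult`:

```python
from haystack.dataclasses import Document

# Hypothetical query result for one query with one hit (illustrative values only).
result = {
    "ids": [["doc-3"]],
    "documents": [["Third document"]],
    "metadatas": [[{"author": "Author2"}]],
    "embeddings": [[[0.1, 0.2, 0.3]]],
    "distances": [[0.42]],
}

answers = result["documents"][0]
doc = Document(
    id=result["ids"][0][0],
    content=answers[0],                    # taken from the per-query answers
    meta=result["metadatas"][0][0],
    embedding=result["embeddings"][0][0],  # kept as a plain list, no np.array(...)
    score=result["distances"][0][0],
)
assert isinstance(doc.embedding, list)
```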
7 changes: 6 additions & 1 deletion integrations/chroma/tests/test_document_store.py
@@ -106,7 +106,12 @@ def test_search(self):

# Assertions to verify correctness
assert len(result) == 1
assert result[0][0].content == "Third document"
doc = result[0][0]
assert doc.content == "Third document"
assert doc.meta == {"author": "Author2"}
assert doc.embedding
assert isinstance(doc.embedding, list)
assert all(isinstance(el, float) for el in doc.embedding)

def test_write_documents_unsupported_meta_values(self, document_store: ChromaDocumentStore):
"""
17 changes: 16 additions & 1 deletion integrations/cohere/CHANGELOG.md
@@ -1,15 +1,30 @@
# Changelog

## [unreleased]
## [integrations/cohere-v2.0.0] - 2024-09-16

### 🚀 Features

- Update Anthropic/Cohere for tools use (#790)
- Update Cohere default LLMs, add examples and update unit tests (#838)
- Cohere LLM - adjust token counting meta to match OpenAI format (#1086)

### 🐛 Bug Fixes

- Lints in `cohere-haystack` (#995)

### 🧪 Testing

- Do not retry tests in `hatch run test` command (#954)

### ⚙️ Miscellaneous Tasks

- Retry tests to reduce flakyness (#836)
- Update ruff invocation to include check parameter (#853)

### Docs

- Update CohereChatGenerator docstrings (#958)
- Update CohereGenerator docstrings (#960)

## [integrations/cohere-v1.1.1] - 2024-06-12

@@ -178,7 +178,7 @@ def run(self, messages: List[ChatMessage], generation_kwargs: Optional[Dict[str,
if finish_response.meta.billed_units:
tokens_in = finish_response.meta.billed_units.input_tokens or -1
tokens_out = finish_response.meta.billed_units.output_tokens or -1
chat_message.meta["usage"] = tokens_in + tokens_out
chat_message.meta["usage"] = {"prompt_tokens": tokens_in, "completion_tokens": tokens_out}
chat_message.meta.update(
{
"model": self.model,
@@ -220,11 +220,13 @@ def _build_message(self, cohere_response):
message = ChatMessage.from_assistant(cohere_response.tool_calls[0].json())
elif cohere_response.text:
message = ChatMessage.from_assistant(content=cohere_response.text)
total_tokens = cohere_response.meta.billed_units.input_tokens + cohere_response.meta.billed_units.output_tokens
message.meta.update(
{
"model": self.model,
"usage": total_tokens,
"usage": {
"prompt_tokens": cohere_response.meta.billed_units.input_tokens,
"completion_tokens": cohere_response.meta.billed_units.output_tokens,
},
"index": 0,
"finish_reason": cohere_response.finish_reason,
"documents": cohere_response.documents,
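With this change the `usage` entry in a reply's metadata is an OpenAI-style dictionary rather than a single summed integer. A hedged usage sketch (assumes a valid Cohere API key in the environment and default model settings):

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.cohere import CohereChatGenerator

generator = CohereChatGenerator()  # reads COHERE_API_KEY / CO_API_KEY from the environment
result = generator.run(messages=[ChatMessage.from_user("What's the capital of France?")])
reply = result["replies"][0]

usage = reply.meta["usage"]
# Previously a single int (input + output tokens); now split into OpenAI-style keys.
print(usage["prompt_tokens"], usage["completion_tokens"])
```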
7 changes: 7 additions & 0 deletions integrations/cohere/tests/test_cohere_chat_generator.py
@@ -169,6 +169,9 @@ def test_live_run(self):
assert len(results["replies"]) == 1
message: ChatMessage = results["replies"][0]
assert "Paris" in message.content
assert "usage" in message.meta
assert "prompt_tokens" in message.meta["usage"]
assert "completion_tokens" in message.meta["usage"]

@pytest.mark.skipif(
not os.environ.get("COHERE_API_KEY", None) and not os.environ.get("CO_API_KEY", None),
@@ -210,6 +213,10 @@ def __call__(self, chunk: StreamingChunk) -> None:
assert callback.counter > 1
assert "Paris" in callback.responses

assert "usage" in message.meta
assert "prompt_tokens" in message.meta["usage"]
assert "completion_tokens" in message.meta["usage"]

@pytest.mark.skipif(
not os.environ.get("COHERE_API_KEY", None) and not os.environ.get("CO_API_KEY", None),
reason="Export an env var called COHERE_API_KEY/CO_API_KEY containing the Cohere API key to run this test.",
7 changes: 6 additions & 1 deletion integrations/elasticsearch/CHANGELOG.md
@@ -1,6 +1,6 @@
# Changelog

## [unreleased]
## [integrations/elasticsearch-v1.0.0] - 2024-09-12

### 🚀 Features

@@ -11,10 +11,15 @@

- `ElasticSearch` - Fallback to default filter policy when deserializing retrievers without the init parameter (#898)

### 🧪 Testing

- Do not retry tests in `hatch run test` command (#954)

### ⚙️ Miscellaneous Tasks

- Retry tests to reduce flakyness (#836)
- Update ruff invocation to include check parameter (#853)
- ElasticSearch - remove legacy filters elasticsearch (#1078)

## [integrations/elasticsearch-v0.5.0] - 2024-05-24

@@ -12,7 +12,6 @@
from haystack.dataclasses import Document
from haystack.document_stores.errors import DocumentStoreError, DuplicateDocumentError
from haystack.document_stores.types import DuplicatePolicy
from haystack.utils.filters import convert
from haystack.version import __version__ as haystack_version

from elasticsearch import Elasticsearch, helpers # type: ignore[import-not-found]
@@ -224,7 +223,8 @@ def filter_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[Doc
:returns: List of `Document`s that match the filters.
"""
if filters and "operator" not in filters and "conditions" not in filters:
filters = convert(filters)
msg = "Invalid filter syntax. See https://docs.haystack.deepset.ai/docs/metadata-filtering for details."
raise ValueError(msg)

query = {"bool": {"filter": _normalize_filters(filters)}} if filters else None
documents = self._search_documents(query=query)
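Because the legacy-filter conversion (`haystack.utils.filters.convert`) is removed, `filter_documents` now rejects Haystack 1.x-style filters instead of converting them silently. A sketch of the distinction (host and field names are illustrative):

```python
from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")  # assumes a local instance

# New-style filters (accepted): an explicit operator/conditions structure.
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.date", "operator": ">=", "value": "2015-01-01"},
    ],
}
docs = document_store.filter_documents(filters=filters)

# Legacy-style filters (no "operator"/"conditions" keys) now raise a ValueError:
# document_store.filter_documents(filters={"type": "article"})
```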
58 changes: 58 additions & 0 deletions integrations/google_vertex/CHANGELOG.md
@@ -0,0 +1,58 @@
# Changelog

## [unreleased]

### 🚀 Features

- Enable streaming for VertexAIGeminiChatGenerator (#1014)
- Add tests for VertexAIGeminiGenerator and enable streaming (#1012)

### 🐛 Bug Fixes

- Remove the use of deprecated gemini models (#1032)
- Chat roles for model responses in chat generators (#1030)

### 🧪 Testing

- Do not retry tests in `hatch run test` command (#954)
- Add tests for VertexAIChatGeminiGenerator and migrate from preview package in vertexai (#1042)

### ⚙️ Miscellaneous Tasks

- Retry tests to reduce flakyness (#836)
- Update ruff invocation to include check parameter (#853)

## [integrations/google_vertex-v1.1.0] - 2024-03-28

## [integrations/google_vertex-v1.0.0] - 2024-03-27

### 🐛 Bug Fixes

- Fix order of API docs (#447)

This PR will also push the docs to Readme

### 📚 Documentation

- Update category slug (#442)
- Review google vertex integration (#535)
- Small consistency improvements (#536)
- Disable-class-def (#556)

### Google_vertex

- Create api docs (#355)

## [integrations/google_vertex-v0.2.0] - 2024-01-26

## [integrations/google_vertex-v0.1.0] - 2024-01-03

### 🐛 Bug Fixes

- The default model of VertexAIImagegenerator (#158)

### ⚙️ Miscellaneous Tasks

- Replace - with _ (#114)

<!-- generated by git-cliff -->
7 changes: 1 addition & 6 deletions integrations/google_vertex/pyproject.toml
@@ -22,12 +22,7 @@ classifiers = [
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
]
dependencies = [
"haystack-ai",
"google-cloud-aiplatform>=1.38",
"pyarrow>3",
"protobuf<5.28",
]
dependencies = ["haystack-ai", "google-cloud-aiplatform>=1.38", "pyarrow>3"]

[project.urls]
Documentation = "https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/google_vertex#readme"
@@ -8,7 +8,7 @@
from haystack.dataclasses.chat_message import ChatMessage, ChatRole
from haystack.utils import deserialize_callable, serialize_callable
from vertexai import init as vertexai_init
from vertexai.preview.generative_models import (
from vertexai.generative_models import (
Content,
GenerationConfig,
GenerationResponse,
@@ -67,14 +67,14 @@ def __init__(
:param location: The default location to use when making API calls, if not set uses us-central-1.
Defaults to None.
:param generation_config: Configuration for the generation process.
See the [GenerationConfig documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.preview.generative_models.GenerationConfig
See the [GenerationConfig documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.generative_models.GenerationConfig
for a list of supported arguments.
:param safety_settings: Safety settings to use when generating content. See the documentation
for [HarmBlockThreshold](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.preview.generative_models.HarmBlockThreshold)
and [HarmCategory](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.preview.generative_models.HarmCategory)
for [HarmBlockThreshold](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.generative_models.HarmBlockThreshold)
and [HarmCategory](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.generative_models.HarmCategory)
for more details.
:param tools: List of tools to use when generating content. See the documentation for
[Tool](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.preview.generative_models.Tool)
[Tool](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.generative_models.Tool)
the list of supported arguments.
:param streaming_callback: A callback function that is called when a new token is received from
the stream. The callback function accepts StreamingChunk as an argument.
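The recurring edit across the Google Vertex files is the move off the deprecated preview namespace; the GA `vertexai.generative_models` module exposes the same classes. A brief sketch of the swap (the model name is illustrative):

```python
# Before: from vertexai.preview.generative_models import GenerativeModel, ...
# After, as used throughout this diff:
from vertexai.generative_models import (
    Content,
    GenerationConfig,
    GenerationResponse,
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    Part,
    Tool,
)

model = GenerativeModel("gemini-1.5-flash")
```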
@@ -70,7 +70,7 @@ def __init__(
:param model: Name of the model to use. For available models, see https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models.
:param location: The default location to use when making API calls, if not set uses us-central-1.
:param generation_config: The generation config to use.
Can either be a [`GenerationConfig`](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.preview.generative_models.GenerationConfig)
Can either be a [`GenerationConfig`](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.generative_models.GenerationConfig)
object or a dictionary of parameters.
Accepted fields are:
- temperature
@@ -80,11 +80,11 @@
- max_output_tokens
- stop_sequences
:param safety_settings: The safety settings to use. See the documentation
for [HarmBlockThreshold](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.preview.generative_models.HarmBlockThreshold)
and [HarmCategory](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.preview.generative_models.HarmCategory)
for [HarmBlockThreshold](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.generative_models.HarmBlockThreshold)
and [HarmCategory](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.generative_models.HarmCategory)
for more details.
:param tools: List of tools to use when generating content. See the documentation for
[Tool](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.preview.generative_models.Tool)
[Tool](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.generative_models.Tool)
the list of supported arguments.
:param streaming_callback: A callback function that is called when a new token is received from the stream.
The callback function accepts StreamingChunk as an argument.
@@ -5,7 +5,7 @@
from haystack.core.component import component
from haystack.core.serialization import default_from_dict, default_to_dict
from haystack.dataclasses.byte_stream import ByteStream
from vertexai.preview.vision_models import ImageGenerationModel
from vertexai.vision_models import ImageGenerationModel

logger = logging.getLogger(__name__)

(The remaining changed files in this commit are not shown here.)
