feat: Update astradb integration for latest client library (#1145)
* Update astradb integration for latest client library

* Update CHANGELOG.md

* Ruff check update

* Black linting updates

* Tweak to versioning for astrapy

* removing CHANGELOG.MD changes since those are automatically added

---------

Co-authored-by: David S. Batista <[email protected]>
erichare and davidsbatista authored Oct 22, 2024
1 parent 61ac2f4 commit 067adba
Showing 5 changed files with 109 additions and 123 deletions.
24 changes: 14 additions & 10 deletions integrations/astra/README.md
@@ -6,17 +6,18 @@

```bash
pip install astra-haystack

```

### Local Development

install astra-haystack package locally to run integration tests:

Open in gitpod:
[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/Anant/astra-haystack/tree/main)

Switch Python version to 3.9 (Requires 3.8+ but not 3.12)
```
Switch Python version to 3.9 (Requires 3.9+ but not 3.12)

```bash
pyenv install 3.9
pyenv local 3.9
```
@@ -33,7 +34,8 @@ Install requirements
`pip install -r requirements.txt`

Export environment variables
```

```bash
export ASTRA_DB_API_ENDPOINT="https://<id>-<region>.apps.astra.datastax.com"
export ASTRA_DB_APPLICATION_TOKEN="AstraCS:..."
export COLLECTION_NAME="my_collection"
@@ -49,22 +51,25 @@ or

This package includes Astra Document Store and Astra Embedding Retriever classes that integrate with Haystack, allowing you to easily perform document retrieval or RAG with Astra, and include those functions in Haystack pipelines.

### In order to use the Document Store directly:
### Use the Document Store Directly

Import the Document Store:
```

```python
from haystack_integrations.document_stores.astra import AstraDocumentStore
from haystack.document_stores.types.policy import DuplicatePolicy
```

Load in environment variables:
```

```python
namespace = os.environ.get("ASTRA_DB_KEYSPACE")
collection_name = os.environ.get("COLLECTION_NAME", "haystack_vector_search")
```

Create the Document Store object (API Endpoint and Token are read off the environment):
```

```python
document_store = AstraDocumentStore(
collection_name=collection_name,
namespace=namespace,
@@ -80,7 +85,7 @@ Then you can use the document store functions like count_document below:

Create the Document Store object like above, then import and create the Pipeline:

```
```python
from haystack import Pipeline
pipeline = Pipeline()
```
@@ -101,7 +106,6 @@ or,

> Astra DB collection '...' is detected as having the following indexing policy: {...}. This does not match the requested indexing policy for this object: {...}. In particular, there may be stricter limitations on the amount of text each string in a document can store. Consider indexing anew on a fresh collection to be able to store longer texts.

The reason for the warning is that the requested collection already exists on the database, and it is configured to [index all of its fields for search](https://docs.datastax.com/en/astra-db-serverless/api-reference/collections.html#the-indexing-option), possibly implicitly, by default. When the Haystack object tries to create it, it attempts to enforce, instead, an indexing policy tailored to the prospected usage: this is both to enable storing very long texts and to avoid indexing fields that will never be used in filtering a search (indexing those would also have a slight performance cost for writes).

Typically there are two reasons why you may encounter the warning:
2 changes: 1 addition & 1 deletion integrations/astra/examples/requirements.txt
@@ -1,4 +1,4 @@
haystack-ai
sentence_transformers==2.2.2
openai==1.6.1
astrapy>=0.7.7
astrapy>=1.5.0,<2.0
7 changes: 3 additions & 4 deletions integrations/astra/pyproject.toml
@@ -7,22 +7,21 @@ name = "astra-haystack"
dynamic = ["version"]
description = ''
readme = "README.md"
requires-python = ">=3.8"
requires-python = ">=3.9"
license = "Apache-2.0"
keywords = []
authors = [{ name = "Anant Corporation", email = "[email protected]" }]
classifiers = [
"License :: OSI Approved :: Apache Software License",
"Development Status :: 4 - Beta",
"Programming Language :: Python",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
]
dependencies = ["haystack-ai", "pydantic", "typing_extensions", "astrapy"]
dependencies = ["haystack-ai", "pydantic", "typing_extensions", "astrapy>=1.5.0,<2.0"]

[project.urls]
Documentation = "https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/astra#readme"
@@ -57,7 +56,7 @@ cov = ["test-cov", "cov-report"]
cov-retry = ["test-cov-retry", "cov-report"]
docs = ["pydoc-markdown pydoc/config.yml"]
[[tool.hatch.envs.all.matrix]]
python = ["3.8", "3.9", "3.10", "3.11"]
python = ["3.9", "3.10", "3.11"]

[tool.hatch.envs.lint]
installer = "uv"
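
Note on the new pin: the `astrapy>=1.5.0,<2.0` constraint added above admits any 1.x release from 1.5.0 onward and excludes the older 0.7.x client and any future 2.x. The sketch below illustrates that range check with plain tuple comparison. The helper names `parse_release` and `satisfies_astrapy_pin` are hypothetical and not part of this commit, and the parsing is a deliberate simplification that only handles plain numeric releases; it is not a full PEP 440 implementation.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain numeric release such as "1.5.2" into (1, 5, 2).

    Assumption: no pre-release, post-release, or local version suffixes.
    Short versions like "1.5" are padded with zeros to three components.
    """
    parts = tuple(int(part) for part in version.split("."))
    return (parts + (0, 0, 0))[:3]


def satisfies_astrapy_pin(version: str) -> bool:
    """Return True if `version` lies in the range >=1.5.0,<2.0."""
    return (1, 5, 0) <= parse_release(version) < (2, 0, 0)


if __name__ == "__main__":
    for candidate in ("0.7.7", "1.5.0", "1.9.3", "2.0.0"):
        print(candidate, satisfies_astrapy_pin(candidate))
```

For real dependency checks, a proper version-specifier parser is the safer choice, since it also handles pre-releases and other suffixes that this sketch ignores.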