Releases: langchain-ai/langchain-datastax
libs/astradb/v0.5.2
What's Changed
- Update AstraDBGraphVectorStore to match implementation of CassandraGraphVectorStore by @epinzur in #95
- Add Vectorize support to AstraDBGraphVectorStore by @epinzur in #98
- Create 'real' synchronous methods for GVS by @epinzur in #100
- Prepare for v0.5.2 by @hemidactylus in #101
- infra: disable pypi release attestations by @efriis in #102
Full Changelog: libs/astradb/v0.5.1...libs/astradb/v0.5.2
libs/astradb/v0.5.1
What's Changed
- Test Python 3.12 in CI by @cbornet in #79
- Improve lint setup by @cbornet in #78
- Remove some noqa: ARG rules escapes by @cbornet in #84
- Use astrapy 1.5+ naming conventions by @hemidactylus in #87
- Automatic environment from data api endpoint (if not supplied) by @hemidactylus in #88
- Add [a]delete_by_metadata and [a]update_metadata methods to vector store by @hemidactylus in #89
- tiny docstring improvement for (my OCD and) correctness by @hemidactylus in #90
- more docstring nits by @hemidactylus in #91
- Fix some ruff rules from preview mode by @cbornet in #93
- Improve GVS and VS pydoc by @cbornet in #94
- Customizable and detailed "Caller info" per-component by @hemidactylus in #96
- Bump to 0.5.1 for release by @hemidactylus in #97
Full Changelog: libs/astradb/v0.5.0...libs/astradb/v0.5.1
libs/astradb/v0.5.0
What's Changed
- Implement GraphVectorStore for AstraDB by @kerinin in #67
- astradb[minor]: update dependencies for compatibility with langchain-core 0.3 by @ccurme in #71
- Add AstraDBGraphVectorStore testing by @hemidactylus in #75
- Better handling of errors during insert many (Vector Store) by @hemidactylus in #76
- Thorough rewrite and optimization of integration tests by @hemidactylus in #82
Full Changelog: libs/astradb/v0.4.0...libs/astradb/v0.5.0
libs/astradb/v0.4.0
What's Changed
- Vector store, autodetect mode by @hemidactylus in #65
Full Changelog: libs/astradb/v0.3.5...libs/astradb/v0.4.0
Vector Store "autodetect mode". A short guide
Summary
The newly-introduced "autodetect" mode for the Astra DB Vector Store assumes
a collection exists already, possibly populated by external means with vector
documents of some (uniform) shape.
Upon initialization, the vector store class figures out the collection and
"schema" settings and works seamlessly on it.
Tested scenarios:
All the following have been tested to work with this init mode.
(See a section below for more extended details/instructions.)
- "native" non-vectorize store (i.e. the usual store as this class has always produced)
- "native" vectorize-based store (same)
- non-vectorize collection with an imported CSV (Astra UI) containing embedding vectors
- vectorize collection with an imported CSV (Astra UI) containing a "$vectorize" column
- vectorize collection with an imported CSV (Astra UI) one of whose columns is marked as text-to-embed
- vectorize collection with an ingested PDF file (Astra UI)
Note that the vector stores created through LangFlow fall in the first two cases,
if using the Astra DB Vector Store component
(e.g. when connecting a "Load from file" component to a Vector Store in LangFlow).
Usage tips
If you expect the collection to be populated by ingestion pipelines other than
the AstraDBVectorStore itself, you should assume that the collection already exists
when the store initializes. In this case, you can make use of the
autodetect mode.
Note: autodetect falls back to "native" mode (also for backward compatibility) in these
cases: (1) the collection is empty, (2) the collection contains documents compliant with "native".
Typical usage
You can initialize the store with the autodetect_collection parameter like this:
store = AstraDBVectorStore(
    collection_name="my_collection",
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    autodetect_collection=True,
)
Note that most other init parameters can also be passed (see below for details).
Additional parameters:
- content_field: specifies which root-level field in the documents carries the textual content. Cannot be passed for vectorize collections, where it is fixed to $vectorize. For autodetect, it can also be passed as "*", meaning that the autodetect procedure should also guess it by looking at a handful of documents in the collection. For non-vectorize collections, defaults to "content".
- ignore_invalid_documents: by default, malformed documents from the DB (e.g. missing metadata, missing textual content) trigger a runtime error. This flag enables a more permissive behaviour, in which bad documents are logged and ignored without compromising the working of the store. Keep in mind that this is a post-filtering step, so a vector search may return fewer matches than requested.
(Note: both parameters can also be used outside of the autodetect mode, except for passing "*" to the first one. There are probably not many cases where this is useful, but nothing prevents you from passing them.)
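As a rough sketch of how these two parameters might be combined with autodetect, the helper below assembles the constructor arguments as a plain dictionary. The helper name autodetect_kwargs is hypothetical and not part of langchain_astradb; only the parameter names are taken from the description above. It rejects "*" outside autodetect mode, mirroring the note above.

```python
from __future__ import annotations

import os
from typing import Any


def autodetect_kwargs(
    collection_name: str,
    *,
    autodetect: bool = True,
    content_field: str | None = None,
    ignore_invalid_documents: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: assemble AstraDBVectorStore init arguments."""
    if content_field == "*" and not autodetect:
        # Per the note above, "*" is only meaningful in autodetect mode.
        raise ValueError('content_field="*" requires autodetect_collection=True')
    kwargs: dict[str, Any] = {
        "collection_name": collection_name,
        "token": os.environ.get("ASTRA_DB_APPLICATION_TOKEN"),
        "api_endpoint": os.environ.get("ASTRA_DB_API_ENDPOINT"),
        "autodetect_collection": autodetect,
        "ignore_invalid_documents": ignore_invalid_documents,
    }
    if content_field is not None:
        # Leave it out for vectorize collections, where the field is fixed.
        kwargs["content_field"] = content_field
    return kwargs


kwargs = autodetect_kwargs(
    "my_collection",
    content_field="*",              # let autodetect guess the text field
    ignore_invalid_documents=True,  # log and skip malformed documents
)
# With credentials available, the store would then be created with:
#   from langchain_astradb import AstraDBVectorStore
#   store = AstraDBVectorStore(**kwargs)
```

Keeping the arguments in a dictionary makes it easy to inspect or log what will be passed before any connection to the database is attempted.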
Forbidden parameters:
The following parameters, pertaining to how the collection should be created, are not permitted
on autodetect: metric, setup_mode, metadata_indexing_include, metadata_indexing_exclude, collection_indexing_policy, collection_vector_service_options.
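If the init arguments are built dynamically (for example from a configuration file), a pre-flight check along these lines could catch a clash before the store is constructed. This is only an illustrative sketch: check_autodetect_kwargs and FORBIDDEN_ON_AUTODETECT are hypothetical names, and the library performs its own validation, whose exact error type and message are not assumed here.

```python
# Parameter names copied from the list above.
FORBIDDEN_ON_AUTODETECT = frozenset(
    {
        "metric",
        "setup_mode",
        "metadata_indexing_include",
        "metadata_indexing_exclude",
        "collection_indexing_policy",
        "collection_vector_service_options",
    }
)


def check_autodetect_kwargs(kwargs: dict) -> None:
    """Hypothetical check: raise if creation-related settings accompany autodetect."""
    if not kwargs.get("autodetect_collection"):
        return  # nothing to check outside autodetect mode
    clashing = sorted(
        name for name in FORBIDDEN_ON_AUTODETECT if kwargs.get(name) is not None
    )
    if clashing:
        raise ValueError(
            "Not permitted with autodetect_collection=True: " + ", ".join(clashing)
        )


# Passes silently: no creation-related settings are present.
check_autodetect_kwargs({"collection_name": "c", "autodetect_collection": True})
```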
Tested scenarios, details
Here is a minimal script to try a basic similarity search with an autodetected store:
import os
import logging

logging.basicConfig(level=5)

from langchain_astradb import AstraDBVectorStore

ad_store = AstraDBVectorStore(
    collection_name=os.environ["AUTODETECT_COLLECTION_NAME"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    autodetect_collection=True,
)

for doc, sco in ad_store.similarity_search_with_score(
    "Tell me so and so",
    k=2,
):
    print(f"\n\n{'=' * 80}")
    print(f"Score = {sco}")
    print(f"Page content = {doc.page_content}")
The following outlines the preparation steps preliminary to running the above
in the various tested cases.
"native" non-vectorize store
Simply use a regular vector store with client-side embeddings and have it add_texts(...) with a handful of entries.
Afterwards, try the autodetect script above on the same collection (but see Note 1 below).
Note 1: in this case, you need to create the correct embedding, e.g.:
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(
    api_key=os.environ["OPENAI_API_KEY"],
    model="text-embedding-ada-002",
)
and then pass it as an additional argument (embedding=embedding) when creating ad_store in the script.
(Note 2: in this case, content_field should never be needed.)
"native" vectorize-based store
Use a regular vectorize-based vector store and have it add_texts(...) with some entries.
Afterwards, try the autodetect script above on the same collection.
non-vectorize with imported CSV
You can create a Collection in the Astra UI with "Bring your own embedding", setting
the desired dimensionality, and then import a CSV with a "$vector" field into it.
An example CSV is as follows (requires a dimension of 1536):
title,reviewid,creationdate,criticname,originalscore,reviewstate,reviewtext,embedding
Beavers,1145982,2003-05-23,Ivan M. Lincoln,3.5/4,fresh,"Timed to be just long enough for most youngsters' brief attention spans -- and it's packed with plenty of interesting activity, both on land and under the water.","[-0.007622489705681801, 0.005060779396444559, 0.02063874900341034, -0.007668646518141031, -0.029461318626999855, 0.013121760450303555, -0.02066512405872345, -0.028063422068953514, -0.013530578464269638, -0.019385917112231255, -0.0017160492716357112, 0.038217950612306595, -0.0042167664505541325, -0.009692958556115627, 0.011486485600471497, -0.00171934615354985, 0.025531385093927383, -0.016128554940223694, 0.004793728236109018, -0.0242785532027483, -0.00812362227588892, 0.010451250709593296, 0.006785070989280939, -0.007695022039115429, -0.011202950030565262, 0.003709040116518736, 0.006274047773331404, -0.008407157845795155, 0.005575099494308233, -0.004233251325786114, 0.02990970015525818, -0.011156792752444744, 0.005911386106163263, -0.013781145215034485, 0.001370696467347443, 0.002546874340623617, 0.002978771459311247, 0.0009676473564468324, 0.009785272181034088, 0.004229954443871975, -0.001103645539842546, 0.006640006322413683, -0.0030611944384872913, -0.013530578464269638, 0.001376466010697186, -0.021430009976029396, 0.007879650220274925, -0.00583225954324007, -0.01449328102171421, 0.004183797165751457, 0.005904791876673698, -0.01583842560648918, 0.005782805848866701, -0.032098859548568726, -0.006171843037009239, -0.012198621407151222, -0.001701213070191443, -0.016563748940825462, 0.015996677801012993, -0.015416420064866543, 0.00233751954510808, -0.0015882934676483274, -0.0015141126932576299, 0.008815976791083813, -0.003646398661658168, -0.003705743234604597, 0.0020754141733050346, 0.0054597072303295135, 0.01279866136610508, -0.005674007348716259, 0.015363669022917747, 0.002830409910529852, 0.03217798471450806, 0.004286001902073622, 0.028142549097537994, -0.010022650472819805, -0.007760960608720779, 0.00789283774793148, 0.014849348925054073, 
0.02203664369881153, 0.011288669891655445, -0.032046105712652206, -0.02753591537475586, 0.026718277484178543, 0.011202950030565262, 0.013939397409558296, 0.0020325540099292994, 0.014559219591319561, -0.011268888600170612, -0.001387180993333459, -0.0011539235711097717, 0.01931997761130333, 0.017262697219848633, 0.017856143414974213, 0.002866676077246666, -0.0018627623794600368, -0.01368883065879345, 0.02877555787563324, -0.025795137509703636, -0.035659536719322205, -0.0026523759588599205, 0.012198621407151222, 0.023421352729201317, -0.005031106993556023, -0.023368602618575096, -0.016695626080036163, 0.015521921217441559, -0.0031930715776979923, 0.033233001828193665, 0.010451250709593296, -0.030331706628203392, 0.02645452320575714, 0.006649896968156099, -0.03500015288591385, -0.0218256413936615, 0.008611567318439484, 0.0068773846141994, 0.007325766608119011, 0.016286807134747505, -0.007695022039115429, 0.020902501419186592, 0.0068773846141994, 0.0017490185564383864, -0.01345804613083601, 0.00196331855840981, 0.011189762502908707, -0.01616811752319336, -0.011618362739682198, 0.00902698002755642, -0.016550561413168907, 0.012614034116268158, -0.004454145208001137, 0.0029639352578669786, -0.002774361986666918, -0.04465354606509209, 0.0008196978596970439, -0.010358937084674835, -0.011479891836643219, -0.02691609226167202, -0.006745507940649986, 0.02050687186419964, 0.029144814237952232, -0.02268284186720848, -0.015521921217441559, 0.002256745006889105, -0.00013723448500968516, -0.00580588448792696, 0.01190849207341671, -0.011394171975553036, 0.012699753977358341, 0.019254039973020554, -0.01831771247088909, 0.005284970160573721, 0.028168924152851105, -0.011473298072814941, 0.024067549034953117, -0.0012000806163996458, 0.010167715139687061, -0.02640177309513092, -0.012666784226894379, 0.0125283133238554, 0.006396033801138401, 0.032072484493255615, -0.016629688441753387, 0.033180247992277145, 0.01738138683140278, 0.019768361002206802, -0.0033282453659921885, 
0.03368138149380684, 0.00030084437457844615, -0.010899633169174194, 0.018436402082443237, -0.03273186832666397, 0.011743645183742046, -0.013029445894062519, 0.013016258366405964, 0.014915287494659424, 0.005627850536257029, 0....
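Before importing a CSV like the one above, it may help to confirm that every row's embedding has the dimension the collection expects. The following is a minimal sketch using only the standard library. The function name read_embedding_rows is hypothetical, the vector column name "embedding" is taken from the example header, and the three-dimensional sample row is made up for brevity (the real file needs 1536 values per row).

```python
import csv
import io
import json


def read_embedding_rows(csv_text, vector_column="embedding", expected_dim=1536):
    """Parse CSV text and move the JSON-encoded vector column under "$vector"."""
    rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        vector = json.loads(row.pop(vector_column))
        if len(vector) != expected_dim:
            raise ValueError(
                f"line {line_no}: expected {expected_dim} values, got {len(vector)}"
            )
        rows.append({**row, "$vector": [float(x) for x in vector]})
    return rows


# Made-up miniature sample; the quoted last field holds the vector as a JSON list.
sample = (
    "title,reviewtext,embedding\n"
    'Beavers,"Packed with activity.","[0.1, -0.2, 0.3]"\n'
)
docs = read_embedding_rows(sample, expected_dim=3)
print(docs[0]["title"], len(docs[0]["$vector"]))
```

A dimension mismatch is reported with the offending line number, which is usually easier to act on than a rejected import.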
libs/astradb/v0.3.5
What's Changed
- All regions have openai vectorize + fix async chat memory test by @hemidactylus in #37
- Add extensive note about "the indexing warning" in README by @hemidactylus in #38
- Refactor MMR flow so as to enable it with vectorize stores by @hemidactylus in #40
- release permissions by @efriis in #39
- [Note: this implies full support for HCD/DSE/...] Full porting to astrapy 1.3+ by @hemidactylus in #42
- standardize doc string by @isahers1 in #45
- Bump ruff and update Makefile by @cbornet in #46
- Use future annotations by @cbornet in #47
- Lint mmr.py by @hemidactylus in #50
- Fix docstrings with ruff D rule by @cbornet in #48
- Ruff auto-fixes by @cbornet in #49
- Fix more ruff checks by @cbornet in #51
- Activate ALL Ruff rules with some exclusions by @cbornet in #53
- Vector store, refactor encoding to an Astra document by @hemidactylus in #52
- Activate ruff PERF rule by @cbornet in #54
- Activate ruff FBT rule by @cbornet in #55
- Activate ruff doc(D) rules by @cbornet in #56
- Refactor similarity_search methods + forbid by-vector if vectorize by @hemidactylus in #57
- Activate ruff ANN rules by @cbornet in #58
- Bump ruff version to 0.6 by @cbornet in #59
- Activate PTH rules by @cbornet in #60
- Activate ruff rules BLE by @cbornet in #61
- Activate rules PT011 and PT012 by @cbornet in #62
- Activate ruff rule C90 by @cbornet in #63
- Activate ruff rule D417 by @cbornet in #64
- Some fixes to AstraDBVectorStore pydoc by @cbornet in #66
- astradb[patch]: prep for langchain-core 0.3 by @baskaryan in #68
- Trying to make CI robust against pytest.PytestUnraisableExceptionWarning by @hemidactylus in #70
New Contributors
- @isahers1 made their first contribution in #45
- @baskaryan made their first contribution in #68
Full Changelog: libs/astradb/v0.3.3...libs/astradb/v0.3.5
libs/astradb/v0.3.3
What's Changed
- Upgrade vectorize support to Astra's Public Preview available service by @hemidactylus in #33
- Prepare to release full vectorize support (Bump to 0.3.3) by @hemidactylus in #35
Full Changelog: libs/astradb/v0.3.2...libs/astradb/v0.3.3
libs/astradb/v0.3.2
What's Changed
- Environment variables for Astra DB init by @hemidactylus in #30
- bump 0.3.2 by @hemidactylus in #31
Full Changelog: libs/astradb/v0.3.1...libs/astradb/v0.3.2
libs/astradb/v0.3.1
What's Changed
Full Changelog: libs/astradb/v0.3.0...libs/astradb/v0.3.1
libs/astradb/v0.3.0
What's Changed
- feat: add support in vector_store for server-side embeddings by @jordanrfrazier in #22
- Update version to 0.3.0 by @jordanrfrazier in #27
- DB reads make their projection(+similarity) explicit where needed by @hemidactylus in #20
Full Changelog: libs/astradb/v0.2.0...libs/astradb/v0.3.0
libs/astradb/v0.2.0
What's Changed
- Run integration tests in CI by @cbornet in #9
- Improve gitignore by @cbornet in #12
- Update README by @cbornet in #13
- add filter to document_loaders test to avoid randomness flakiness by @hemidactylus in #15
- Relax astrapy dependency requirement to accept v 1.* by @hemidactylus in #17
- Replace score function in AstraDBVectorStore with simpler lambda by @cbornet in #16
- ci: add master core install for lint/test/dev by @efriis in #14
- set astrapy version to ^1 by @jordanrfrazier in #21
- bump version to 0.2.0 by @nicoloboschi in #25
- ci: skip integration tests during release process by @nicoloboschi in #26
New Contributors
- @jordanrfrazier made their first contribution in #21
- @nicoloboschi made their first contribution in #25
Full Changelog: libs/astradb/v0.1.0...libs/astradb/v0.2.0