
Releases: langchain-ai/langchain-datastax

libs/astradb/v0.5.2

01 Nov 17:18
05ffed9

What's Changed

  • Update AstraDBGraphVectorStore to match implementation of CassandraGraphVectorStore by @epinzur in #95
  • Add Vectorize support to AstraDBGraphVectorStore by @epinzur in #98
  • Create 'real' synchronous methods for GVS by @epinzur in #100
  • Prepare for v0.5.2 by @hemidactylus in #101
  • infra: disable pypi release attestations by @efriis in #102

New Contributors

Full Changelog: libs/astradb/v0.5.1...libs/astradb/v0.5.2

libs/astradb/v0.5.1

11 Oct 17:58
31c25d4

What's Changed

Full Changelog: libs/astradb/v0.5.0...libs/astradb/v0.5.1

libs/astradb/v0.5.0

27 Sep 15:44
5f0a566

What's Changed

  • Implement GraphVectorStore for AstraDB by @kerinin in #67
  • astradb[minor]: update dependencies for compatibility with langchain-core 0.3 by @ccurme in #71
  • Add AstraDBGraphVectorStore testing by @hemidactylus in #75
  • Better handling of errors during insert many (Vector Store) by @hemidactylus in #76
  • Thorough rewrite and optimization of integration tests by @hemidactylus in #82

New Contributors

Full Changelog: libs/astradb/v0.4.0...libs/astradb/v0.5.0

libs/astradb/v0.4.0

09 Sep 22:23
2e66453

What's Changed

Full Changelog: libs/astradb/v0.3.5...libs/astradb/v0.4.0

Vector Store "autodetect mode": a short guide

Summary

The newly introduced "autodetect" mode for the Astra DB Vector Store assumes
that the collection already exists, possibly populated by external means with vector
documents of some (uniform) shape.

Upon initialization, the vector store class inspects the collection to infer its
settings and the "schema" of its documents, then operates on it without further configuration.
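To give a feel for what inferring the "schema" may involve, here is a hypothetical sketch that looks at a few sampled documents and guesses whether the collection is vectorize-based and which root-level field holds the text. The function name `infer_schema` and the heuristic (pick the string field with the most text, skipping `_id` and `$`-prefixed fields) are assumptions for illustration only, not the procedure AstraDBVectorStore actually uses.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def infer_schema(sample_docs: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical heuristic: guess vectorize usage and the text field."""
    if not sample_docs:
        # Nothing to inspect: fall back to the "native" non-vectorize default.
        return {"vectorize": False, "content_field": "content"}

    # If every sampled document carries "$vectorize", treat it as the text field.
    if all("$vectorize" in doc for doc in sample_docs):
        return {"vectorize": True, "content_field": "$vectorize"}

    # Otherwise, sum the length of every root-level string field, skipping the
    # document id and reserved "$"-prefixed fields such as "$vector".
    text_lengths: Counter[str] = Counter()
    for doc in sample_docs:
        for key, value in doc.items():
            if key == "_id" or key.startswith("$"):
                continue
            if isinstance(value, str):
                text_lengths[key] += len(value)

    if not text_lengths:
        raise ValueError("no candidate text field found in the sampled documents")
    content_field, _ = text_lengths.most_common(1)[0]
    return {"vectorize": False, "content_field": content_field}
```

For a document imported from a CSV like the one later in this guide, the long review text would win over short fields such as the title.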

Tested scenarios:

All of the following scenarios have been tested to work with this init mode.
(See the "Tested scenarios, details" section below for more detailed instructions.)

  • "native" non-vectorize store (i.e. the kind of store this class has always produced)
  • "native" vectorize-based store (likewise)
  • non-vectorize collection with an imported CSV (Astra UI) containing embedding vectors
  • vectorize collection with an imported CSV (Astra UI) containing a "$vectorize" column
  • vectorize collection with an imported CSV (Astra UI) one of whose columns is marked as text-to-embed
  • vectorize collection with an ingested PDF file (Astra UI)

Note that vector stores created through LangFlow with the Astra DB Vector Store component
(e.g. by connecting a "Load from file" component to a Vector Store in LangFlow)
fall into the first two cases.

Usage tips

If you expect the collection to be populated by ingestion pipelines other than
AstraDBVectorStore itself, you can assume that the collection already exists when
the store initializes. In this case, you can use autodetect mode.

Note: autodetect falls back to "native" mode (partly for backward compatibility) in two cases:
(1) the collection is empty; (2) the collection contains documents compliant with the "native" format.
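The fallback rule above can be sketched as a small decision function. This is a hypothetical illustration: `choose_mode` and `is_native` are made-up names, and the assumed "native" document shape (a string text field, `content` or `$vectorize`, plus a `metadata` dictionary) is an assumption about what the store itself writes, not a documented contract.

```python
from __future__ import annotations

from typing import Any

# Assumed text fields of "native" documents (hypothetical, for illustration):
# "content" for non-vectorize stores, "$vectorize" for vectorize-based ones.
NATIVE_TEXT_FIELDS = ("content", "$vectorize")


def is_native(doc: dict[str, Any]) -> bool:
    """True if the document looks like one written by the store itself (assumed shape)."""
    has_text = any(isinstance(doc.get(field), str) for field in NATIVE_TEXT_FIELDS)
    return has_text and isinstance(doc.get("metadata"), dict)


def choose_mode(sample_docs: list[dict[str, Any]]) -> str:
    """Return "native" in the two fallback cases, otherwise "autodetected"."""
    if not sample_docs:
        return "native"  # case (1): empty collection
    if all(is_native(doc) for doc in sample_docs):
        return "native"  # case (2): documents already in the "native" format
    return "autodetected"
```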

Typical usage

You can initialize the store with the autodetect_collection parameter like this:

import os

from langchain_astradb import AstraDBVectorStore

store = AstraDBVectorStore(
    collection_name="my_collection",
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    autodetect_collection=True,
)

Note that most other init parameters can also be passed (see below for details).

Additional parameters:

  • content_field: specifies which root-level field in the documents carries the textual content. It cannot be passed for vectorize collections, where it is fixed to $vectorize. For autodetect, it can also be passed as "*", meaning that the autodetect procedure should also guess it by looking at a handful of documents in the collection. For non-vectorize collections, it defaults to "content".
  • ignore_invalid_documents: by default, malformed documents from the DB (e.g. missing metadata or missing textual content) trigger a runtime error. This flag enables a more permissive behaviour, in which bad documents are logged and ignored without compromising the working of the store. Keep in mind that this is post-filtering, so a vector search may return fewer matches than requested.
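Why post-filtering can shrink the result set is easiest to see in a small sketch. The hypothetical `decode_hits` below mimics the two behaviours described for ignore_invalid_documents on a list of raw hits (already limited to k by the search): strict mode raises on the first bad document, permissive mode logs and skips it. The function name, the `(text, metadata)` return shape and the validity checks are assumptions for illustration, not the library's internals.

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger(__name__)


def decode_hits(
    raw_hits: list[dict[str, Any]],
    content_field: str = "content",
    ignore_invalid_documents: bool = False,
) -> list[tuple[str, dict[str, Any]]]:
    """Hypothetical decoder: turn raw DB hits into (text, metadata) pairs."""
    decoded: list[tuple[str, dict[str, Any]]] = []
    for hit in raw_hits:
        text = hit.get(content_field)
        metadata = hit.get("metadata")
        if not isinstance(text, str) or not isinstance(metadata, dict):
            if not ignore_invalid_documents:
                # Strict (default) behaviour: a malformed document is an error.
                raise ValueError(f"Invalid document: {hit.get('_id')!r}")
            # Permissive behaviour: log and drop the document (post-filtering).
            logger.warning("Ignoring invalid document %r", hit.get("_id"))
            continue
        decoded.append((text, metadata))
    return decoded
```

With three raw hits of which one lacks metadata, the permissive mode returns only two results even though three were requested.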

(Note: both parameters can also be used outside autodetect mode, except for the "*" value of content_field. There are probably not many cases where this is useful, but nothing prevents you from passing them.)

Forbidden parameters:

The following parameters, which pertain to how the collection should be created, are not permitted
in autodetect mode: metric, setup_mode, metadata_indexing_include, metadata_indexing_exclude, collection_indexing_policy, collection_vector_service_options.
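If you assemble the init parameters programmatically (for instance from a config file), a pre-flight check can catch forbidden combinations before the store is constructed. The sketch below is hypothetical: `check_autodetect_kwargs` is a made-up helper, and the forbidden set is simply copied from the list above.

```python
from __future__ import annotations

from typing import Any

# Creation-time parameters listed above as not permitted with autodetect.
FORBIDDEN_ON_AUTODETECT = frozenset(
    {
        "metric",
        "setup_mode",
        "metadata_indexing_include",
        "metadata_indexing_exclude",
        "collection_indexing_policy",
        "collection_vector_service_options",
    }
)


def check_autodetect_kwargs(kwargs: dict[str, Any]) -> None:
    """Hypothetical guard: raise if a creation-time option accompanies autodetect."""
    if not kwargs.get("autodetect_collection"):
        return  # not in autodetect mode: nothing to check
    bad = sorted(FORBIDDEN_ON_AUTODETECT.intersection(kwargs))
    if bad:
        raise ValueError(
            "Not allowed with autodetect_collection=True: " + ", ".join(bad)
        )
```

You would call it on the keyword dictionary right before passing the same dictionary to AstraDBVectorStore(**kwargs).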

Tested scenarios, details

Here is a minimal script to try a basic similarity search with an autodetected store:

import os
import logging

# Level 5 is below DEBUG (10), so every log message is shown.
logging.basicConfig(level=5)

from langchain_astradb import AstraDBVectorStore

# Open the existing collection and let the store detect its settings.
ad_store = AstraDBVectorStore(
    collection_name=os.environ["AUTODETECT_COLLECTION_NAME"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    autodetect_collection=True,
)

# Run a basic similarity search and print each hit with its score.
for doc, score in ad_store.similarity_search_with_score(
    "Tell me so and so",
    k=2,
):
    print(f"\n\n{'=' * 80}")
    print(f"Score = {score}")
    print(f"Page content = {doc.page_content}")

The following outlines how to prepare the collection, before running the above
script, in each of the tested cases.

"native" non-vectorize store

Simply use a regular vector store with client-side embeddings and have it add_texts(...)
with a handful of entries.

Afterwards, try the autodetect script above on the same collection (but see the following Note).

Note 1: in this case, you need to create the same embedding model that was used to populate the collection, e.g.:

import os

from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(
    api_key=os.environ["OPENAI_API_KEY"],
    model="text-embedding-ada-002",
)

and then pass it as an additional argument (embedding=embedding) when creating ad_store in the script.

(Note 2: in this case, content_field should never be needed.)

"native" vectorize-based store

Use a regular vectorize-based vector store and have it add_texts(...) with some entries.

Afterwards, try the autodetect script above on the same collection.

non-vectorize with imported CSV

You can create a Collection in the Astra UI with "Bring your own embedding", setting
the desired dimensionality, and then import a CSV with a "$vector" field into it.

An example CSV is as follows (requires a dimension of 1536):

title,reviewid,creationdate,criticname,originalscore,reviewstate,reviewtext,embedding
Beavers,1145982,2003-05-23,Ivan M. Lincoln,3.5/4,fresh,"Timed to be just long enough for most youngsters' brief attention spans -- and it's packed with plenty of interesting activity, both on land and under the water.","[-0.007622489705681801, 0.005060779396444559, 0.02063874900341034, -0.007668646518141031, -0.029461318626999855, ...]"
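If you need to produce such a CSV yourself, here is a minimal sketch using Python's standard csv and json modules. It assumes, as in the example row above, that the vector goes into a column named "embedding" as a JSON-style list inside a quoted field, and that the collection dimension is 1536; the helper name `csv_row_with_embedding` is hypothetical.

```python
import csv
import io
import json

EXPECTED_DIMENSION = 1536  # must match the dimensionality chosen for the collection


def csv_row_with_embedding(fields: dict[str, str], vector: list[float]) -> str:
    """Hypothetical helper: header plus one row with a JSON-list 'embedding' column."""
    if len(vector) != EXPECTED_DIMENSION:
        raise ValueError(f"expected {EXPECTED_DIMENSION} values, got {len(vector)}")
    row = {**fields, "embedding": json.dumps(vector)}
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(row))
    writer.writeheader()
    # The list contains commas, so the csv module wraps it in double quotes.
    writer.writerow(row)
    return buffer.getvalue()
```

Checking the length up front avoids an import that fails (or silently mismatches) because the vectors came from a model with a different dimension than the collection.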

libs/astradb/v0.3.5

07 Sep 08:44
cc40f28

What's Changed

New Contributors

Full Changelog: libs/astradb/v0.3.3...libs/astradb/v0.3.5

libs/astradb/v0.3.3

30 May 11:18
076b0ab

What's Changed

  • Upgrade vectorize support to Astra's Public Preview available service by @hemidactylus in #33
  • Prepare to release full vectorize support (Bump to 0.3.3) by @hemidactylus in #35

Full Changelog: libs/astradb/v0.3.2...libs/astradb/v0.3.3

libs/astradb/v0.3.2

20 May 19:33
f41f0dc

What's Changed

Full Changelog: libs/astradb/v0.3.1...libs/astradb/v0.3.2

libs/astradb/v0.3.1

16 May 20:05
d116257

What's Changed

Full Changelog: libs/astradb/v0.3.0...libs/astradb/v0.3.1

libs/astradb/v0.3.0

09 May 13:05
d988159

What's Changed

Full Changelog: libs/astradb/v0.2.0...libs/astradb/v0.3.0

libs/astradb/v0.2.0

24 Apr 09:12
27dafbf

What's Changed

New Contributors

Full Changelog: libs/astradb/v0.1.0...libs/astradb/v0.2.0