docs: add samples to migrate pinecone to alloy db #292
base: main
Conversation
We may need to outline the changes to the tutorial. Currently, how would users run both the get and the add?
Changes:
1. Made the snippets standalone files.
2. Compressed the snippet functions into a single file.
@@ -44,6 +44,9 @@ jobs:
      - name: Install Sample requirements
        run: pip install -r samples/requirements.txt

      - name: Install Migration snippets requirements
nit: I have been adding all sample reqs to https://github.com/googleapis/langchain-google-alloydb-pg-python/blob/main/samples/requirements.txt so this file doesn't need to be updated. I am also ok with this pattern of adding the new req file to the workflow
Tried to follow this pattern in the current version of the snippets.
protobuf==5.29.1
grpcio-tools==1.67.1
Why are these needed?
For Milvus (ref), it seems these are required.
""" | ||
|
||
# TODO(dev): Replace the values below | ||
pinecone_api_key = os.environ["PINECONE_API_KEY"] |
We discussed that these would be variables to be set, like https://github.com/GoogleCloudPlatform/python-docs-samples/blob/140b9dae356a8ffb4aa587571c4ee1eb1ae99e39/automl/snippets/get_model.py#L21, not environment variables.
We also discussed outlining the instructions we would give to the user/TW. Did you document in our notes that we would require users to set all the environment variables?
I would prefer that this is updated to use variables so there is no additional time and friction in understanding and validating the environment variable values.
Limited the use of env vars to only the tests
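For reference, the variable-based style being requested might look like the following. This is only an illustration of the pattern from the linked automl sample; the names are placeholders, not the PR's actual code.

```python
# Hypothetical sketch of variable-based configuration (names are
# placeholders, not taken from the PR).

# TODO(developer): Replace these values before running the sample.
pinecone_api_key = "YOUR_PINECONE_API_KEY"
pinecone_index_name = "YOUR_PINECONE_INDEX"
alloydb_table_name = "YOUR_ALLOYDB_TABLE"


def describe_config() -> dict:
    """Collect the sample's configuration so it can be validated up front."""
    return {
        "pinecone_api_key": pinecone_api_key,
        "pinecone_index_name": pinecone_index_name,
        "alloydb_table_name": alloydb_table_name,
    }
```

With this style a reader can see and edit every required value at the top of the file, instead of hunting down which environment variables must be exported.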
embeddings_service = get_embeddings_service(pinecone_vector_size)
vs = await aget_vector_store(
region tags should include the new wrapper methods
aget_vector_store is a new method which we define and use in this snippet.
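A wrapper like aget_vector_store typically just factors the engine and vector-store setup into one awaitable call, wrapped in region tags so the docs pick it up. The stdlib-only sketch below illustrates that shape with a stand-in store; the real snippet would build the AlloyDB engine and vector store instead, and the region tag name is an assumption.

```python
import asyncio


# [START alloydb_migration_get_vector_store]
class InMemoryVectorStore:
    """Stand-in for the real AlloyDB vector store (illustration only)."""

    def __init__(self, table_name: str):
        self.table_name = table_name
        self.rows: list = []


async def aget_vector_store(table_name: str) -> InMemoryVectorStore:
    """Async factory: the real snippet would create the AlloyDB engine
    here and return an initialized vector store."""
    await asyncio.sleep(0)  # placeholder for real async setup
    return InMemoryVectorStore(table_name)
# [END alloydb_migration_get_vector_store]


vs = asyncio.run(aget_vector_store("migrated_vectors"))
print(vs.table_name)  # prints "migrated_vectors"
```

Placing the region tags around the wrapper, as suggested in the review, means the published sample shows the full setup rather than only the call site.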
inserted_ids = await vs.aadd_embeddings(
    texts=contents,
    embeddings=embeddings,
    metadatas=metadatas,
    ids=ids,
)
Per another comment in a different PR, it seems using __concurrent_batch_insert is preferable to directly calling aadd_embeddings.
I've not used it because __concurrent_batch_insert is tied to batches of type AsyncIterator[Sequence[RowMapping]], which might not be available for all our use cases.
Not a strong preference; I can change it.
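The trade-off being discussed can be sketched with stdlib asyncio alone: chunk the rows and insert the chunks concurrently, without depending on the AsyncIterator[Sequence[RowMapping]] batch type. All names below are illustrative, not the library's API.

```python
import asyncio


async def _insert_batch(store: list, batch: list) -> list:
    """Stand-in for one aadd_embeddings call on a single batch."""
    await asyncio.sleep(0)  # placeholder for real database I/O
    store.extend(batch)
    return [row["id"] for row in batch]


async def concurrent_batch_insert(store: list, rows: list, batch_size: int = 100) -> list:
    """Insert rows as concurrent batches; returns inserted ids in input order."""
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    results = await asyncio.gather(*(_insert_batch(store, b) for b in batches))
    # asyncio.gather preserves the order of the batch coroutines.
    return [row_id for ids in results for row_id in ids]


store: list = []
rows = [{"id": f"vec-{i}", "values": [0.0]} for i in range(5)]
inserted = asyncio.run(concurrent_batch_insert(store, rows, batch_size=2))
print(inserted)  # prints ['vec-0', 'vec-1', 'vec-2', 'vec-3', 'vec-4']
```

This keeps the batching benefit without committing the snippet to a batch type that not every use case can produce.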
Adding code snippets to migrate from Pinecone to AlloyDB.
Upcoming PRs:
Reviewer points:
How to add an index into Pinecone:
https://paste.googleplex.com/5444295470088192
Test Log:
go/pinecone-alloydb-migration