-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: add samples to migrate pinecone to alloy db #292
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to outline the changes to the tutorial, currently how would users run both the get and add?
Changes: 1. Made snippets as standalone files 2. Compressed snippet functions into a single file.
@@ -44,6 +44,9 @@ jobs: | |||
- name: Install Sample requirements | |||
run: pip install -r samples/requirements.txt | |||
|
|||
- name: Install Migration snippets requirements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I have been adding all sample reqs to https://github.com/googleapis/langchain-google-alloydb-pg-python/blob/main/samples/requirements.txt so this file doesn't need to be updated. I am also ok with this pattern of adding the new req file to the workflow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to follow this snippet in the current version of of snippets
protobuf==5.29.1 | ||
grpcio-tools==1.67.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Milvus (ref), seems these are required.
""" | ||
|
||
# TODO(dev): Replace the values below | ||
pinecone_api_key = os.environ["PINECONE_API_KEY"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed that these would be variables to be set like https://github.com/GoogleCloudPlatform/python-docs-samples/blob/140b9dae356a8ffb4aa587571c4ee1eb1ae99e39/automl/snippets/get_model.py#L21, not environment variables.
We also discussed outline the instructions that we would give to the user/TW. Did you document in our notes that we would require users to set all the environment variables?
I would prefer that this is updated to use variables so there is not additional time and friction to understand and validate the environment variable values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limited the use of env vars to only the tests
alloydb_engine = await aget_alloydb_client() | ||
|
||
# [START pinecone_alloydb_migration_get_alloydb_vectorstore] | ||
from alloydb_snippets import aget_vector_store, get_embeddings_service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want the region tag to include the new methods. Please update this so it's clean only using the langchain methods.
embeddings_service = get_embeddings_service(pinecone_vector_size) | ||
vs = await aget_vector_store( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
region tags should include the new wrapper methods
project_id = os.environ["PROJECT_ID"] | ||
region = os.environ["REGION"] | ||
cluster = os.environ["CLUSTER_ID"] | ||
instance = os.environ["INSTANCE_ID"] | ||
db_name = os.environ["DATABASE_ID"] | ||
|
||
# TODO(dev): (optional values) Replace the values below | ||
db_user = os.environ.get("DB_USER", "") | ||
db_pwd = os.environ.get("DB_PASSWORD", "") | ||
table_name = os.environ.get("TABLE_NAME", "alloy_db_migration_table") | ||
vector_size = int(os.environ.get("VECTOR_SIZE", "768")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See note on variables not env vars
Adding code snippets to migrate from Pinecone to Alloy DB.
Upcoming PRs:
Reviewer points:
How to add an index into Pinecone:
https://paste.googleplex.com/5444295470088192
Test Log:
go/pinecone-alloydb-migration