docs: add samples to migrate pinecone to alloy db #292

Draft: wants to merge 10 commits into base: main

vishwarajanand
Contributor

@vishwarajanand vishwarajanand commented Dec 17, 2024

Adds code snippets for migrating from Pinecone to AlloyDB.

Upcoming PRs:

  1. More DBs
  2. Cloud build file

Reviewer points:

  1. Region tags
  2. Code structure & readability
  3. Use cases coverage

How to add an index into Pinecone:

https://paste.googleplex.com/5444295470088192

Test Log:

go/pinecone-alloydb-migration

@product-auto-label bot added the "api: alloydb" and "samples" labels Dec 17, 2024
@vishwarajanand changed the title from "chore: add samples to migrate pinecone to alloy db" to "docs: add samples to migrate pinecone to alloy db" Dec 17, 2024
Collaborator

@averikitsch averikitsch left a comment

We may need to outline the changes to the tutorial. Currently, how would users run both the get and the add?

samples/migrations/snippets/alloydb_snippets.py (outdated, resolved)
samples/migrations/snippets/pinecone_snippets.py (outdated, resolved)
Changes:
1. Made the snippets standalone files.
2. Compressed the snippet functions into a single file.
@@ -44,6 +44,9 @@ jobs:
      - name: Install Sample requirements
        run: pip install -r samples/requirements.txt

      - name: Install Migration snippets requirements
Collaborator

nit: I have been adding all sample reqs to https://github.com/googleapis/langchain-google-alloydb-pg-python/blob/main/samples/requirements.txt so that this file doesn't need to be updated. I am also OK with this pattern of adding the new req file to the workflow.

Contributor Author

Tried to follow this snippet in the current version of the snippets.

Comment on lines +8 to +9
protobuf==5.29.1
grpcio-tools==1.67.1
Collaborator

Why are these needed?

Contributor Author

These seem to be required for Milvus (ref).

samples/migrations/pinecone_migration.py (outdated, resolved)
samples/migrations/pinecone_migration.py (outdated, resolved)
"""

# TODO(dev): Replace the values below
pinecone_api_key = os.environ["PINECONE_API_KEY"]
Collaborator

We discussed that these would be variables to be set, as in https://github.com/GoogleCloudPlatform/python-docs-samples/blob/140b9dae356a8ffb4aa587571c4ee1eb1ae99e39/automl/snippets/get_model.py#L21, not environment variables.
We also discussed outlining the instructions that we would give to the user/TW. Did you document in our notes that we would require users to set all the environment variables?

I would prefer that this be updated to use variables, so that users do not spend additional time and friction understanding and validating the environment variable values.
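
One way to read this request: the sample takes plain variables that the reader edits in place, marked with a TODO, instead of reading os.environ. The sketch below is illustrative only. The function name collect_settings and the "YOUR_..." placeholder strings are hypothetical; the variable names and defaults follow the ones already used in this PR.

```python
def collect_settings(
    # TODO(dev): Replace the values below
    pinecone_api_key: str = "YOUR_PINECONE_API_KEY",
    project_id: str = "YOUR_PROJECT_ID",
    region: str = "YOUR_REGION",
    cluster: str = "YOUR_CLUSTER_ID",
    instance: str = "YOUR_INSTANCE_ID",
    db_name: str = "YOUR_DATABASE_ID",
    # TODO(dev): (optional values) Replace the values below
    db_user: str = "",
    db_pwd: str = "",
    table_name: str = "alloy_db_migration_table",
    vector_size: int = 768,
) -> dict:
    """Hypothetical helper: gather sample settings from plain arguments.

    Nothing here reads os.environ, so a reader can see and edit every
    value directly in the file.
    """
    return {
        "pinecone_api_key": pinecone_api_key,
        "project_id": project_id,
        "region": region,
        "cluster": cluster,
        "instance": instance,
        "db_name": db_name,
        "db_user": db_user,
        "db_pwd": db_pwd,
        "table_name": table_name,
        "vector_size": vector_size,
    }
```

A caller would override only what differs from the placeholders, for example collect_settings(project_id="my-project", region="us-central1").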

Contributor Author

Limited the use of env vars to the tests only.

alloydb_engine = await aget_alloydb_client()

# [START pinecone_alloydb_migration_get_alloydb_vectorstore]
from alloydb_snippets import aget_vector_store, get_embeddings_service
Collaborator

I don't want the region tag to include the new methods. Please update this so that it is clean and uses only the LangChain methods.
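
For illustration, a region tag that calls only the library's own classes might look like the sketch below. This is an assumption about the intended shape, not the code in this PR: the wrapper name create_vector_store is hypothetical, FakeEmbeddings stands in for whatever get_embeddings_service returns, and user/password are omitted, which assumes IAM database authentication. Running it requires langchain-google-alloydb-pg, langchain-core, and a reachable AlloyDB instance.

```python
async def create_vector_store(
    project_id: str,
    region: str,
    cluster: str,
    instance: str,
    db_name: str,
    table_name: str = "alloy_db_migration_table",
    vector_size: int = 768,
):
    """Hypothetical wrapper; only the lines inside the region tag are the sample."""
    # [START pinecone_alloydb_migration_get_alloydb_vectorstore]
    from langchain_core.embeddings import FakeEmbeddings
    from langchain_google_alloydb_pg import AlloyDBEngine, AlloyDBVectorStore

    # Connect to the AlloyDB instance (assumes IAM database authentication).
    engine = await AlloyDBEngine.afrom_instance(
        project_id=project_id,
        region=region,
        cluster=cluster,
        instance=instance,
        database=db_name,
    )
    # Create the destination table sized for the source vectors.
    await engine.ainit_vectorstore_table(
        table_name=table_name, vector_size=vector_size
    )
    # Placeholder embeddings: migrated vectors already exist, so no real
    # embedding model is assumed here.
    vector_store = await AlloyDBVectorStore.create(
        engine=engine,
        table_name=table_name,
        embedding_service=FakeEmbeddings(size=vector_size),
    )
    # [END pinecone_alloydb_migration_get_alloydb_vectorstore]
    return vector_store
```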

Comment on lines 143 to 144
embeddings_service = get_embeddings_service(pinecone_vector_size)
vs = await aget_vector_store(
Collaborator

region tags should include the new wrapper methods

samples/migrations/pinecone_migration.py (outdated, resolved)
Comment on lines 21 to 31
project_id = os.environ["PROJECT_ID"]
region = os.environ["REGION"]
cluster = os.environ["CLUSTER_ID"]
instance = os.environ["INSTANCE_ID"]
db_name = os.environ["DATABASE_ID"]

# TODO(dev): (optional values) Replace the values below
db_user = os.environ.get("DB_USER", "")
db_pwd = os.environ.get("DB_PASSWORD", "")
table_name = os.environ.get("TABLE_NAME", "alloy_db_migration_table")
vector_size = int(os.environ.get("VECTOR_SIZE", "768"))
Collaborator

See note on variables not env vars

samples/migrations/alloydb_snippets.py (outdated, resolved)
2 participants