docs: add samples to migrate pinecone to alloy db #292

Open
wants to merge 10 commits into
base: main

vishwarajanand
Contributor

@vishwarajanand vishwarajanand commented Dec 17, 2024

Adding code snippets to migrate from Pinecone to AlloyDB.

Upcoming PRs:

  1. More DBs
  2. Cloud build file

Reviewer points:

  1. Region tags
  2. Code structure & readability
  3. Use cases coverage

How to add an index into Pinecone:

https://paste.googleplex.com/5444295470088192

Test Log:

go/pinecone-alloydb-migration

@product-auto-label product-auto-label bot added api: alloydb Issues related to the googleapis/langchain-google-alloydb-pg-python API. samples Issues that are directly related to samples. labels Dec 17, 2024
@vishwarajanand vishwarajanand changed the title chore: add samples to migrate pinecone to alloy db docs: add samples to migrate pinecone to alloy db Dec 17, 2024
Collaborator

@averikitsch averikitsch left a comment

We may need to outline the changes to the tutorial. Currently, how would users run both the get and the add?

samples/migrations/snippets/alloydb_snippets.py (outdated, resolved)
samples/migrations/snippets/pinecone_snippets.py (outdated, resolved)
Changes:
1. Made the snippets standalone files.
2. Compressed the snippet functions into a single file.
@@ -44,6 +44,9 @@ jobs:
- name: Install Sample requirements
run: pip install -r samples/requirements.txt

- name: Install Migration snippets requirements
Collaborator

nit: I have been adding all sample reqs to https://github.com/googleapis/langchain-google-alloydb-pg-python/blob/main/samples/requirements.txt so that this workflow file doesn't need to be updated. I am also OK with this pattern of adding the new req file to the workflow.

Contributor Author

Tried to follow this pattern in the current version of the snippets.

Comment on lines +8 to +9
protobuf==5.29.1
grpcio-tools==1.67.1
Collaborator

Why are these needed?

Contributor Author

These seem to be required for Milvus (ref).

samples/migrations/pinecone_migration.py (outdated, resolved)
samples/migrations/pinecone_migration.py (outdated, resolved)
"""

# TODO(dev): Replace the values below
pinecone_api_key = os.environ["PINECONE_API_KEY"]
Collaborator

We discussed that these would be variables to be set like https://github.com/GoogleCloudPlatform/python-docs-samples/blob/140b9dae356a8ffb4aa587571c4ee1eb1ae99e39/automl/snippets/get_model.py#L21, not environment variables.
We also discussed outlining the instructions that we would give to the user/TW. Did you document in our notes that we would require users to set all the environment variables?

I would prefer that this be updated to use variables, so that users do not spend extra time and effort understanding and validating the environment variable values.
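
A minimal sketch of the variable-based pattern requested here, modeled loosely on the linked `get_model.py` sample: the values a reader must fill in are plain variables under a `TODO(dev)` comment rather than lookups in `os.environ`. The names `MigrationConfig`, `build_config`, `pinecone_index_name`, and `alloydb_table_name`, as well as every placeholder value, are hypothetical and are not taken from the PR.

```python
from dataclasses import dataclass


@dataclass
class MigrationConfig:
    """Hypothetical container for the values a reader must fill in."""

    pinecone_api_key: str
    pinecone_index_name: str
    alloydb_table_name: str


def build_config() -> MigrationConfig:
    # TODO(dev): Replace the values below with your own.
    pinecone_api_key = "YOUR_PINECONE_API_KEY"
    pinecone_index_name = "YOUR_PINECONE_INDEX_NAME"
    alloydb_table_name = "YOUR_ALLOYDB_TABLE_NAME"

    # Plain variables keep the sample copy-paste friendly: nothing has to be
    # exported in the shell before the snippet can be read or run.
    return MigrationConfig(
        pinecone_api_key=pinecone_api_key,
        pinecone_index_name=pinecone_index_name,
        alloydb_table_name=alloydb_table_name,
    )
```

With this shape, a reader edits three string literals in one place and the rest of the sample stays untouched.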

Contributor Author

Limited the use of env vars to only the tests

samples/migrations/pinecone_migration.py (outdated, resolved)
Comment on lines 143 to 144
embeddings_service = get_embeddings_service(pinecone_vector_size)
vs = await aget_vector_store(
Collaborator

Region tags should include the new wrapper methods.

Contributor Author

aget_vector_store is a new method which we define and use in this snippet.
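
For illustration, a region tag that covers the wrapper as well as its call site could look like the sketch below, so that a reader who sees only the extracted region also sees the helper's definition. The tag name `hypothetical_migration_get_vector_store`, the helper's signature, and its stand-in body are assumptions; the real `aget_vector_store` in this PR is not reproduced here.

```python
import asyncio


# [START hypothetical_migration_get_vector_store]
async def aget_vector_store(table_name: str) -> dict:
    # Stand-in body: the real helper would construct and return a vector store.
    return {"table_name": table_name}


async def main() -> dict:
    # The call site sits inside the same region as the helper definition,
    # so the extracted snippet is self-explanatory.
    vs = await aget_vector_store(table_name="my_table")
    return vs
# [END hypothetical_migration_get_vector_store]
```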

samples/migrations/pinecone_migration.py (outdated, resolved)
samples/migrations/alloydb_snippets.py (outdated, resolved)
samples/migrations/alloydb_snippets.py (outdated, resolved)
@vishwarajanand vishwarajanand marked this pull request as ready for review January 2, 2025 10:50
@vishwarajanand vishwarajanand requested review from a team as code owners January 2, 2025 10:50
Comment on lines +167 to +172
inserted_ids = await vs.aadd_embeddings(
texts=contents,
embeddings=embeddings,
metadatas=metadatas,
ids=ids,
)
Contributor Author

@vishwarajanand vishwarajanand Jan 2, 2025

Per a comment in a different PR, it seems that using __concurrent_batch_insert is preferable to calling aadd_embeddings directly.

I have not used it because __concurrent_batch_insert only accepts batches of type AsyncIterator[Sequence[RowMapping]], which might not be available for all our use cases.

This is not a strong preference; I can change it.
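
As a point of comparison, here is a sketch of batching calls to `aadd_embeddings` over plain lists with bounded concurrency. This is not the library's `__concurrent_batch_insert`; `FakeVectorStore` is a hypothetical in-memory stand-in that mimics only the `aadd_embeddings(texts, embeddings, metadatas, ids)` call shown in the snippet above, and `insert_in_batches`, `batch_size`, and `max_concurrency` are illustrative names.

```python
import asyncio
from typing import Any, Dict, List


class FakeVectorStore:
    """Hypothetical in-memory stand-in exposing only aadd_embeddings."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    async def aadd_embeddings(self, texts, embeddings, metadatas, ids) -> List[str]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        for doc_id, text in zip(ids, texts):
            self.rows[doc_id] = text
        return list(ids)


async def insert_in_batches(
    vs: Any,
    texts: List[str],
    embeddings: List[List[float]],
    metadatas: List[dict],
    ids: List[str],
    batch_size: int = 2,
    max_concurrency: int = 4,
) -> List[str]:
    # The semaphore caps how many batches are in flight at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def insert_one(start: int) -> List[str]:
        end = start + batch_size
        async with semaphore:
            return await vs.aadd_embeddings(
                texts=texts[start:end],
                embeddings=embeddings[start:end],
                metadatas=metadatas[start:end],
                ids=ids[start:end],
            )

    # gather returns results in submission order, so ids stay in input order.
    results = await asyncio.gather(
        *(insert_one(i) for i in range(0, len(ids), batch_size))
    )
    return [doc_id for batch in results for doc_id in batch]
```

Because the helper slices ordinary lists, it does not depend on the rows arriving as `AsyncIterator[Sequence[RowMapping]]`, which is the constraint raised in the comment.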

3 participants