docs: add samples to migrate pinecone to alloy db #292
base: main
Conversation
We may need to outline the changes to the tutorial. Currently, how would users run both the get and the add?
Changes:
1. Made the snippets standalone files.
2. Compressed the snippet functions into a single file.
@@ -44,6 +44,9 @@ jobs:
      - name: Install Sample requirements
        run: pip install -r samples/requirements.txt

      - name: Install Migration snippets requirements
nit: I have been adding all sample reqs to https://github.com/googleapis/langchain-google-alloydb-pg-python/blob/main/samples/requirements.txt so this file doesn't need to be updated. I am also ok with this pattern of adding the new req file to the workflow
Tried to follow this pattern in the current version of the snippets.
protobuf==5.29.1
grpcio-tools==1.67.1
Why are these needed?
For Milvus (ref), it seems these are required.
""" | ||
|
||
# TODO(dev): Replace the values below | ||
pinecone_api_key = os.environ["PINECONE_API_KEY"] |
We discussed that these would be variables to be set, like https://github.com/GoogleCloudPlatform/python-docs-samples/blob/140b9dae356a8ffb4aa587571c4ee1eb1ae99e39/automl/snippets/get_model.py#L21, not environment variables.
We also discussed outlining the instructions we would give to the user/TW. Did you document in our notes that we would require users to set all the environment variables?
I would prefer that this is updated to use variables so there is no additional time and friction in understanding and validating the environment variable values.
Limited the use of env vars to only the tests
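For reference, the variable-based style being requested might look like the following. This is only an illustration of the pattern from the linked automl sample; the names are placeholders, not the PR's actual code.

```python
# Hypothetical sketch of variable-based configuration (names are
# placeholders, not taken from the PR).

# TODO(developer): Replace these values before running the sample.
pinecone_api_key = "YOUR_PINECONE_API_KEY"
pinecone_index_name = "YOUR_PINECONE_INDEX"
alloydb_table_name = "YOUR_ALLOYDB_TABLE"


def describe_config() -> dict:
    """Collect the sample's configuration so it can be validated up front."""
    return {
        "pinecone_api_key": pinecone_api_key,
        "pinecone_index_name": pinecone_index_name,
        "alloydb_table_name": alloydb_table_name,
    }
```

With this style a reader can see and edit every required value at the top of the file, instead of hunting down which environment variables must be exported.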
embeddings_service = get_embeddings_service(pinecone_vector_size)
vs = await aget_vector_store(
region tags should include the new wrapper methods
aget_vector_store is a new method which we define and use in this snippet.
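A wrapper like aget_vector_store typically just factors the engine and vector-store setup into one awaitable call, wrapped in region tags so the docs pick it up. The stdlib-only sketch below illustrates that shape with a stand-in store; the real snippet would build the AlloyDB engine and vector store instead, and the region tag name is an assumption.

```python
import asyncio


# [START alloydb_migration_get_vector_store]
class InMemoryVectorStore:
    """Stand-in for the real AlloyDB vector store (illustration only)."""

    def __init__(self, table_name: str):
        self.table_name = table_name
        self.rows: list = []


async def aget_vector_store(table_name: str) -> InMemoryVectorStore:
    """Async factory: the real snippet would create the AlloyDB engine
    here and return an initialized vector store."""
    await asyncio.sleep(0)  # placeholder for real async setup
    return InMemoryVectorStore(table_name)
# [END alloydb_migration_get_vector_store]


vs = asyncio.run(aget_vector_store("migrated_vectors"))
print(vs.table_name)  # prints "migrated_vectors"
```

Placing the region tags around the wrapper, as suggested in the review, means the published sample shows the full setup rather than only the call site.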
inserted_ids = await vs.aadd_embeddings(
    texts=contents,
    embeddings=embeddings,
    metadatas=metadatas,
    ids=ids,
)
Per another comment in a different PR, it seems using __concurrent_batch_insert is preferable to directly calling aadd_embeddings.
I've not used it because __concurrent_batch_insert is tied to batches of type AsyncIterator[Sequence[RowMapping]], which might not be available for all our use cases.
Not a strong preference; I can change it.
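The trade-off being discussed can be sketched with stdlib asyncio alone: chunk the rows and insert the chunks concurrently, without depending on the AsyncIterator[Sequence[RowMapping]] batch type. All names below are illustrative, not the library's API.

```python
import asyncio


async def _insert_batch(store: list, batch: list) -> list:
    """Stand-in for one aadd_embeddings call on a single batch."""
    await asyncio.sleep(0)  # placeholder for real database I/O
    store.extend(batch)
    return [row["id"] for row in batch]


async def concurrent_batch_insert(store: list, rows: list, batch_size: int = 100) -> list:
    """Insert rows as concurrent batches; returns inserted ids in input order."""
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    results = await asyncio.gather(*(_insert_batch(store, b) for b in batches))
    # asyncio.gather preserves the order of the batch coroutines.
    return [row_id for ids in results for row_id in ids]


store: list = []
rows = [{"id": f"vec-{i}", "values": [0.0]} for i in range(5)]
inserted = asyncio.run(concurrent_batch_insert(store, rows, batch_size=2))
print(inserted)  # prints ['vec-0', 'vec-1', 'vec-2', 'vec-3', 'vec-4']
```

This keeps the batching benefit without committing the snippet to a batch type that not every use case can produce.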
Adding code snippets to migrate from Pinecone to AlloyDB.
Upcoming PRs:
Reviewer points:
How to add an index into Pinecone:
https://paste.googleplex.com/5444295470088192
Test Log:
go/pinecone-alloydb-migration