-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rag opensearch usecase with Beam's MLTransform #32018
Conversation
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
|
||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run yapf
manually to clean up the whitespace (see https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-Runningyapfformattermanually)
if self.batch_counter == 0: | ||
return | ||
|
||
with _InsertEmbeddingInOpenSearchSink(self.host, self.port, self.username, self.password, self.embedded_columns) as sink: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you're instantiating a new instance of the internal class here every time? You wind up creating a new class + client every time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reason I did that because I don't want to bulk push on a stale connection. Given it's a batch push I thought it might be better instead of try catching an exception and reinstantiating a connection after error.
What do you think ?
examples/notebooks/beam-ml/rag_usecase/opensearch_enrichment.py
Outdated
Show resolved
Hide resolved
331df69
to
3affaad
Compare
I'd recommend rebasing the PR now that the initial RAG PR is merged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase this PR to remove the duplicate code (anything from #31657 that has already been reviewed and merged)
examples/notebooks/beam-ml/rag_usecase/opensearch_enrichment.py
Outdated
Show resolved
Hide resolved
examples/notebooks/beam-ml/rag_usecase/opensearch_rag_pipeline.ipynb
Outdated
Show resolved
Hide resolved
examples/notebooks/beam-ml/rag_usecase/opensearch_rag_pipeline.ipynb
Outdated
Show resolved
Hide resolved
examples/notebooks/beam-ml/rag_usecase/opensearch_rag_pipeline.ipynb
Outdated
Show resolved
Hide resolved
examples/notebooks/beam-ml/rag_usecase/opensearch_rag_pipeline.ipynb
Outdated
Show resolved
Hide resolved
examples/notebooks/beam-ml/rag_usecase/opensearch_rag_pipeline.ipynb
Outdated
Show resolved
Hide resolved
@jrmccluskey - thanks for the review suggestions! I have made changes as per your feedback, please let me know if these look good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good, probably time to focus on cleaning up the notebook
Same note on rebasing the branch, see https://stackoverflow.com/questions/7929369/how-to-rebase-local-branch-onto-remote-master/18442755#18442755
@jrmccluskey - the branch seems upto date with master |
@itsayushpandey I think this is the problem: Your branch doesn't contain the most recent set of changes. To fix, you can:
|
Hi @damccorm - the trailing commits of branch should have been updated. Can you please check if this looks good to go now? |
examples/notebooks/beam-ml/rag_usecase/opensearch_rag_pipeline.ipynb
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the slow review - has been a very busy stretch. This looks good to me - just had one minor comment, once that is addressed I'll merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
* Adding insertion and enrichment pipeline * Enhanced Data Schema * Added Apache Licensed to the notebook * Adding Chunking Strategy * removed unused imports * Modified insertion logic in redis for incorporating chunking strategy * refacted redis code * code review changes * Added chunking code in notebook * Added code review changes * Code review changes: using chunking strategy as enum * Added Code Review Changes * Code review changes * Added code review changes * Added Code Review Changes * Code review changes * Ingestion and Enrichment pipeline for OpenSearch Vector DB * Added logic for reading password from .env file * Added opensearch vector notebook * Update credentials.env * Added code review changes * Added Description in opensearch notebook * Added description in opensearch notebook * Code review changes
Added Insertion and Enrichment pipeline for RAG (Retrival Augmented Generation) usecase using OpenSearch vector database #GSOC-259
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.