Initial commit for RAG pipeline scripts #427
DCO is missing
43e01d3 to db78131
opensearch_py_ml/ml_commons/rag_pipeline/rag/opensearch_class.py
},
"settings": {
    "index": {
        "number_of_shards": 2,
This value is hard-coded.
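One way to address the hard-coded shard count is to read it from configuration and fall back to a default. This is only a sketch: the function name `build_index_settings` and the config key `number_of_shards` are assumptions for illustration, not names taken from the PR.

```python
from typing import Any, Dict

DEFAULT_SHARDS = 2  # mirrors the literal value in the diff snippet


def build_index_settings(config: Dict[str, Any]) -> Dict[str, Any]:
    """Build the index "settings" body with a configurable shard count.

    Both this helper and the "number_of_shards" config key are hypothetical.
    """
    shards = int(config.get("number_of_shards", DEFAULT_SHARDS))
    if shards < 1:
        raise ValueError("number_of_shards must be a positive integer")
    return {"settings": {"index": {"number_of_shards": shards}}}
```

Callers that do not set the key keep the current behavior (two shards), while others can override it without touching the class.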
Really cool stuff here! 📖 🤖 💬 Let's try to refactor some of the code so it's reusable across classes. Let's also apply the single-responsibility principle (SRP) so that no class is burdened with doing too much at once.
Make sure to add documentation to files and methods.
I also see that a query Python file already exists in py-ml (https://github.com/opensearch-project/opensearch-py-ml/blame/main/opensearch_py_ml/query.py). If possible, maybe we can fold this into the existing code?
Lastly, let's come up with a great description of the feature. You put in a lot of effort, so let's make the PR description visually appealing (diagrams, how to use, GIFs, a concise summary, emojis). You can use opensearch-project/neural-search#933 as inspiration. Talk to @dhrubo-os about whether we want to open an issue and then link this PR to it; I'm not sure if that's too much.
Great work!
self.aws_region = config.get('region')
self.index_name = config.get('index_name')
Is there any input validation here? Maybe we can catch bad values earlier so they don't become a headache later.
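A minimal sketch of what early validation could look like, assuming the config is a plain mapping and that `region` and `index_name` (the two keys read in the snippet) are required. The helper name `validate_config` is hypothetical.

```python
from typing import Dict, List, Mapping, Optional

REQUIRED_KEYS = ("region", "index_name")  # the keys read in the snippet


def validate_config(config: Mapping[str, Optional[str]]) -> Dict[str, str]:
    """Fail fast if a required value is missing or blank (hypothetical helper)."""
    missing: List[str] = [
        key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()
    ]
    if missing:
        raise ValueError(f"Missing required config values: {', '.join(missing)}")
    # Return trimmed copies so later code never sees stray whitespace.
    return {key: str(config[key]).strip() for key in REQUIRED_KEYS}
```

Calling this once in the constructor would surface a missing region or index name immediately, rather than as a client error during ingestion.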
    print(f"Failed to initialize clients: {e}")
    return False

def process_file(self, file_path: str) -> List[Dict[str, str]]:
Check the docstrings of other methods in this repo and follow the same style.
opensearch_py_ml/ml_commons/rag_pipeline/rag/opensearch_class.py
DCO is missing.
ced6a59 to 3260f3a
Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
…ffering a suggested default value with the flexibility for users to enter a custom value if needed. Signed-off-by: hmumtazz <[email protected]>
…, addressed comments, fixed upload csv method Signed-off-by: hmumtazz <[email protected]>
This commit serves as a blanket sign-off for all previous commits in this pull request, in compliance with the Developer Certificate of Origin (DCO). Signed-off-by: hmumtazz <[email protected]>
a05e767 to 2635751
- Add functionality for users to upload entire folders by specifying a file path, enabling batch processing of multiple files
- Show path to config file
- Implement a visual confirmation system for setup completion, using red/green indicators similar to the document ingestion process
- Combine the RAG setup and model registration into one unified process, prompting users if they want to register a model, offering options or if CX wants to use their already made custom model ID input, and providing clear confirmation when a model ID is saved, all while streamlining the overall setup for improved user experience
- Update user interface starting from setup
Signed-off-by: hmumtazz <[email protected]>
…pre-existing one, allowing users to run a single setup process - Update requirements.txt to support RAG dependencies Signed-off-by: hmumtazz <[email protected]>
if domain_endpoint:
    # Construct the full domain URL
    self.opensearch_domain_url = f'https://{domain_endpoint}'
I think it's better to take the full domain URL. What if somebody wants to use http?
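A sketch of how the endpoint could accept either a bare host or a full URL, so that http is possible while https remains the default. The helper name `normalize_domain_url` and its `default_scheme` parameter are assumptions for illustration; note that this version keeps only the scheme and host (with port) and drops any path.

```python
from urllib.parse import urlparse


def normalize_domain_url(endpoint: str, default_scheme: str = "https") -> str:
    """Return scheme://host[:port], keeping an explicit http/https scheme.

    Hypothetical helper: a bare host gets `default_scheme` prepended.
    """
    endpoint = endpoint.strip().rstrip("/")
    if not endpoint:
        raise ValueError("endpoint must not be empty")
    if "://" not in endpoint:
        endpoint = f"{default_scheme}://{endpoint}"
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Unsupported or malformed endpoint: {endpoint!r}")
    return f"{parsed.scheme}://{parsed.netloc}"
```

With something like this, `self.opensearch_domain_url` could be set from the helper's return value instead of an f-string that always assumes https.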
opensearch_py_ml/ml_commons/rag_pipeline/rag/AIConnectorHelper.py
…s can now specify their model payload through JSON object Signed-off-by: hmumtazz <[email protected]>
- Remove requirements.txt and setup.py from Git repository
- Add these files to .gitignore to prevent future tracking
- Keep local copies of these files for development purposes
Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
- Delete opensearch_py_ml/ml_commons/rag_pipeline/rag/.gitignore
- Consolidate gitignore rules in the root .gitignore file
Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
Description
[Describe what this change achieves]
Issues Resolved
[List any issues this PR will resolve]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.