Initial commit for RAG pipeline scripts #427

Open
wants to merge 16 commits into
base: main
from

Conversation

hmumtazz

@hmumtazz hmumtazz commented Nov 15, 2024

Description

[Describe what this change achieves]

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dhrubo-os
Collaborator

DCO is missing

},
"settings": {
"index": {
"number_of_shards": 2,
Collaborator

This value is hard-coded.
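
One way to address this, as a minimal sketch only: read the shard count from the user's config and fall back to a default. The config key `number_of_shards`, the helper name `build_index_settings`, and the default of 2 (taken from the value currently in the diff) are assumptions for illustration, not the PR's actual code.

```python
from typing import Any, Dict

# Assumed default, copied from the value currently hard-coded in the diff.
DEFAULT_NUMBER_OF_SHARDS = 2


def build_index_settings(config: Dict[str, Any]) -> Dict[str, Any]:
    """Build the index settings body, taking the shard count from config.

    `number_of_shards` is a hypothetical config key used for illustration.
    """
    raw = config.get("number_of_shards", DEFAULT_NUMBER_OF_SHARDS)
    try:
        shards = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"number_of_shards must be an integer, got {raw!r}")
    if shards < 1:
        raise ValueError(f"number_of_shards must be at least 1, got {shards}")
    return {"settings": {"index": {"number_of_shards": shards}}}
```

Values read from an INI-style config file arrive as strings, which is why the sketch converts with `int()` before validating.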

@brianf-aws brianf-aws left a comment

Really cool stuff here! 📖 🤖 💬 Let's try to refactor some code so it's reusable across classes. Let's also apply the single-responsibility principle (SRP) so that no class is burdened with doing too much at once.

Make sure to add documentation to files and methods.
Also, I see there is already a query Python file within py-ml: https://github.com/opensearch-project/opensearch-py-ml/blame/main/opensearch_py_ml/query.py. If possible, maybe we can consolidate this with the existing code?

Lastly, let's come up with a great description of the feature. You put in a lot of effort, so let's make it visually appealing in the PR description (diagrams, how to use, GIFs, a concise summary, emojis). You can use opensearch-project/neural-search#933 as inspiration. Talk to @dhrubo-os about whether we want to open an issue and then link this PR; I'm not sure if that's too much.

Great work!

Comment on lines +27 to +51
self.aws_region = config.get('region')
self.index_name = config.get('index_name')

Is there input validation involved? Maybe we can catch bad values earlier so they don't become a headache later.
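
For example, a rough sketch of validating up front, assuming `region` and `index_name` (the two keys visible in the snippet above) are both required. The helper name `validate_config` and the error wording are hypothetical.

```python
from typing import Dict, Mapping

# Keys taken from the quoted snippet; treating both as required is an assumption.
REQUIRED_KEYS = ("region", "index_name")


def validate_config(config: Mapping[str, str]) -> Dict[str, str]:
    """Fail fast if a required value is missing or blank."""
    missing = [
        key for key in REQUIRED_KEYS
        if not str(config.get(key) or "").strip()
    ]
    if missing:
        raise ValueError("Missing required config values: " + ", ".join(missing))
    # Return a cleaned copy so later code never sees stray whitespace.
    return {key: str(config[key]).strip() for key in REQUIRED_KEYS}
```

Calling something like this at the top of the constructor would surface a missing value at setup time instead of during ingestion or querying.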

opensearch_py_ml/ml_commons/rag_pipeline/rag/ingest.py (outdated, resolved)
opensearch_py_ml/ml_commons/rag_pipeline/rag/query.py (outdated, resolved)
opensearch_py_ml/ml_commons/rag_pipeline/rag/query.py (outdated, resolved)
opensearch_py_ml/ml_commons/rag_pipeline/rag/query.py (outdated, resolved)
opensearch_py_ml/ml_commons/rag_pipeline/rag/rag_setup.py (outdated, resolved)
print(f"Failed to initialize clients: {e}")
return False

def process_file(self, file_path: str) -> List[Dict[str, str]]:
Collaborator

@dhrubo-os dhrubo-os Nov 19, 2024

Check the docstrings of other methods in this repo and follow the same style.
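
As an illustration only, a docstring in the reST field-list style (`:param:`, `:type:`, `:return:`, `:rtype:`) might look like the sketch below. Whether this matches the repo's convention should be confirmed against existing modules. The class name `IngestSketch` and the description text are placeholders, and the method body is omitted on purpose.

```python
from typing import Dict, List


class IngestSketch:
    """Hypothetical stand-in class; only the docstring layout matters here."""

    def process_file(self, file_path: str) -> List[Dict[str, str]]:
        """
        Process a single file and return its contents as a list of records.

        :param file_path: path of the file to process
        :type file_path: str
        :return: records extracted from the file (placeholder description)
        :rtype: List[Dict[str, str]]
        """
        # Body intentionally omitted; this sketch only shows the docstring.
        raise NotImplementedError
```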

@dhrubo-os
Collaborator

DCO is missing.

@hmumtazz hmumtazz force-pushed the rag_pipeline branch 2 times, most recently from ced6a59 to 3260f3a on November 21, 2024 09:41
…ffering a suggested default value with the flexibility for users to enter a custom value if needed.

Signed-off-by: hmumtazz <[email protected]>
…, addressed comments, fixed upload csv method

Signed-off-by: hmumtazz <[email protected]>
This commit serves as a blanket sign-off for all previous commits in this pull request,
in compliance with the Developer Certificate of Origin (DCO).

Signed-off-by: hmumtazz <[email protected]>
- Add functionality for users to upload entire folders by specifying a file path, enabling batch processing of multiple files
- Show path to config file
- Implement a visual confirmation system for setup completion, using red/green indicators similar to the document ingestion process
- Combine the RAG setup and model registration into one unified process: prompt users on whether they want to register a model, offer options or let customers enter the ID of a custom model they already created, and clearly confirm when a model ID is saved, streamlining the overall setup for a better user experience
- Update user interface starting from setup

Signed-off-by: hmumtazz <[email protected]>
…pre-existing one, allowing users to run a single setup process

- Update requirements.txt to support RAG dependencies

Signed-off-by: hmumtazz <[email protected]>

if domain_endpoint:
# Construct the full domain URL
self.opensearch_domain_url = f'https://{domain_endpoint}'
Collaborator

It would be better to accept the full domain URL, I think. What if somebody wants to use http?
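
A possible shape for this, sketched with the standard library only: accept a full URL as entered and add a scheme only when the user left it out. The helper name `normalize_domain_url` and the choice of `https` as the fallback scheme are assumptions.

```python
from urllib.parse import urlparse


def normalize_domain_url(value: str, default_scheme: str = "https") -> str:
    """Return a full domain URL, keeping an explicit http/https scheme."""
    value = value.strip().rstrip("/")
    if not value:
        raise ValueError("OpenSearch domain URL must not be empty")
    # Only prepend a scheme when the user did not supply one.
    if "://" not in value:
        value = f"{default_scheme}://{value}"
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Unsupported OpenSearch domain URL: {value!r}")
    return value
```

With this, `http://localhost:9200` is kept as entered, while a bare endpoint still gets the `https://` prefix that the current code applies unconditionally.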

setup.py (outdated, resolved)
…s can now specify their model payload through JSON object

Signed-off-by: hmumtazz <[email protected]>
- Remove requirements.txt and setup.py from Git repository
- Add these files to .gitignore to prevent future tracking
- Keep local copies of these files for development purposes

Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
- Delete opensearch_py_ml/ml_commons/rag_pipeline/rag/.gitignore
- Consolidate gitignore rules in the root .gitignore file

Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
4 participants