Adding create ingest pipeline step #558

Merged
merged 3 commits into from
Mar 12, 2024

Conversation

amitgalitz
Member

Description

Adds a create ingest pipeline step so that a pipeline can be created through a template. The step supports any number of processors, and because validation is limited, all processor types are supported. The step also currently supports taking in a model ID from a previous step through substitution.

The two required fields for a pipeline are the pipelineId (pipeline_id in the template), also referred to as the pipeline name, and configurations, which holds the pipeline's configuration: the processors list, the description, tags, and other optional fields. To support any ingest processor without amending this step each time a new one is added, we convert the given configurations to a string and pass that string to the create pipeline API without extra validation on our end.
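As a rough illustration of the pass-through approach described above, here is a minimal Python sketch. It is not the plugin's Java implementation: the function name `build_pipeline_request` is hypothetical, and the field names `pipeline_id` and `configurations` are taken from the template examples in this description. Only the presence of the two required fields is checked; the configurations object is serialized untouched.

```python
import json
from typing import Tuple

# Field names as they appear in the template examples in this PR description.
REQUIRED_FIELDS = ("pipeline_id", "configurations")


def build_pipeline_request(user_inputs: dict) -> Tuple[str, str]:
    """Return (pipeline_id, request_body) for a create-pipeline call.

    Hypothetical helper: it checks only that the required fields exist and
    serializes the configurations as-is, with no per-processor validation.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in user_inputs]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    body = json.dumps(user_inputs["configurations"])
    return user_inputs["pipeline_id"], body
```

Under this sketch, the returned body would be what is sent to the create pipeline API (in OpenSearch, `PUT _ingest/pipeline/<pipeline_id>`), so any processor the cluster accepts works without changes to the step, and any invalid processor is rejected by the cluster rather than by the step.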

Examples:

                {
                    "id": "create_ingest_pipeline",
                    "type": "create_ingest_pipeline",
                    "previous_node_inputs": {
                        "deploy_openai_model": "model_id"
                    },
                    "user_inputs": {
                        "pipeline_id": "text-embedding-3",
                        "configurations": {
                            "description": "A text embedding pipeline",
                            "processors": [
                                {
                                    "text_embedding": {
                                        "model_id": "${{deploy_openai_model.model_id}}",
                                        "field_map": {
                                            "passage_text": "passage_embedding"
                                        }
                                    }
                                }
                            ]
                        }
                    }
                }

              {
                  "id": "create_ingest_pipeline_2",
                  "type": "create_ingest_pipeline",
                  "previous_node_inputs": {
                      "deploy_openai_model": "model_id"
                  },
                  "user_inputs": {
                      "pipeline_id": "append-1",
                      "configurations": {
                          "description": "Pipeline that appends event type",
                          "processors": [
                              {
                                  "append": {
                                      "field": "event_types",
                                      "value": [
                                          "page_view"
                                      ]
                                  }
                              },
                              {
                                  "drop": {
                                      "if": "ctx.user_info.contains('password') || ctx.user_info.contains('credit card')"
                                  }
                              }
                          ]
                      }
                  }
              }
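
The `${{deploy_openai_model.model_id}}` placeholder in the first example is resolved from a previous step's output. A minimal Python sketch of that kind of substitution follows, assuming placeholders always have the form `${{step_id.output_key}}`. The function name `substitute` and the shape of `previous_outputs` are hypothetical, and this is not the plugin's actual Java code.

```python
import json
import re

# Matches placeholders of the form ${{step_id.output_key}}.
PLACEHOLDER = re.compile(r"\$\{\{\s*([\w-]+)\.([\w-]+)\s*\}\}")


def substitute(configurations: dict, previous_outputs: dict) -> str:
    """Serialize configurations, replacing placeholders with prior step outputs.

    previous_outputs is assumed to map step id -> {output key -> value}.
    """
    def lookup(match):
        step_id, key = match.group(1), match.group(2)
        try:
            value = previous_outputs[step_id][key]
        except KeyError:
            raise ValueError(f"no output '{key}' from step '{step_id}'") from None
        # Escape the value for use inside a JSON string (strip the outer quotes).
        return json.dumps(str(value))[1:-1]

    return PLACEHOLDER.sub(lookup, json.dumps(configurations))
```

For example, calling `substitute(configurations, {"deploy_openai_model": {"model_id": "abc123"}})` on the first template's configurations (where `abc123` is a made-up model ID) yields a JSON string whose `text_embedding.model_id` is `abc123`. Substituting on the serialized string keeps the step agnostic to which processor the placeholder appears in.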

Issues Resolved

resolves #24

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Member

@joshpalis joshpalis left a comment

Overall looks good to me, just a few comments

Member

@dbwiddis dbwiddis left a comment

Not sure how much of this is working around the bug fixed in #559, but can definitely improve the parsing...

@amitgalitz amitgalitz force-pushed the ingest-step branch 2 times, most recently from f751fb2 to 69aa46a Compare March 12, 2024 17:14

codecov bot commented Mar 12, 2024

Codecov Report

Attention: Patch coverage is 67.34694%, with 16 lines in your changes missing coverage. Please review.

Project coverage is 72.24%. Comparing base (1ff9649) to head (d07b542).

Files                                                    Patch %   Lines
...g/opensearch/flowframework/model/WorkflowNode.java    18.18%    8 Missing and 1 partial ⚠️
.../org/opensearch/flowframework/util/ParseUtils.java    64.28%    3 Missing and 2 partials ⚠️
...ch/flowframework/workflow/CreateConnectorStep.java    0.00%     2 Missing ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #558      +/-   ##
============================================
- Coverage     72.54%   72.24%   -0.31%     
+ Complexity      665      653      -12     
============================================
  Files            78       78              
  Lines          3416     3397      -19     
  Branches        271      269       -2     
============================================
- Hits           2478     2454      -24     
- Misses          822      825       +3     
- Partials        116      118       +2     

Member

@dbwiddis dbwiddis left a comment

LGTM! A few suggestions.

@amitgalitz amitgalitz force-pushed the ingest-step branch 2 times, most recently from 30bb904 to baff494 Compare March 12, 2024 18:20
@amitgalitz amitgalitz merged commit b6d2092 into opensearch-project:main Mar 12, 2024
31 of 32 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/flow-framework/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/flow-framework/backport-2.x
# Create a new branch
git switch --create backport/backport-558-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 b6d2092c35c5f94fd2ea644d375bfed345e0ede8
# Push it to GitHub
git push --set-upstream origin backport/backport-558-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/flow-framework/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-558-to-2.x.

@owaiskazi19
Member

@amitgalitz can you raise a manual backport PR?

amitgalitz added a commit to amitgalitz/opensearch-ai-flow-framework that referenced this pull request Mar 12, 2024
* adding create ingest pipeline step

Signed-off-by: Amit Galitzky <[email protected]>

* adding IT and move configurations parsing to input parsing

Signed-off-by: Amit Galitzky <[email protected]>

* cleaning up comments

Signed-off-by: Amit Galitzky <[email protected]>

---------

Signed-off-by: Amit Galitzky <[email protected]>
@amitgalitz amitgalitz mentioned this pull request Mar 12, 2024
amitgalitz added a commit to amitgalitz/opensearch-ai-flow-framework that referenced this pull request Mar 13, 2024
Adding create ingest pipeline step (opensearch-project#558)

* adding create ingest pipeline step

* adding IT and move configurations parsing to input parsing

* cleaning up comments

---------

Signed-off-by: Amit Galitzky <[email protected]>
Labels
backport 2.x backport PRs to 2.x branch
Development

Successfully merging this pull request may close these issues.

[FEATURE] Create an ingest pipeline workflow step
5 participants