[PROPOSAL] Integration of Flow Framework behind Search Pipeline Processors #367
Comments
Woohoo! That will be awesome and is totally in line with the original search pipelines proposal, right down at the bottom:
One of my projects in a past life involved (expensive, but high-quality) precomputation of results for "head" queries and pushing them to a cache. Essentially, for the "stuff that everyone looks for", we were better off precomputing results. Of course, for a system like that, you need both your search index and your cache updater to have access to the stream of updates, which is easier to implement with pull-based indexing. (At that point, the cache updater did a kind of "percolate" thing, checking if each update should be matched by any of the precomputed queries or would cause a given doc to stop being matched by a precomputed query.) Implementing something like that with search pipelines, where a middle processor routes queries to the precomputed cache or the real index, would be great.
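As a rough sketch of that routing idea (all class and field names below are invented for illustration; a real version would sit behind a search request processor), the "middle processor" only needs to redirect recognized head queries to the cache index:

```java
import org.opensearch.action.search.SearchRequest;
import java.util.Set;

// Illustration only: route known "head" queries to a precomputed-results index and
// let everything else fall through to the live index. Names are made up for the sketch.
public class PrecomputedRouter {
    private final Set<String> headQueries; // queries whose results were precomputed
    private final String cacheIndex;       // index holding those precomputed results

    public PrecomputedRouter(Set<String> headQueries, String cacheIndex) {
        this.headQueries = headQueries;
        this.cacheIndex = cacheIndex;
    }

    public SearchRequest route(SearchRequest request) {
        String query = request.source() == null ? "" : String.valueOf(request.source().query());
        if (headQueries.contains(query)) {
            return request.indices(cacheIndex); // serve from the precomputed cache
        }
        return request; // unchanged: search the real index
    }
}
```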
@dbwiddis, to be clear, we have contributors/partners who have built around search pipelines (e.g., conversational search). We also have internal teams who have adopted it. Hybrid search features, for instance, use it. So, yes, we should consider our users, partners, and teams if we're going to build something with overlapping functionality. However, my main point was about ensuring compatibility with our existing query interfaces. Every customer has adopted one or more of our query APIs. It's important that the flow framework is compatible with existing and future query interfaces. Search pipelines is an example of a design that is compatible with existing and future query interfaces.
Good callout, @dylan-tong-aws! I will definitely keep that in mind as we integrate with other query APIs. For this particular proposal, I'd like to keep the scope to the existing pipeline usage as it's going to be the most broad and open-ended application. It's not the answer to every problem but is probably one of the more complex parts of the overall vision.
Coming from opensearch-project/OpenSearch#11782
@dylan-tong-aws this sounds like a use case for the new generic memory layer we are building in ml-commons. |
@dbwiddis But doesn't the response processor already do this? Are you referring to something that gets executed before the query phase? Something that does not require SearchHits? Search processors are really categorized based on where in the search path they are used.
Well, it can, I suppose, in which case we wouldn't need a new type; we could implement behind it. Its signature is:
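(For reference, a simplified sketch of the response processor contract being discussed; the real interface in the OpenSearch search pipeline code also extends a common Processor interface with metadata methods not shown here:)

```java
import org.opensearch.action.search.SearchRequest;
import org.opensearch.action.search.SearchResponse;

// Simplified sketch: the response processor receives the original SearchRequest
// alongside the SearchResponse, so request-aware logic can be implemented behind it.
public interface SearchResponseProcessor {
    SearchResponse processResponse(SearchRequest request, SearchResponse response) throws Exception;
}
```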
Or instead of it. Big (blank whiteboard) picture, we would just have a workflow that expects a SearchRequest and produces a SearchResponse. So you could have a search pipeline do prep work on the search, then perhaps branch off into a couple of different paths and then merge together at the end.
This is somewhat why I'm talking about a new processor type. Here's the image from the docs. The new type would sit around that index block in the middle ... after the search requests but before the search responses, and would integrate the DAG-based workflow logic. That doesn't mean everything has to fit in that spot; we could implement behind any of the processor types.
Sit around the index block or execute as an alternative to the index block? I suppose they're kind of the same thing, but I've always thought of it as the latter. Maybe you want to search the index, but maybe you want to retrieve a precomputed result, or maybe for some requests you'll call out to a different search engine 😁.
It could be around it like a donut ;)
Or maybe all of the above and combine the results!
I get it now. Yeah, it would have to be a new type since the existing types are tightly coupled with where in the search phase they are executed.
Closing this issue as feedback has been collected. This idea is integrated into #475. @jackiehanyang has developed a proof of concept for this, but it will probably be a "catch-all fallback" to using processors themselves (without flow framework) for simpler logic.
What/Why
What are you proposing?
Initial work on Flow Framework has focused on provisioning. This proposal is intended to highlight at a very high level what needs to happen as we transition toward handling search and ingest capabilities.
Please consider this as a pre-RFC request for community input well before any designs are proposed in a more fully fleshed-out RFC.
What users have asked for this feature?
Early discussions with users have indicated a strong desire not to change existing search queries. These queries are often part of existing production use or CI pipelines that are heavily tested, and even minor changes to them would create churn in documentation, training, and test maintenance.
What problems are you trying to solve?
Broadly speaking, the Flow Framework RFC envisioned both a search orchestration capability and integration with search pipelines.
Those plans outlined in the RFC remain unchanged: we still plan to process content contained in use case templates to send configurations to existing OpenSearch and Plugin APIs. What is changing is how we expect users to search using that functionality.
Initial thoughts and commentary tended more toward orchestration, as a result of @navneet1v's comment here, which would have led to a new orchestration-style search API.
However, conversations with @dylan-tong-aws have indicated that customers have already adopted the functionality of search pipelines (hat tip to @msfroh), which continue to evolve with additional capabilities being added, and those customers are very reluctant to move away from this model.
Accordingly, we propose to still perform much of the same integration planned in the RFC, while minimizing or eliminating API changes relative to current widespread usage.
What is the developer experience going to be?
Currently, Search Pipelines supports three types of processors:
- Search request processors, which transform a SearchRequest into a different SearchRequest
- Search response processors, which transform a SearchResponse into a different SearchResponse
- Search phase results processors, which run between phases of a search request
We plan to add additional Processors of any of the above types when they are appropriate to that portion of the search workflow; these would correspond to individual workflow steps. For example, a data transformation step prior to search could be implemented in a request pre-processor and would configure a search pipeline appropriately.

In addition, we propose to add a fourth Processor type which takes a SearchRequest and returns a SearchResponse. This processor type would serve as a front end to a workflow process executing more complex search workflows using Flow Framework.
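As an illustration only, such a processor might look like the following sketch; the interface and method names here are invented for this proposal and do not exist in OpenSearch today:

```java
import org.opensearch.action.search.SearchRequest;
import org.opensearch.action.search.SearchResponse;

// Hypothetical sketch: a processor type that produces a SearchResponse directly from a
// SearchRequest by delegating to a Flow Framework workflow, rather than (or in addition
// to) querying the index. Names are illustrative, not an existing OpenSearch type.
public interface SearchWorkflowProcessor {
    /**
     * Execute a (potentially DAG-based, multi-step) search workflow for the given
     * request and return its final response.
     */
    SearchResponse processWorkflow(SearchRequest request) throws Exception;
}
```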
Note that use of this new processor type would introduce additional latency in search workflows; however, when used, it would trade a slower response for more flexible search options.
Are there any security considerations?
Some search processors access external APIs which may require access keys and/or other credentials. These credentials need to be handled with the most fine-grained access control possible.
Are there any breaking changes to the API?
The entire purpose of this proposal is to prevent breaking changes to existing search processor API calls.
What is the user experience going to be?
Users who are not taking advantage of these workflow-based processors will see no change.
Users who take advantage of these workflows will be able to configure their pipelines and have them automatically applied to specific types of search requests to give them more power in shaping their results.
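For context, attaching a pipeline today does not require changing the query body itself. A minimal sketch (assuming the search pipeline parameter on SearchRequest, with placeholder index and pipeline names) is shown below; alternatively, setting index.search.default_pipeline on an index applies a pipeline without touching the request at all:

```java
import org.opensearch.action.search.SearchRequest;
import org.opensearch.index.query.QueryBuilders;
import org.opensearch.search.builder.SearchSourceBuilder;

// Sketch: the query body is unchanged; only the pipeline name is attached to the request.
public class PipelineSearchExample {
    public static SearchRequest buildRequest() {
        SearchRequest request = new SearchRequest("my-index"); // placeholder index name
        request.source(new SearchSourceBuilder().query(QueryBuilders.matchQuery("title", "wind")));
        request.pipeline("my_workflow_pipeline"); // placeholder: a workflow-backed pipeline
        return request;
    }
}
```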
Are there breaking changes to the User Experience?
The entire purpose of this proposal is to prevent changes to existing user experience.
Why should it be built? Any reason not to?
To bring the power of the Flow Framework to existing search behavior with minimal changes.
What will it take to execute?
- Implementing workflow steps behind the existing Processor interface.
- Adding and implementing the new Processor type.

Any remaining open questions?