[FEATURE] Enable easy migration from BM25 to Neural Index with Reindex Step #617
Comments
@sean-zheng-amazon could you please review this issue?
BM25 index
PUT bm25_index
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "similarity": {
        "default": {
          "type": "BM25"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "your_field": {
        "type": "text"
      }
    }
  }
}
Neural index with same mapping
PUT localhost:9200/neural_index
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "your_field": {
        "type": "text"
      }
    }
  }
}
Template
PUT _plugins/_flow_framework/workflow
{
  "name": "Reindex test",
  "description": "Reindex test",
  "use_case": "PROVISION",
  "version": {
    "template": "1.0.0",
    "compatibility": [
      "2.12.0",
      "3.0.0"
    ]
  },
  "workflows": {
    "provision": {
      "nodes": [
        {
          "id": "reindex",
          "type": "reindex",
          "user_inputs": {
            "source_index": "bm25_index",
            "destination_index": "neural_index"
          }
        }
      ]
    }
  }
}
Response:
@navneet1v and @vamshin can you take a look at the issue and the draft PR? Thanks
@owaiskazi19 please post the latest status and share the draft PR with the additional parameters for reindexing. Thanks
@minalsha Added
{
  "name": "Reindex test",
  "description": "Reindex test",
  "use_case": "PROVISION",
  "version": {
    "template": "1.0.0",
    "compatibility": [
      "2.12.0",
      "3.0.0"
    ]
  },
  "workflows": {
    "provision": {
      "nodes": [
        {
          "id": "reindex",
          "type": "reindex",
          "user_inputs": {
            "source_index": "bm25_index",
            "destination_index": "neural_index",
            "refresh": true,
            "requests_per_second": 2,
            "require_alias": "false",
            "slices": 1,
            "max_docs": 2
          }
        }
      ]
    }
  }
}
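For readers unfamiliar with the underlying API, here is a rough sketch of how the `user_inputs` above could translate into a Reindex API request. This is an illustration only, not the plugin's implementation: the function name `build_reindex_request` and the choice to pass every optional input as a URL parameter are assumptions made for this example.

```python
from urllib.parse import urlencode

# Optional inputs from the template that are forwarded as URL parameters.
# Forwarding all of them this way is an assumption made for illustration.
OPTIONAL_PARAMS = ("refresh", "requests_per_second", "require_alias", "slices", "max_docs")


def build_reindex_request(user_inputs):
    """Return (method, path, body) for a Reindex API call built from user_inputs."""
    body = {
        "source": {"index": user_inputs["source_index"]},
        "dest": {"index": user_inputs["destination_index"]},
    }
    params = {}
    for key in OPTIONAL_PARAMS:
        if key in user_inputs:
            value = user_inputs[key]
            # Booleans must be lowercase in a query string ("true", not "True").
            if isinstance(value, bool):
                value = str(value).lower()
            params[key] = value
    path = "/_reindex"
    if params:
        path += "?" + urlencode(params)
    return "POST", path, body


if __name__ == "__main__":
    inputs = {
        "source_index": "bm25_index",
        "destination_index": "neural_index",
        "refresh": True,
        "requests_per_second": 2,
        "require_alias": "false",
        "slices": 1,
        "max_docs": 2,
    }
    print(build_reindex_request(inputs))
```

With `requests_per_second` and `max_docs` both set to 2, as in the template above, the call is throttled and copies at most two documents, which is convenient for a smoke test before running against a full index.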
Is your feature request related to a problem?
One of the most common use cases for ML offerings is enabling Neural Search by adding embeddings to an existing index. The steps to set this up are simple and documented here and here, and @owaiskazi19 demonstrated them early in our exploratory development with a scrappy demo.
What solution would you like?
The creation of an ingest pipeline and new index configuration have already been completed and will be part of the 2.13 release:
To complete this solution, we need to add a ReindexStep that calls the Reindex API.
Reindexing does come with some cautions that a user should be aware of, and hiding them behind an automated workflow risks surprising users.
It will only reindex documents which were in the original index at the start of the operation, so if an index is still being written to, the reindex won't capture new documents. Also, it's an expensive operation on large indices, with this note in the linked docs:
Using a model for embeddings adds even more "expense" to this process.
Accordingly, there should be at least some sort of "confirmation prompt" when this workflow step is used. From a backend/template perspective, a path parameter expressing the user's acknowledgement of these cautions (e.g., allow_expensive=true or similar) should be required, with provisioning failing fast with a helpful, verbose error message if a reindex step is present and the parameter is not set to true.
What alternatives have you considered?
The status quo as of 2.13, which enables setting up the ingest pipeline and new index, but requires the user to manually perform the reindexing operation. (These steps could be combined from a front-end perspective but remain separate on the back-end.)
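To make the acknowledgement requirement from the proposed solution concrete, here is a minimal sketch of a fail-fast check that runs before any provisioning work starts. Everything named here is hypothetical: `allow_expensive` is only the example parameter name from the proposal, and `WorkflowValidationError` and `check_reindex_acknowledged` are invented for this illustration.

```python
class WorkflowValidationError(ValueError):
    """Hypothetical error type raised before any provisioning starts."""


def check_reindex_acknowledged(template, params):
    """Raise if the template has a reindex step and the user has not opted in.

    `template` is the parsed workflow template; `params` holds the request
    parameters. The parameter name `allow_expensive` is an assumption.
    """
    nodes = template.get("workflows", {}).get("provision", {}).get("nodes", [])
    reindex_ids = [node.get("id") for node in nodes if node.get("type") == "reindex"]
    if not reindex_ids:
        return  # nothing expensive to acknowledge
    if str(params.get("allow_expensive", "")).lower() == "true":
        return  # user explicitly accepted the cost
    raise WorkflowValidationError(
        "Workflow contains reindex step(s) %s. Reindexing copies only documents "
        "present when the operation starts and can be expensive on large indices. "
        "Set allow_expensive=true to acknowledge this and proceed." % reindex_ids
    )
```

Checking before the first step executes means a missing acknowledgement never leaves a partially provisioned workflow behind.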
Do you have any additional context?
It's also possible to update the same index in-place with a pipeline using Update by query. This should be considered for a future addition.
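As a rough sketch of what the in-place alternative looks like at the API level, Update by query accepts a `pipeline` URL parameter naming the ingest pipeline to run over existing documents. The helper name and the pipeline name `nlp-ingest-pipeline` below are hypothetical, and the sketch assumes the index mapping already contains the field the pipeline writes to.

```python
from urllib.parse import quote, urlencode


def build_update_by_query_request(index, pipeline):
    """Return (method, path) for an Update by query call that applies a pipeline."""
    path = "/" + quote(index, safe="") + "/_update_by_query"
    # The ingest pipeline to apply is passed as a URL parameter.
    path += "?" + urlencode({"pipeline": pipeline})
    return "POST", path


if __name__ == "__main__":
    # "nlp-ingest-pipeline" is a placeholder name, not one defined in this issue.
    print(build_update_by_query_request("bm25_index", "nlp-ingest-pipeline"))
```

Unlike reindexing, this leaves no untouched copy of the original index, so the same cost and acknowledgement concerns would apply.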