feat: Elasticsearch vector database (feast-dev#4188)
HaoXuAI authored May 13, 2024
1 parent 37f36b6 commit bf99640
Showing 15 changed files with 478 additions and 9 deletions.
19 changes: 19 additions & 0 deletions Makefile
@@ -310,6 +310,25 @@ test-python-universal-cassandra-no-cloud-providers:
	 	not test_snowflake" \
		sdk/python/tests

test-python-universal-elasticsearch-online:
	PYTHONPATH='.' \
		FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.online_stores.contrib.elasticsearch_repo_configuration \
		PYTEST_PLUGINS=sdk.python.tests.integration.feature_repos.universal.online_store.elasticsearch \
		python -m pytest -n 8 --integration \
		-k "not test_universal_cli and \
			not test_go_feature_server and \
			not test_feature_logging and \
			not test_reorder_columns and \
			not test_logged_features_validation and \
			not test_lambda_materialization_consistency and \
			not test_offline_write and \
			not test_push_features_to_offline_store and \
			not gcs_registry and \
			not s3_registry and \
			not test_universal_types and \
			not test_snowflake" \
		sdk/python/tests

test-python-universal:
	python -m pytest -n 8 --integration sdk/python/tests

2 changes: 1 addition & 1 deletion docs/reference/alpha-vector-database.md
@@ -10,7 +10,7 @@ Below are supported vector databases and implemented features:
| Vector Database | Retrieval | Indexing |
|-----------------|-----------|----------|
| Pgvector | [x] | [ ] |
-| Elasticsearch | [ ] | [ ] |
+| Elasticsearch | [x] | [x] |
| Milvus | [ ] | [ ] |
| Faiss | [ ] | [ ] |

125 changes: 125 additions & 0 deletions docs/reference/online-stores/elasticsearch.md
@@ -0,0 +1,125 @@
# Elasticsearch online store (contrib)

## Description

The Elasticsearch online store supports materializing tabular feature values, as well as embedding feature vectors, into an Elasticsearch index for serving online features. Embedding feature vectors are stored as dense vectors and can be used for similarity search. For more information on dense vectors, see the [Elasticsearch dense vector documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html).

## Getting started

To use this online store, run `pip install 'feast[elasticsearch]'`. You can then get started by running `feast init -t elasticsearch`.

## Example

{% code title="feature_store.yaml" %}
```yaml
project: my_feature_repo
registry: data/registry.db
provider: local
online_store:
    type: elasticsearch
    host: ES_HOST
    port: ES_PORT
    user: ES_USERNAME
    password: ES_PASSWORD
    vector_len: 512
    write_batch_size: 1000
```
{% endcode %}

The full set of configuration options is available in [ElasticsearchOnlineStoreConfig](https://rtd.feast.dev/en/master/#feast.infra.online_stores.contrib.elasticsearch.ElasticsearchOnlineStoreConfig).

## Functionality Matrix

|                                                           | Elasticsearch |
| :-------------------------------------------------------- | :------- |
| write feature values to the online store | yes |
| read feature values from the online store | yes |
| update infrastructure (e.g. tables) in the online store | yes |
| teardown infrastructure (e.g. tables) in the online store | yes |
| generate a plan of infrastructure changes | no |
| support for on-demand transforms | yes |
| readable by Python SDK | yes |
| readable by Java | no |
| readable by Go | no |
| support for entityless feature views | yes |
| support for concurrent writing to the same key | no |
| support for ttl (time to live) at retrieval | no |
| support for deleting expired data | no |
| collocated by feature view | yes |
| collocated by feature service | no |
| collocated by entity key | no |

To compare this set of functionality against other online stores, please see the full [functionality matrix](overview.md#functionality-matrix).

## Retrieving online document vectors

The Elasticsearch online store supports similarity search over stored embeddings. Given a feature and a query vector, `retrieve_online_documents` returns an `OnlineResponse` containing the `top_k` documents whose vectors are closest to the query. Each stored document vector is a dense vector of floats, and the query vector should have the same length as the configured `vector_len`.

{% code title="python" %}
```python
from feast import FeatureStore

# repo_path is the directory that contains feature_store.yaml
feature_store = FeatureStore(repo_path=".")

# Illustrative values; the query vector length should match `vector_len` in feature_store.yaml
query_vector = [1.0, 2.0, 3.0, 4.0, 5.0]
top_k = 5

# Retrieve the top k closest features to the query vector
feature_values = feature_store.retrieve_online_documents(
    feature="my_feature",
    query=query_vector,
    top_k=top_k,
)
```
{% endcode %}

## Indexing
Currently, the Elasticsearch online store creates its index with the following mapping, where `dims` and `similarity` are taken from the online store configuration:

{% code title="indexing_mapping" %}
```json
"properties": {
    "entity_key": {"type": "binary"},
    "feature_name": {"type": "keyword"},
    "feature_value": {"type": "binary"},
    "timestamp": {"type": "date"},
    "created_ts": {"type": "date"},
    "vector_value": {
        "type": "dense_vector",
        "dims": config.online_store.vector_len,
        "index": "true",
        "similarity": config.online_store.similarity,
    },
}
```
{% endcode %}
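
The mapping above is a template rather than literal JSON, because `dims` and `similarity` are read from the configuration. As a minimal sketch of how it could be rendered into a concrete mapping, the helper below fills in both values and validates them. `build_index_mapping` and `ALLOWED_SIMILARITIES` are hypothetical names used only for illustration and are not part of Feast; the default of `"cosine"` is likewise an assumption.

```python
import json
from typing import Any, Dict

# Similarity names accepted by Elasticsearch's dense_vector field type.
ALLOWED_SIMILARITIES = {"l2_norm", "dot_product", "cosine", "max_inner_product"}


def build_index_mapping(vector_len: int, similarity: str = "cosine") -> Dict[str, Any]:
    """Hypothetical helper: return the mapping template with concrete values."""
    if vector_len <= 0:
        raise ValueError("vector_len must be a positive integer")
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    return {
        "properties": {
            "entity_key": {"type": "binary"},
            "feature_name": {"type": "keyword"},
            "feature_value": {"type": "binary"},
            "timestamp": {"type": "date"},
            "created_ts": {"type": "date"},
            "vector_value": {
                "type": "dense_vector",
                "dims": vector_len,
                "index": True,  # serialized as JSON true
                "similarity": similarity,
            },
        }
    }


# Values mirror the example feature_store.yaml (vector_len: 512).
mapping = build_index_mapping(vector_len=512, similarity="cosine")
print(json.dumps(mapping, indent=2))
```

Validating the values before index creation surfaces a misconfigured `vector_len` or `similarity` as a local error instead of a failed request.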
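
The `online_read` API issues the following query, which matches documents whose `entity_key` is one of the requested entity keys and whose `feature_name` is one of the requested features: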

{% code title="online_read_mapping" %}
```json
"query": {
    "bool": {
        "must": [
            {"terms": {"entity_key": entity_keys}},
            {"terms": {"feature_name": requested_features}},
        ]
    }
},
```
{% endcode %}
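
To make the semantics of this query concrete, here is a small sketch that builds the same `bool`/`must`/`terms` structure and evaluates it against in-memory documents. Every clause under `must` has to hold, and a `terms` clause holds when the field equals any of the listed values. `build_online_read_query` and `matches` are hypothetical helpers, and the plain-string entity keys are placeholders; how Feast serializes entity keys is not shown here.

```python
from typing import Any, Dict, List


def build_online_read_query(
    entity_keys: List[str], requested_features: List[str]
) -> Dict[str, Any]:
    """Hypothetical helper: build the online_read query body shown above."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"terms": {"entity_key": entity_keys}},
                    {"terms": {"feature_name": requested_features}},
                ]
            }
        }
    }


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Evaluate the bool/must/terms query against one in-memory document."""
    clauses = query["query"]["bool"]["must"]
    return all(
        doc.get(field) in allowed
        for clause in clauses
        for field, allowed in clause["terms"].items()
    )


# Placeholder documents; real entity keys would be serialized values.
docs = [
    {"entity_key": "key_1", "feature_name": "conv_rate"},
    {"entity_key": "key_1", "feature_name": "acc_rate"},
    {"entity_key": "key_3", "feature_name": "conv_rate"},
]
query = build_online_read_query(["key_1", "key_2"], ["conv_rate"])
hits = [doc for doc in docs if matches(doc, query)]
print(hits)
```

Only the first document satisfies both clauses: the second has a feature that was not requested, and the third has an entity key that was not requested.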

The similarity search passes the following kNN parameters to Elasticsearch:

{% code title="similarity_search_mapping" %}
```json
{
    "field": "vector_value",
    "query_vector": embedding_vector,
    "k": top_k,
}
```
{% endcode %}
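
Conceptually, a kNN search ranks stored vectors by their similarity to `query_vector` and keeps the best `k`. The sketch below shows that idea as an exact brute-force search using cosine similarity in plain Python. It is only an illustration under those assumptions: Elasticsearch performs an approximate search over an index rather than scanning every vector, and `cosine_similarity` and `brute_force_top_k` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def brute_force_top_k(
    query_vector: List[float], documents: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the k best."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


documents = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = brute_force_top_k([1.0, 0.1], documents, k=2)
print([doc_id for doc_id, _ in results])  # prints ['a', 'c']
```

The length check mirrors the requirement that the query vector has the same number of dimensions as the indexed vectors.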

These mappings and queries are subject to change in future versions of Feast to improve performance and usability.
6 changes: 3 additions & 3 deletions sdk/python/feast/feature_store.py
@@ -1886,7 +1886,7 @@ def retrieve_online_documents(
        feature: str,
        query: Union[str, List[float]],
        top_k: int,
-        distance_metric: str,
+        distance_metric: Optional[str] = None,
    ) -> OnlineResponse:
        """
        Retrieves the top k closest document features. Note, embeddings are a subset of features.
@@ -1911,7 +1911,7 @@ def _retrieve_online_documents(
        feature: str,
        query: Union[str, List[float]],
        top_k: int,
-        distance_metric: str = "L2",
+        distance_metric: Optional[str] = None,
    ):
        if isinstance(query, str):
            raise ValueError(
@@ -2209,7 +2209,7 @@ def _retrieve_from_online_store(
        requested_feature: str,
        query: List[float],
        top_k: int,
-        distance_metric: str,
+        distance_metric: Optional[str],
    ) -> List[Tuple[Timestamp, "FieldStatus.ValueType", Value, Value, Value]]:
        """
        Search and return document features from the online document store.