Elasticsearch, the search engine operated by Search API, stores documents in indexes.
This document describes how documents are indexed (added to Elasticsearch indexes).
- Link: Either the base path for a content item, or an external link.
- Document: An Elasticsearch document, something we can search for.
- Document Type: An Elasticsearch document type specifies the fields for a particular type of document. All our document types are defined in config/schema/elasticsearch_types.
- Index: An Elasticsearch search index. Search API maintains several separate indices (detailed, government and govuk), but searches return documents from all of them.
- Index Group: An alias in Elasticsearch that points to one index at a time. This allows us to rebuild indexes without downtime.
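The index group idea can be illustrated with Elasticsearch's `POST /_aliases` endpoint, which applies a list of actions atomically. The following is a minimal sketch, not Search API's own code: the helper name `alias_swap_body` and the dated index names are hypothetical, and using `govuk` as the alias name is an assumption made for illustration.

```ruby
require "json"

# Build the request body for Elasticsearch's `POST /_aliases` endpoint so the
# alias moves from the old index to the new one in a single atomic request.
# `alias_swap_body` is a hypothetical helper, not part of Search API.
def alias_swap_body(alias_name:, old_index:, new_index:)
  {
    actions: [
      { remove: { index: old_index, alias: alias_name } },
      { add: { index: new_index, alias: alias_name } },
    ],
  }
end

# The index names below are made up for illustration.
body = alias_swap_body(
  alias_name: "govuk",
  old_index: "govuk-old",
  new_index: "govuk-new",
)
puts JSON.pretty_generate(body)
```

Because both actions travel in one request, searches against the alias never see a moment where it points at no index, which is what makes a rebuild without downtime possible.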
There are two ways documents get added to a search index:
- HTTP requests to Search API's Documents API (deprecated).
- RabbitMQ messages from the Publishing API, which Search API subscribes to.
Search API search results are weighted by popularity. We rebuild the index nightly to incorporate the latest analytics.
Search API subscribes to a RabbitMQ queue of updates from publishing-api. This still requires Sidekiq to be running.
bundle exec rake message_queue:insert_data_into_govuk
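As a rough mental model of what a queue listener does, the sketch below parses one message and upserts a document keyed by its link. Everything here is assumed for illustration: `FakeIndex`, `handle_message` and the payload fields (`base_path`, `title`) are hypothetical stand-ins, and the real listener hands work to Sidekiq rather than writing to an in-memory hash.

```ruby
require "json"

# In-memory stand-in for a search index: documents keyed by their link.
# Hypothetical; it only models "insert or overwrite by link".
class FakeIndex
  attr_reader :documents

  def initialize
    @documents = {}
  end

  def upsert(link, document)
    @documents[link] = document
  end
end

# Hypothetical handler: parse one queue message and index it.
# The payload fields used here are assumptions, not a documented schema.
def handle_message(raw_payload, index)
  payload = JSON.parse(raw_payload)
  link = payload.fetch("base_path") # raises KeyError if the link is missing
  index.upsert(link, { "link" => link, "title" => payload["title"] })
end

index = FakeIndex.new
handle_message('{"base_path": "/vat-rates", "title": "VAT rates"}', index)
handle_message('{"base_path": "/vat-rates", "title": "VAT rates (updated)"}', index)
puts index.documents.length
```

Keying by link means a second message for the same content item overwrites the earlier document instead of creating a duplicate.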
There is also a separate process that listens to only 'links' updates from the publishing API. This is used for updating old indexes that are populated through the '/documents' API (government, detailed) and can be removed once those indexes no longer exist.
bundle exec rake message_queue:listen_to_publishing_queue
There are some other APIs that are only exposed internally:
- content-api.md for the /content/* endpoint.
- documents.md for the */documents/ endpoint.
These are used by search admin.
See schemas for more detail.
After changing the schema, you'll need to recreate the index. This reindexes documents from the existing index.
SEARCH_INDEX=all bundle exec rake search:migrate_schema
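Conceptually, a schema migration of this kind creates a new index, copies the documents across from the existing one, and then repoints the index group alias. The toy model below sketches that flow in memory under those assumptions. It is not the rake task's implementation, and `Cluster`, `migrate_schema` and the index names are hypothetical.

```ruby
# Toy model of a cluster: index name => documents, alias => index name.
# Hypothetical structure used only to illustrate the flow.
Cluster = Struct.new(:indexes, :aliases)

def migrate_schema(cluster, group, new_index_name)
  old_index_name = cluster.aliases.fetch(group)

  # 1. Create the new index (an empty hash stands in for the new schema).
  cluster.indexes[new_index_name] = {}

  # 2. Copy every document across from the existing index.
  cluster.indexes.fetch(old_index_name).each do |link, document|
    cluster.indexes[new_index_name][link] = document.dup
  end

  # 3. Repoint the alias so searches start using the new index.
  cluster.aliases[group] = new_index_name

  old_index_name
end

cluster = Cluster.new(
  { "govuk-old" => { "/vat-rates" => { "title" => "VAT rates" } } },
  { "govuk" => "govuk-old" },
)
replaced = migrate_schema(cluster, "govuk", "govuk-new")
puts "alias now points at #{cluster.aliases['govuk']}, replacing #{replaced}"
```

The old index is left in place in this sketch. Whether and when it gets cleaned up is outside what the model covers.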