Indexing

Elasticsearch, the search engine operated by Search API, stores documents in indexes.

This document describes how documents are indexed (added to Elasticsearch indexes).

Nomenclature

Link: Either the base path for a content item, or an external link.
Document: An Elasticsearch document, something we can search for.
Document Type: An Elasticsearch document type specifies the fields for a particular type of document. All our document types are defined in config/schema/elasticsearch_types.
Index: An Elasticsearch search index. Search API maintains several separate indexes (detailed, government and govuk), but searches return documents from all of them.
Index Group: An alias in Elasticsearch that points to one index at a time. This allows us to rebuild indexes without downtime.

How documents get added to the search indexes

There are two ways documents get added to a search index:

HTTP requests to Search API's Documents API (deprecated).
RabbitMQ messages from the Publishing API, which Search API subscribes to.

Search API search results are weighted by popularity. We rebuild the index nightly to incorporate the latest analytics.

Publishing API integration

Search API subscribes to a RabbitMQ queue of updates from publishing-api. This still requires Sidekiq to be running.

bundle exec rake message_queue:insert_data_into_govuk

There is also a separate process that listens only to 'links' updates from the Publishing API. It updates the older indexes that are populated through the '/documents' API (government and detailed), and can be removed once those indexes no longer exist.

bundle exec rake message_queue:listen_to_publishing_queue
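
To illustrate how the two listeners differ, here is a minimal in-memory sketch. It does not use RabbitMQ or any Search API code. The `Message` struct, the `"<document_type>.<update_type>"` routing key shape, and the helper names are all assumptions made for the example, as is the idea that the first listener accepts every update.

```ruby
# Hypothetical message shape: the routing key is assumed to look like
# "<document_type>.<update_type>", e.g. "guide.links".
Message = Struct.new(:routing_key, :payload)

# Returns the update type, taken as the last segment of the routing key.
def update_type(message)
  message.routing_key.split(".").last
end

# Assumed behaviour of the listener that feeds the govuk index:
# it accepts every update it receives.
def for_govuk_listener?(_message)
  true
end

# The separate listener for the older indexes only cares about 'links' updates.
def for_links_listener?(message)
  update_type(message) == "links"
end

messages = [
  Message.new("guide.major", { "base_path" => "/a" }),
  Message.new("guide.links", { "base_path" => "/a" }),
  Message.new("news_story.minor", { "base_path" => "/b" }),
]

govuk_updates = messages.select { |m| for_govuk_listener?(m) }
links_updates = messages.select { |m| for_links_listener?(m) }

puts "govuk listener handles #{govuk_updates.size} messages"
puts "links listener handles #{links_updates.size} messages"
```

In a real deployment this filtering would more likely be expressed as queue bindings on the broker than as code in the consumer.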

Internal only APIs

There are some other APIs that are only exposed internally:

content-api.md for the /content/* endpoint.
documents.md for the */documents/ endpoint.

These are used by search admin.

Schemas

See schemas for more detail.

Changing the schema/Reindexing

After changing the schema, you'll need to recreate the index. This reindexes documents from the existing index.

SEARCH_INDEX=all bundle exec rake search:migrate_schema
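
The general idea behind rebuilding through an index group can be sketched as follows. This is an in-memory illustration, not what the rake task actually does: `FakeCluster`, `migrate`, and the index names `govuk-old` and `govuk-new` are hypothetical, and no real Elasticsearch client calls are involved.

```ruby
# In-memory stand-in for an Elasticsearch cluster (hypothetical, for illustration).
class FakeCluster
  def initialize
    @indexes = {} # index name => array of documents
    @aliases = {} # alias name => index name
  end

  def create_index(name)
    @indexes[name] = []
  end

  def add(index_name, document)
    @indexes.fetch(index_name) << document
  end

  # An alias points at exactly one index at a time.
  def point_alias(alias_name, index_name)
    raise ArgumentError, "no such index: #{index_name}" unless @indexes.key?(index_name)

    @aliases[alias_name] = index_name
  end

  def resolve(alias_name)
    @aliases.fetch(alias_name)
  end

  # Searches go through the alias, so callers never name a concrete index.
  def search(alias_name)
    @indexes.fetch(resolve(alias_name))
  end
end

# Build a new index from the existing one, then switch the alias over.
# Readers keep hitting the old index until the final step.
def migrate(cluster, alias_name, new_index_name)
  old_index_name = cluster.resolve(alias_name)
  cluster.create_index(new_index_name)
  cluster.search(alias_name).each { |doc| cluster.add(new_index_name, doc.dup) }
  cluster.point_alias(alias_name, new_index_name)
  old_index_name
end

cluster = FakeCluster.new
cluster.create_index("govuk-old")
cluster.add("govuk-old", { "link" => "/vat-rates", "title" => "VAT rates" })
cluster.point_alias("govuk", "govuk-old")

previous = migrate(cluster, "govuk", "govuk-new")
puts "alias govuk now points at #{cluster.resolve('govuk')} (was #{previous})"
```

Because searches only ever name the alias, the switch is a single step from the reader's point of view, which is what makes a rebuild possible without downtime.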
