Upgrading from v2.2 to v3.0
Recogito v3.0 introduces breaking changes to the system architecture and data model. This guide documents the changes, and the steps needed to upgrade the index (ElasticSearch 5.6.5) from Recogito v2.2 to v3.0.
The following index types remain identical to Recogito v2.2:
- annotation_history
- contribution
- visit
The new schema introduces three major breaking changes:
- in the annotation type, bodies used to have a uri field that stored the URI of the place as a string. In v3.0, the uri field is replaced with a reference field. This field stores a nested object with a uri field and, optionally, a union_id field containing the union UUID of the entity, if the entity is indexed in Recogito.
- the geotag type has been dropped.
- the place type is superseded by a generic entity type. Compared to v2.2, entity introduces the following changes:
  - for clarity, id is replaced by union_id
  - an additional entity_type field (PLACE, PERSON, etc.)
  - a title field at the top level
  - a stored bbox geo_shape field
  - for conflated records, source_gazetteer is replaced by source_authority (expects a URI identifier)
  - for consistency, last_sync_at is replaced by __last_synced_at
  - an added country_code field for records
  - place_types is replaced by subjects
  - an added priority field (type long) to hold a numeric weight or importance score, e.g. a place's population count
  - close_matches and exact_matches are replaced by a generic links field, which contains a list of objects of the form
    { "uri": "http://www.example.com/entity/1", "link_type": "closeMatch" }
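The two shape changes described above (uri to reference in annotation bodies, and the match lists to links in entity records) can be illustrated with a minimal Python sketch. This is not Recogito's own migration code. The function names are hypothetical, and the "exactMatch" label is an assumption made by analogy with the "closeMatch" value shown in the example. How the union_id is resolved is left to the caller.

```python
from typing import Dict, List, Optional


def to_reference(uri: str, union_id: Optional[str] = None) -> Dict[str, str]:
    """Build a v3.0-style nested reference object from a v2.2 uri string."""
    reference = {"uri": uri}
    if union_id is not None:
        # union_id is optional: it is only set if the entity is indexed in Recogito
        reference["union_id"] = union_id
    return reference


def to_links(close_matches: List[str], exact_matches: List[str]) -> List[Dict[str, str]]:
    """Merge the two v2.2 match lists into a single v3.0-style links list."""
    links = [{"uri": uri, "link_type": "closeMatch"} for uri in close_matches]
    # ASSUMPTION: "exactMatch" is a guessed label, mirroring "closeMatch"
    links += [{"uri": uri, "link_type": "exactMatch"} for uri in exact_matches]
    return links
```

Calling to_links(["http://www.example.com/entity/1"], []) yields a one-element list holding the object shown in the example above.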
Because of the removal of the geotag type, migrating annotations between index versions requires more than a simple conversion. For every annotation body, we need to query the index on the bodies.uri field in order to obtain the referenced entity's union_id. Recogito 3.0 includes a utility to perform this migration. The other index types need to be migrated manually:
- annotation_history, contribution and visit must be reindexed using the standard ElasticSearch reindex API
- the entity index must be rebuilt from scratch, by importing gazetteers via the Recogito admin UI
- the contents of the annotation index must be migrated last, using the migration utility. (TODO...)
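For the three unchanged types, the reindex step could be scripted roughly as follows. This is a sketch using only the Python standard library and the ElasticSearch _reindex endpoint. The function names, the index names and the server URL are placeholders, not values taken from Recogito. It also assumes the destination index has already been created with the v3.0 mappings.

```python
import json
from urllib import request

# The index types that remain identical between v2.2 and v3.0
UNCHANGED_TYPES = ["annotation_history", "contribution", "visit"]


def reindex_body(source_index: str, dest_index: str, doc_type: str) -> dict:
    """Build a _reindex request body that copies one type between two indices."""
    return {
        "source": {"index": source_index, "type": doc_type},
        "dest": {"index": dest_index},
    }


def reindex(es_url: str, source_index: str, dest_index: str, doc_type: str) -> dict:
    """POST the request body to the _reindex endpoint and return the parsed response."""
    payload = json.dumps(reindex_body(source_index, dest_index, doc_type)).encode("utf-8")
    req = request.Request(
        es_url.rstrip("/") + "/_reindex",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def reindex_unchanged_types(es_url: str, source_index: str, dest_index: str) -> list:
    """Reindex each unchanged type in turn and collect the responses."""
    return [reindex(es_url, source_index, dest_index, t) for t in UNCHANGED_TYPES]
```

A call such as reindex_unchanged_types("http://localhost:9200", "old_index", "new_index") would issue one request per type. Both index names here are hypothetical and need to be replaced with the names used in the actual installation.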