ERMrest 101

Aref Shafaei edited this page May 20, 2021 · 6 revisions

This document summarizes some parts of ERMrest and how ERMrestJS communicates with it. The explanations are kept short and simple. To learn more about ERMrest, refer to the ERMrest repository and its documentation.

If you want to know more about ERMrest you could also read this paper.

This document is written by ERMrestJS/Chaise developers, from the point of view of those applications. It does not cover the complete set of ERMrest features, and the terminology may not exactly match what ERMrest itself uses, although we tried to keep the two aligned. The goal is to provide a simple explanation of ERMrest.

Introduction

ERMrest (rhymes with "earn rest") is a general relational data storage service for web-based, data-oriented collaboration. It allows general entity-relationship modeling of data resources manipulated by RESTful access methods.

Terminology

For the complete list of ERMrest terminology, refer to this document. The following list explains some of the terms.

  • Catalog: a particular dataset. A catalog can have multiple schemas.
  • Schema: the entire data model of a dataset. A schema can have multiple tables. In ERMrest, schemas can be defined with a JSON object. For examples, refer to the schema document.
  • Model Annotation: machine-readable documentation that tells the client (in our case ERMrestJS/Chaise) to interpret the model in a particular way. Annotations don't affect the behavior of ERMrest; they merely inform clients about intended use beyond what the entity-relationship model captures. For example, if you don't want to change a column name in the database but want users to see a different name for that column, you can use the appropriate annotation to achieve this. For more information, refer to the annotation document in ERMrest.
  • Entity: a set of data tuples corresponding to a (possibly filtered) table.
  • Attribute: a set of data tuples corresponding to a (possibly filtered) projection of a table.
  • Attribute group: a set of data tuples corresponding to a (possibly filtered) projection of a table grouped by group keys.

Schema Document

You can define all the model elements of your schema (tables, columns, foreign keys) in a single JSON object. We won't go into the details of every attribute of this object here. The bare minimum of a schema document looks like the following.

{
    "comment": "SCHEMA_COMMENT (optional)",
    "schema_name": "SCHEMA_NAME",
    "tables": {
        "TABLE_NAME": {
            "kind": "it can be `view` or `table`",
            "schema_name": "SCHEMA_NAME",
            "comment": "TABLE_COMMENT (optional)",
            "keys": [
                {
                    "comment": "KEY_COMMENT (optional)",
                    "unique_columns": ["PK_COL_NAME"],
                    "annotations": {}
                }
            ],
            "column_definitions": [
                {
                    "comment": "COLUMN_COMMENT",
                    "name": "PK_COL_NAME",
                    "nullok": false,
                    "type": {
                        "typename": "serial4"
                    },
                    "annotations": {}
                }
            ],
            "foreign_keys": [],
            "annotations": {}
        }
    },
    "annotations": {}
}
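
The same bare-minimum document can also be generated programmatically. The following is a minimal sketch in Python, not part of ERMrest or ERMrestJS; the function name `minimal_schema` and the example names `isa`, `dataset`, and `id` are hypothetical. It only mirrors the structure of the template above, with the optional comments omitted.

```python
import json


def minimal_schema(schema_name, table_name, key_column):
    """Build the bare-minimum schema document as a Python dict.

    Mirrors the JSON template above: one table, one serial4 key column,
    no foreign keys, and empty annotations.
    """
    return {
        "schema_name": schema_name,
        "tables": {
            table_name: {
                "kind": "table",
                "schema_name": schema_name,
                "keys": [
                    {"unique_columns": [key_column], "annotations": {}}
                ],
                "column_definitions": [
                    {
                        "name": key_column,
                        "nullok": False,
                        "type": {"typename": "serial4"},
                        "annotations": {},
                    }
                ],
                "foreign_keys": [],
                "annotations": {},
            }
        },
        "annotations": {},
    }


# Hypothetical schema, table, and column names.
doc = minimal_schema("isa", "dataset", "id")
schema_json = json.dumps(doc, indent=4)
```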

You can find a more in-depth explanation of the schema document in the Table Creation document in ERMrest. You can also look at different deployments for more examples (for example dev.facebase), or at the schemas that we use for testing purposes.

ERMrest API Usage

ERMrest has extensive, in-depth documentation covering its many features, and you can always refer to it for more information. Here we list some of the more common usages of the model and data APIs in ERMrest.

Since ERMrest resources belong to a specific catalog, their names always start with the catalog ID. Both model and data retrieval resources follow this rule and are always located under SERVICE/catalog/CATALOG_ID/.

Model Retrieval

You can use the ERMrest REST API to inspect your model definitions. To access them, follow the hierarchical pattern of the model:

  • SERVICE/catalog/CATALOG_ID/schema: definition of all the schemas in catalog.
  • SERVICE/catalog/CATALOG_ID/schema/SCHEMA_NAME: definition of the schema.
  • SERVICE/catalog/CATALOG_ID/schema/SCHEMA_NAME/table: definition of all the tables in the schema.
  • SERVICE/catalog/CATALOG_ID/schema/SCHEMA_NAME/table/TABLE_NAME: definition of the table.
  • For more, refer to the ERMrest documentation.

Examples:
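
As a hedged illustration of the hierarchy above, the following Python sketch assembles the four model URLs. The service address `https://example.org/ermrest` and the names `isa` and `dataset` are hypothetical, and `model_url` is a helper written for this page, not an ERMrest or ERMrestJS function.

```python
from urllib.parse import quote

# Hypothetical deployment; replace with your own SERVICE address.
SERVICE = "https://example.org/ermrest"


def model_url(catalog_id, schema=None, table=None, all_tables=False):
    """Build a model retrieval URL following the hierarchy listed above."""
    url = f"{SERVICE}/catalog/{quote(str(catalog_id), safe='')}/schema"
    if schema is None:
        return url  # all schemas in the catalog
    url += "/" + quote(schema, safe="")
    if table is not None:
        return url + "/table/" + quote(table, safe="")  # one table
    if all_tables:
        return url + "/table"  # all tables in the schema
    return url  # one schema


all_schemas = model_url(1)
one_schema = model_url(1, "isa")
tables = model_url(1, "isa", all_tables=True)
one_table = model_url(1, "isa", "dataset")
```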

Data Retrieval

Data retrieval resources are a bit more complicated. They are located under SERVICE/catalog/CATALOG_ID/ERMREST_API, where ERMREST_API is one of entity, aggregate, attribute, or attributegroup. The exact resource location depends on the API used.
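
A minimal sketch of this naming rule in Python follows. The helper `data_url`, the service address, and the path `isa:dataset` are assumptions made for illustration; the path is taken as an already-encoded string.

```python
# The four data APIs named in this section.
DATA_APIS = ("entity", "attribute", "attributegroup", "aggregate")


def data_url(service, catalog_id, api, path):
    """Return SERVICE/catalog/CATALOG_ID/ERMREST_API/PATH for a known API."""
    if api not in DATA_APIS:
        raise ValueError(f"unknown ERMrest data API: {api!r}")
    return f"{service}/catalog/{catalog_id}/{api}/{path}"


# Hypothetical service address and table reference.
url = data_url("https://example.org/ermrest", 1, "entity", "isa:dataset")
```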

Entity

SERVICE/catalog/CATALOG_ID/entity/PATH

For more information about the path, refer to the Data Path section. The denoted entity set has the same tuple structure as the final table instance in the path and may be a subset of its entities, based on the joining and filtering criteria encoded in the path. The resulting tuples are distinct according to the key definitions of that table instance, i.e. joins in the path may filter out rows but never cause duplicate rows.

Examples:
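
The distinctness rule can be illustrated with a small in-memory model. This is only a sketch of the semantics, not how ERMrest is implemented, and the `file` and `dataset` tables are hypothetical. It mimics a path that starts at `file` and links to `dataset`: two files pointing at the same dataset still yield that dataset once, and datasets with no files are filtered out.

```python
def entities_via_join(files, datasets):
    """Return the dataset rows reachable from `files`, each at most once."""
    datasets_by_id = {d["id"]: d for d in datasets}
    result = {}
    for f in files:
        d = datasets_by_id.get(f["dataset"])
        if d is not None:
            # Keyed by the dataset key, so repeated matches do not duplicate.
            result.setdefault(d["id"], d)
    return list(result.values())


datasets = [
    {"id": 1, "title": "first"},
    {"id": 2, "title": "second"},
    {"id": 3, "title": "no files"},
]
files = [
    {"id": 10, "dataset": 1},
    {"id": 11, "dataset": 1},  # second file of the same dataset
    {"id": 12, "dataset": 2},
]

rows = entities_via_join(files, datasets)
```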

ERMrest resources: Entity Retrieval

Attribute

SERVICE/catalog/CATALOG_ID/attribute/PATH/COLUMN_REFERENCES

For more information about the path, refer to the Data Path section. The path is interpreted identically to the entity resource space. However, rather than denoting a set of whole entities, the attribute resource space denotes specific fields projected from that set of entities.

Examples:
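
A hedged sketch of how an attribute URL could be assembled in Python. The helper `attribute_url`, the service address, and the table and column names are hypothetical; the path is taken as an already-encoded string and column names are percent-encoded.

```python
from urllib.parse import quote


def attribute_url(service, catalog_id, path, columns):
    """Append a comma-separated column reference list to an attribute path."""
    if not columns:
        raise ValueError("at least one column reference is required")
    refs = ",".join(quote(c, safe="") for c in columns)
    return f"{service}/catalog/{catalog_id}/attribute/{path}/{refs}"


# Project only `id` and `title` from the hypothetical isa:dataset table.
url = attribute_url("https://example.org/ermrest", 1, "isa:dataset", ["id", "title"])
```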

ERMrest resources: Attribute Names, Attribute Retrieval

Attribute Group

SERVICE/catalog/CATALOG_ID/attributegroup/PATH/GROUP_KEYS

SERVICE/catalog/CATALOG_ID/attributegroup/PATH/GROUP_KEYS;AGGREGATE_COLS

For more information about the path, refer to the Data Path section. The path is interpreted slightly differently than in the attribute resource space. Rather than denoting a set of entities drawn from the final table instance in the path, it denotes a set of entity combinations, so the number of records can grow combinatorially depending on how the path's entity elements are linked. This set of entity combinations is reduced to groups, where each group represents the entities sharing the same group key tuple, and the optional aggregate list elements are evaluated over each group to produce group-level aggregate values.

The GROUP_KEYS list elements use the same notation as the column reference elements in the attribute resource space. The AGGREGATE_COLS list elements use the same notation as the aggregate elements in the aggregate resource space or the column reference elements in the attribute resource space. An aggregate using column reference notation denotes an example value chosen from an arbitrary member of each group.

Examples:
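
The grouping semantics can be modeled in a few lines of Python. This is an in-memory sketch under stated assumptions, not ERMrest code: the rows stand for the entity combinations denoted by a path, `dataset` and `type` are hypothetical group keys, and the only aggregate computed is a per-group row count.

```python
from collections import defaultdict


def attribute_group(rows, group_keys, count_alias="cnt"):
    """Group rows by their group key tuple and count the rows per group."""
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        counts[key] += 1
    # One output tuple per distinct group key, plus the aggregate value.
    return [
        dict(zip(group_keys, key), **{count_alias: n})
        for key, n in counts.items()
    ]


rows = [
    {"dataset": 1, "type": "img"},
    {"dataset": 1, "type": "img"},
    {"dataset": 2, "type": "txt"},
]
groups = attribute_group(rows, ["dataset", "type"])
```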

ERMrest resources: Attribute Group Names, Attribute Group Retrieval

Aggregate

For more information about the path, refer to the Data Path section. The path is interpreted slightly differently than in the attribute resource space. Rather than denoting a set of entities drawn from the final table instance in the path, it denotes a set of entity combinations, so the number of intermediate records can grow combinatorially depending on how the path's entity elements are linked. This set of entity combinations is reduced to a single aggregate tuple.

Examples:
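
The following Python sketch builds an aggregate URL. It assumes the aggregate resource follows the same shape as the other data APIs, with a comma-separated list of `alias:=aggregate` elements after the path, and it assumes `cnt(*)` as a row-count aggregate; both should be checked against the ERMrest documentation. The helper name, service address, and table name are hypothetical.

```python
def aggregate_url(service, catalog_id, path, aggregates):
    """Build an aggregate URL from a mapping of output alias to aggregate.

    `aggregates` maps an output alias to an aggregate expression,
    e.g. {"n": "cnt(*)"}; expressions are used as given.
    """
    if not aggregates:
        raise ValueError("at least one aggregate is required")
    elements = ",".join(f"{alias}:={expr}" for alias, expr in aggregates.items())
    return f"{service}/catalog/{catalog_id}/aggregate/{path}/{elements}"


# Count the rows of the hypothetical isa:dataset table.
url = aggregate_url("https://example.org/ermrest", 1, "isa:dataset", {"n": "cnt(*)"})
```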

ERMrest resources: Aggregate Names, Aggregate Retrieval

Data Path

ERMrest introduces a general path-based syntax for naming data resources, with idioms for navigating and filtering entity sets. The path element of a data resource name always denotes a set of entities or joined entities. The path must be interpreted from left to right to understand its meaning. The denoted entity set is known upon reaching the right-most element of the path and may be modified by the resource space or API under which the path occurs.

Path Root

A path always starts with a direct table reference:

  • TABLE_NAME
  • SCHEMA_NAME:TABLE_NAME

A path consisting of only one table reference denotes the entities within that table. Examples:
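
A small Python sketch of the two table reference forms. `table_ref` is a hypothetical helper and `isa` and `dataset` are made-up names; each name is percent-encoded so that reserved characters do not collide with the path syntax.

```python
from urllib.parse import quote


def table_ref(table, schema=None):
    """Return TABLE_NAME or SCHEMA_NAME:TABLE_NAME with encoded names."""
    name = quote(table, safe="")
    if schema is None:
        return name
    return f"{quote(schema, safe='')}:{name}"


unqualified = table_ref("dataset")
qualified = table_ref("dataset", "isa")
```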

Path Filter

To retrieve a filtered subset of the entities denoted by the parent path, you can use the filter language of ERMrest. Example:
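
A hedged sketch of composing filters in Python. It assumes the ERMrest filter forms `column=value` and `column::operator::value`, with `&` for conjunction and `;` for disjunction; consult the ERMrest filter language documentation for the full grammar. The helper names and the columns `id`, `size`, and `title` are hypothetical.

```python
from urllib.parse import quote


def eq_filter(column, value):
    """Equality predicate: column=value, with both sides percent-encoded."""
    return f"{quote(column, safe='')}={quote(str(value), safe='')}"


def op_filter(column, op, value):
    """Binary operator predicate, e.g. op="geq" gives column::geq::value."""
    return f"{quote(column, safe='')}::{op}::{quote(str(value), safe='')}"


def conjunction(*filters):
    """All filters must hold."""
    return "&".join(filters)


def disjunction(*filters):
    """At least one filter must hold."""
    return ";".join(filters)


# Filter the hypothetical isa:dataset table; the filter follows the path root.
path = "isa:dataset/" + conjunction(eq_filter("id", 5), op_filter("size", "geq", 100))
```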

Entity Links

As mentioned above, the path is written and read from left to right, and it always starts with a table. To get from that table to another table through joins, you can use the entity linking syntax of ERMrest.
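
The following Python sketch shows three link forms as we understand them from the ERMrest documentation: linking by table name, linking by the foreign key columns of the current table, and an explicit column mapping. Treat the exact syntax as an assumption and verify it against the entity linking documentation. The helper names and the `isa:dataset` and `isa:file` tables are hypothetical, and names are assumed to be already encoded.

```python
def link_by_table(path, table):
    """Join to `table` when a single foreign key connects the two tables."""
    return f"{path}/{table}"


def link_by_columns(path, columns):
    """Join through the foreign key made of these columns."""
    return f"{path}/({','.join(columns)})"


def link_explicit(path, left_columns, right_table, right_columns):
    """Join by spelling out both sides: (left,...)=(table:right,...)."""
    left = ",".join(left_columns)
    right = ",".join(right_columns)
    return f"{path}/({left})=({right_table}:{right})"


by_table = link_by_table("isa:dataset", "isa:file")
by_columns = link_by_columns("isa:file", ["dataset"])
explicit = link_explicit("isa:dataset", ["id"], "isa:file", ["dataset"])
```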

Table Instance Alias

The root element or any entity link element may be decorated with an alias prefix:

ALIAS:=TABLE_NAME
ALIAS:=(COL_NAME,...)

This denotes the same entity set as the plain element, but also binds the alias so that path elements to the right of the binding can reference that particular table instance. All aliases bound in a single path must be distinct. An alias is a convenient shorthand that avoids repeating long table names, and it also enables expressing more complex concepts that are not otherwise possible. ERMrestJS uses aliases heavily.

To reset a path to one of the defined aliases, you can use $ALIAS. This affects neither the overall joining structure nor the filtering of the parent path, but it changes the denoted entity set to that of the aliased table instance. It also changes the column resolution logic so that unqualified column names are resolved within the aliased table instance rather than within the right-most entity link element of the parent path.

Examples:
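
A minimal Python sketch of alias binding and reset. `build_aliased_path` is a hypothetical helper, and the aliases `D` and `F` and the tables `isa:dataset` and `isa:file` are made up. The helper enforces the rule that all aliases in one path are distinct and that a reset refers to a bound alias.

```python
def build_aliased_path(elements, reset_to=None):
    """Join (alias, element) pairs into a path, optionally ending in $ALIAS."""
    bound = set()
    parts = []
    for alias, element in elements:
        if alias in bound:
            raise ValueError(f"alias {alias!r} is bound more than once")
        bound.add(alias)
        parts.append(f"{alias}:={element}")
    if reset_to is not None:
        if reset_to not in bound:
            raise ValueError(f"alias {reset_to!r} is not bound in this path")
        parts.append(f"${reset_to}")
    return "/".join(parts)


# Start at dataset, join to file, then reset the context back to dataset.
path = build_aliased_path(
    [("D", "isa:dataset"), ("F", "(id)=(isa:file:dataset)")],
    reset_to="D",
)
```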

Modifiers

Optional sorting and paging modifiers can change the ordering of the results and which results are returned. For more, refer to the ERMrest Sort Modifier and Paging Modifiers documentation.
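
As a hedged illustration, the Python sketch below appends modifiers to a data URL. It assumes the forms `@sort(...)` with an optional `::desc::` suffix per key, `@after(...)` for paging, and a `?limit=N` query parameter, and it assumes that a paging modifier must be accompanied by a sort modifier; check these against the ERMrest documentation. The helper name, URL, and column names are hypothetical, and keys and values are assumed to be already encoded.

```python
def with_modifiers(url, sort=None, after=None, limit=None):
    """Append optional sort, paging, and limit modifiers to a data URL."""
    if after and not sort:
        raise ValueError("paging with `after` requires a sort modifier")
    if sort:
        url += "@sort(" + ",".join(sort) + ")"
    if after:
        # One value per sort key: the position after which the page starts.
        url += "@after(" + ",".join(after) + ")"
    if limit is not None:
        url += f"?limit={limit}"
    return url


base = "https://example.org/ermrest/catalog/1/entity/isa:dataset"
first_page = with_modifiers(base, sort=["title", "id::desc::"], limit=25)
next_page = with_modifiers(
    base, sort=["title", "id::desc::"], after=["abc", "5"], limit=25
)
```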