A type in Elasticsearch represents a class of similar documents. A type
consists of a name—such as user
or blogpost
—and a mapping. The
mapping, like a database schema, describes the fields or properties that
documents of that type may have, the datatype of each field—such as string
,
integer
, or date
—and how those fields should be indexed and stored by
Lucene.
Types can be useful abstractions for partitioning similar-but-not-identical data. But due to how Lucene operates they come with some restrictions.
A document in Lucene consists of a simple list of field-value pairs. A field must have at least one value, but any field can contain multiple values. Similarly, a single string value may be converted into multiple values by the analysis process. Lucene doesn’t care if the values are strings or numbers or dates—all values are just treated as opaque bytes.
When we index a document in Lucene, the values for each field are added to the inverted index for the associated field. Optionally, the original values may also be stored unchanged so that they can be retrieved later.
Elasticsearch types are implemented on top of this simple foundation. An index may have several types, and documents of any of these types may be stored in the same index.
Because Lucene has no concept of document types, the type name of each
document is stored with the document in a metadata field called _type
. When
we search for documents of a particular type, Elasticsearch simply uses a
filter on the _type
field to restrict results to documents of that type.
Lucene also has no concept of mappings. Mappings are the layer that Elasticsearch uses to map complex JSON documents into the simple flat documents that Lucene expects to receive.
For instance, the mapping for the name
field in the user
type may declare
that the field is a string
field, and that its value should be analyzed
by the whitespace
analyzer before being indexed into the inverted
index called name
:
"name": {
"type": "string",
"analyzer": "whitespace"
}
This leads to an interesting thought experiment: what happens if you have two different types, each with an identically named field but mapped differently (e.g. one is a string, the other is a number)?
Well, the short answer is that bad things happen and Elasticsearch won’t allow you to define this mapping at all. You’d receive an exception when attempting to configure the mapping.
The longer answer is that each Lucene index contains a single, flat schema
for all fields. A particular field is either mapped as a string, or a number, but
not both. And because types are a mechanism added by Elasticsearch on top
of Lucene (in the form of a metadata _type
field), all types in Elasticsearch
ultimately share the same mapping.
Take for example this mapping of two types in the data
index:
{
"data": {
"mappings": {
"people": {
"properties": {
"name": {
"type": "string",
},
"address": {
"type": "string"
}
}
},
"transactions": {
"properties": {
"timestamp": {
"type": "date",
"format": "strict_date_optional_time"
},
"message": {
"type": "string"
}
}
}
}
}
}
Each type defines two fields ("name"
/"address"
and "timestamp"
/"message"
respectively). It may look like they are independent, but under the covers Lucene
will create a single mapping which would look something like this:
{
"data": {
"mappings": {
"_type": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string"
}
"address": {
"type": "string"
}
"timestamp": {
"type": "long"
}
"message": {
"type": "string"
}
}
}
}
Note: This is not actually valid mapping syntax, just used for demonstration
The mappings are essentially flattened into a single, global schema for the entire index. And that’s why two types cannot define conflicting fields: Lucene wouldn’t know what to do when the mappings are flattened together.
So what’s the takeaway from this discussion? Technically, multiple types may live in the same index as long as their fields do not conflict (either because the fields are mutually exclusive, or because they share identical fields).
Practically though, the important lesson is this: types are useful when you need to discriminate between different segments of a single collection. The overall "shape" of the data is identical (or nearly so) between the different segments.
Types are not as well suited for entirely different types of data. If your two types have mutually exclusive sets of fields, that means half your index is going to contain "empty" values (the fields will be sparse), which will eventually cause performance problems. In these cases, it’s much better to utilize two independent indices.
In summary:
-
Good:
kitchen
andlawn-care
types inside theproducts
index, because the two types are essentially the same schema -
Bad:
products
andlogs
types inside thedata
index, because the two types are mutually exclusive. Separate these into their own indices.