What Is a Document?

Most entities or objects in most applications can be serialized into a JSON object, with keys and values. A key is the name of a field or property, and a value can be a string, a number, a Boolean, another object, an array of values, or some other specialized type such as a string representing a date or an object representing a geolocation:

{
    "name":         "John Smith",
    "age":          42,
    "confirmed":    true,
    "join_date":    "2014-06-01",
    "home": {
        "lat":      51.5,
        "lon":      0.1
    },
    "accounts": [
        {
            "type": "facebook",
            "id":   "johnsmith"
        },
        {
            "type": "twitter",
            "id":   "johnsmith"
        }
    ]
}

Often, we use the terms object and document interchangeably. However, there is a distinction. An object is just a JSON object—similar to what is known as a hash, hashmap, dictionary, or associative array. Objects may contain other objects. In Elasticsearch, the term document has a specific meaning. It refers to the top-level, or root object that is serialized into JSON and stored in Elasticsearch under a unique ID.

Warning

Field names can be any valid string, but may not include periods.

Document Metadata

A document doesn’t consist only of its data. It also has metadata—information about the document. The three required metadata elements are as follows:

_index: Where the document lives
_type: The class of object that the document represents
_id: The unique identifier for the document

_index

An index is a collection of documents that should be grouped together for a common reason. For example, you may store all your products in a products index, while all your sales transactions go in sales. Although it is possible to store unrelated data together in a single index, it is often considered an anti-pattern.

Tip

Actually, in Elasticsearch, our data is stored and indexed in shards, while an index is just a logical namespace that groups together one or more shards. However, this is an internal detail; our application shouldn’t care about shards at all. As far as our application is concerned, our documents live in an index. Elasticsearch takes care of the details.

We cover how to create and manage indices ourselves in [index-management], but for now we will let Elasticsearch create the index for us. All we have to do is choose an index name. This name must be lowercase, cannot begin with an underscore, and cannot contain commas. Let’s use website as our index name.

_type

Data may be grouped loosely together in an index, but often there are sub-partitions inside that data which may be useful to explicitly define. For example, all your products may go inside a single index. But you have different categories of products, such as "electronics", "kitchen" and "lawn-care".

The documents all share an identical (or very similar) schema: they have a title, description, product code, price. They just happen to belong to sub-categories under the umbrella of "Products".

Elasticsearch exposes a feature called types which allows you to logically partition data inside of an index. Documents in different types may have different fields, but it is best if they are highly similar. We’ll talk more about the restrictions and applications of types in [mapping].

A _type name can be lowercase or uppercase, but shouldn’t begin with an underscore or period. It also may not contain commas, and is limited to a length of 256 characters. We will use blog for our type name.

_id

The ID is a string that, when combined with the _index and _type, uniquely identifies a document in Elasticsearch. When creating a new document, you can either provide your own _id or let Elasticsearch generate one for you.

Other Metadata

There are several other metadata elements, which are presented in [mapping]. With the elements listed previously, we are already able to store a document in Elasticsearch and to retrieve it by ID—in other words, to use Elasticsearch as a document store.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

05_Document.asciidoc

05_Document.asciidoc

What Is a Document?

Document Metadata

_index

_type

_id

Other Metadata

Files

05_Document.asciidoc

Latest commit

History

05_Document.asciidoc

File metadata and controls

What Is a Document?

Document Metadata

_index

_type

_id

Other Metadata