forked from cortexproject/cortex
Commit
Created dedicated section for chunks storage in the doc (cortexproject#3407)

* Created dedicated section to chunks storage in the doc
  Signed-off-by: Marco Pracucci <[email protected]>
* Fixed white noise
  Signed-off-by: Marco Pracucci <[email protected]>
* Addressed review comments
  Signed-off-by: Marco Pracucci <[email protected]>
* Fixed links
  Signed-off-by: Marco Pracucci <[email protected]>
* Fixed TestPurger_Restarts flakyness
  Signed-off-by: Marco Pracucci <[email protected]>
Showing 42 changed files with 329 additions and 220 deletions.
@@ -0,0 +1,36 @@
---
title: "Chunks Storage"
linkTitle: "Chunks Storage"
weight: 4
menu:
---

The chunks storage is a Cortex storage engine that stores each time series in a separate object called a _chunk_. Each chunk contains the samples for a given period (12 hours by default). Chunks are then indexed by time range and labels, which provides fast lookups across a very large number of chunks (potentially millions). For this reason, the Cortex chunks storage requires two backend stores: a key-value store for the index and an object store for the chunks.

The supported backends for the **index store** are:

* [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
* [Google Bigtable](https://cloud.google.com/bigtable)
* [Apache Cassandra](https://cassandra.apache.org)

The supported backends for the **chunks store** are:

* [Amazon DynamoDB](https://aws.amazon.com/dynamodb)
* [Google Bigtable](https://cloud.google.com/bigtable)
* [Apache Cassandra](https://cassandra.apache.org)
* [Amazon S3](https://aws.amazon.com/s3)
* [Google Cloud Storage](https://cloud.google.com/storage/)
* [Microsoft Azure Storage](https://azure.microsoft.com/en-us/services/storage/)

## Storage versioning

The chunks storage is based on a custom data format. The **chunks and index formats are versioned**: this allows Cortex operators to upgrade the cluster to take advantage of new features and improvements. This strategy enables changes to the storage format without requiring any downtime or complex procedures to rewrite the stored data. A set of schemas is used to map the version to use while reading and writing time series belonging to a specific period of time.

The current schema recommendation is the **v9 schema** for most use cases and the **v10 schema** if you expect to have very high cardinality metrics (v11 is still experimental). For more information about the schema, see the [schema configuration](schema-config.md).

## Guides

The following step-by-step guides can help you set up Cortex with the chunks storage:

- [Running Cortex chunks storage in Production](../guides/running-chunks-storage-in-production.md)
- [Running Cortex chunks storage with Cassandra](../guides/running-chunks-storage-with-cassandra.md)

@@ -0,0 +1,149 @@
---
title: "Schema Configuration"
linkTitle: "Schema Configuration"
weight: 2
slug: schema-configuration
---

The Cortex chunks storage stores indexes and chunks in table-based data stores. When such a storage type is used, multiple tables are created over time: each table, also called a periodic table, contains the data for a specific time range. The table-based storage layout is configured through a configuration file called the **schema config**.

_The schema config is used only by the chunks storage; it is **not** used by the [blocks storage](../blocks-storage/_index.md) engine._

## Design

The table-based design brings two main benefits:

1. **Schema config changes**<br />
   Each table is bound to a schema config and version, so that changes can be introduced over time and multiple schema configs can coexist.
2. **Retention**<br />
   Retention is implemented by deleting an entire table, which makes delete operations fast.

The [**table-manager**](./table-manager.md) is the Cortex service responsible for creating a periodic table before its time period begins, and deleting it once its data time range exceeds the retention period.

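The retention benefit can be illustrated with a small sketch. It assumes (this is not stated above) that periodic tables are numbered by dividing the Unix timestamp by the table period, and that a table becomes deletable once its whole time range is older than the retention period. The function name `expired_tables` is hypothetical and is not part of Cortex.

```python
from datetime import datetime, timedelta, timezone


def expired_tables(table_numbers, period, retention, now):
    """Return the table numbers whose entire time range is older than `retention`.

    Assumption: table N covers [N * period, (N + 1) * period) since the Unix epoch.
    """
    period_secs = int(period.total_seconds())
    cutoff = now.timestamp() - retention.total_seconds()
    # A table can be dropped only when its end time is not after the cutoff.
    return [n for n in table_numbers if (n + 1) * period_secs <= cutoff]


now = datetime(2020, 3, 1, tzinfo=timezone.utc)
week = timedelta(weeks=1)
current = int(now.timestamp()) // int(week.total_seconds())
tables = list(range(current - 5, current + 1))  # the six most recent weekly tables
print(expired_tables(tables, week, timedelta(weeks=3), now))
```

Because whole tables are dropped, the effective retention granularity in this sketch is one table period: data can outlive the configured retention by up to one period before its table is deleted.
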
## Periodic tables

A periodic table stores the index or chunks relative to a specific period of time. The duration of the time range of the data stored in a single table and its storage type are configured in the `configs` block of the [schema config](#schema-config) file.

The `configs` block can contain multiple entries. Each entry defines the storage used between the day set in `from` (in the format `yyyy-mm-dd`) and the start of the next entry, or "now" in the case of the last entry.

This allows multiple non-overlapping schema configs to coexist over time, in order to perform schema version upgrades or change storage settings (including changing the storage type).

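As a minimal sketch of how the `from` dates partition time, the following function picks the entry that applies to a given day. The entries are plain dictionaries standing in for parsed `configs` items, and `active_config` is a hypothetical helper, not a Cortex API.

```python
from datetime import date


def active_config(configs, day):
    """Return the entry whose `from` date is the latest one not after `day`."""
    candidates = [c for c in configs if date.fromisoformat(c["from"]) <= day]
    if not candidates:
        return None  # `day` predates the first configured period
    return max(candidates, key=lambda c: date.fromisoformat(c["from"]))


configs = [
    {"from": "2018-08-23", "schema": "v9", "store": "bigtable"},
    {"from": "2019-03-05", "schema": "v10", "store": "bigtable-hashed"},
]
print(active_config(configs, date(2019, 1, 1))["schema"])
print(active_config(configs, date(2020, 1, 1))["schema"])
```

The last entry has no upper bound, which matches the "now" case described above.
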
![Schema config - periodic table](/images/chunks-storage/schema-config-periodic-tables.png)
<!-- Diagram source at https://docs.google.com/presentation/d/1bHp8_zcoWCYoNU2AhO2lSagQyuIrghkCncViSqn14cU/edit -->

The write path hits the table that the sample timestamp falls into (usually the latest table, except for short periods close to the end of one table and the beginning of the next), while the read path hits the tables containing data for the query time range.

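The difference between the two paths can be sketched as follows. The sketch assumes that a table is identified by its prefix followed by the Unix timestamp divided by the table period; that naming rule is an assumption made for illustration, and `table_number` and `tables_for_range` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def table_number(ts, period):
    """Number of the periodic table that timestamp `ts` falls into (assumed rule)."""
    return int(ts.timestamp()) // int(period.total_seconds())


def tables_for_range(prefix, start, end, period):
    """Names of all periodic tables overlapping the range [start, end]."""
    first = table_number(start, period)
    last = table_number(end, period)
    return [f"{prefix}{n}" for n in range(first, last + 1)]


week = timedelta(weeks=1)
sample_ts = datetime(2020, 3, 1, tzinfo=timezone.utc)

# Write path: a single sample lands in exactly one table.
print(tables_for_range("dev_index_", sample_ts, sample_ts, week))

# Read path: a 10-day query can span more than one weekly table.
print(tables_for_range("dev_index_", sample_ts, sample_ts + timedelta(days=10), week))
```
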
## Schema versioning

Cortex supports multiple schema versions (up to v11), but we recommend running with the **v9 schema** for most use cases and the **v10 schema** if you expect to have very high cardinality metrics. You can move from one schema to another if a new schema fits your purpose better, but you still need to keep the old schemas configured so that Cortex can read the data written with them.

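A schema upgrade can be pictured as appending a new period rather than editing an existing one, so that the older entries keep describing the older data. The helper below is a hypothetical sketch over plain dictionaries, not a Cortex function.

```python
import copy
from datetime import date


def add_schema_period(configs, from_day, schema):
    """Return a new list with an extra period that switches to `schema`.

    Existing entries are left untouched so old data stays readable.
    """
    last = configs[-1]
    if date.fromisoformat(from_day) <= date.fromisoformat(last["from"]):
        raise ValueError("new period must start after the last configured period")
    new_entry = copy.deepcopy(last)  # keep the same stores and table settings
    new_entry["from"] = from_day
    new_entry["schema"] = schema
    return configs + [new_entry]


old = [{"from": "2018-08-23", "schema": "v9", "store": "bigtable"}]
new = add_schema_period(old, "2019-03-05", "v10")
print([entry["schema"] for entry in new])
```
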
## Schema config

The path to the schema config YAML file can be specified to Cortex via the CLI flag `-schema-config-file`. The file has the following structure:

```yaml
configs: []<period_config>
```

### `<period_config>`

The `period_config` configures a single period during which the storage uses a specific schema version and backend storage.

```yaml
# The starting date in YYYY-MM-DD format (e.g. 2020-03-01).
from: <string>
# The key-value store to use for the index. Supported values are:
# aws-dynamo, bigtable, bigtable-hashed, cassandra, boltdb.
store: <string>
# The object store to use for the chunks. Supported values are:
# s3, aws-dynamo, bigtable, bigtable-hashed, gcs, cassandra, filesystem.
# If none is specified, "store" is used for storing chunks as well.
[object_store: <string>]
# The schema version to use. Supported versions are: v1, v2, v3, v4, v5,
# v6, v9, v10, v11. We recommend v9 for most use cases, or alternatively
# v10 if you expect to have very high cardinality metrics.
schema: <string>
index: <periodic_table_config>
chunks: <periodic_table_config>
```

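The `object_store` fallback and the list of supported schema versions can be expressed as a short check. This is only an illustrative sketch over a dictionary that mirrors the fields above; `effective_stores` is a hypothetical name.

```python
SUPPORTED_SCHEMAS = {"v1", "v2", "v3", "v4", "v5", "v6", "v9", "v10", "v11"}


def effective_stores(period_config):
    """Return (index_store, chunk_store) for one period_config entry."""
    schema = period_config["schema"]
    if schema not in SUPPORTED_SCHEMAS:
        raise ValueError(f"unsupported schema version: {schema}")
    index_store = period_config["store"]
    # Fall back to the index store when no object store is configured.
    chunk_store = period_config.get("object_store") or index_store
    return index_store, chunk_store


print(effective_stores({"schema": "v9", "store": "cassandra"}))
print(effective_stores({"schema": "v10", "store": "aws-dynamo", "object_store": "s3"}))
```
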
### `<periodic_table_config>`

The `periodic_table_config` configures the tables for a single period.

```yaml
# The prefix to use for the table names.
prefix: <string>
# The duration for each table. A new table is created every "period", which also
# represents the granularity with which retention is enforced. Typically this value
# is set to 1w (1 week). Must be a multiple of 24h.
period: <duration>
# The tags to be set on the created table.
tags: <map[string]string>
```

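The "multiple of 24h" constraint on `period` can be checked with a few lines of code. The parser below is a simplified sketch that accepts only a single integer followed by `h`, `d`, or `w`; the duration syntax actually accepted by Cortex may be broader, and `parse_period` is a hypothetical name.

```python
import re
from datetime import timedelta

_UNITS = {"h": timedelta(hours=1), "d": timedelta(days=1), "w": timedelta(weeks=1)}


def parse_period(text):
    """Parse a simple duration such as "1w" or "48h" and validate it."""
    match = re.fullmatch(r"(\d+)([hdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    period = int(match.group(1)) * _UNITS[match.group(2)]
    # The table period must be a whole number of days.
    if period % timedelta(hours=24) != timedelta(0):
        raise ValueError(f"period {text!r} is not a multiple of 24h")
    return period


print(parse_period("1w"))
print(parse_period("48h"))
```
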
## Schema config example

The following example shows an advanced schema file covering different changes over the course of a long period. It starts with v9 and only Bigtable, is later migrated to GCS as the object store, and finally moves to v10.

_This is a complex schema file showing several changes over time; a typical schema config file usually has just one or two schema versions._

```yaml
configs:
  # Starting from 2018-08-23 Cortex should store chunks and indexes
  # on Google BigTable using weekly periodic tables. The chunks table
  # names will be prefixed with "dev_chunks_", while index tables will be
  # prefixed with "dev_index_".
  - from: "2018-08-23"
    schema: v9
    chunks:
      period: 1w
      prefix: dev_chunks_
    index:
      period: 1w
      prefix: dev_index_
    store: gcp-columnkey

  # Starting 2019-02-13 we moved from BigTable to GCS for storing the chunks.
  - from: "2019-02-13"
    schema: v9
    chunks:
      period: 1w
      prefix: dev_chunks_
    index:
      period: 1w
      prefix: dev_index_
    object_store: gcs
    store: gcp-columnkey

  # Starting 2019-02-24 we moved our index from bigtable-columnkey to bigtable-hashed
  # which improves the distribution of keys.
  - from: "2019-02-24"
    schema: v9
    chunks:
      period: 1w
      prefix: dev_chunks_
    index:
      period: 1w
      prefix: dev_index_
    object_store: gcs
    store: bigtable-hashed

  # Starting 2019-03-05 we moved from v9 schema to v10 schema.
  - from: "2019-03-05"
    schema: v10
    chunks:
      period: 1w
      prefix: dev_chunks_
    index:
      period: 1w
      prefix: dev_index_
    object_store: gcs
    store: bigtable-hashed
```
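
To make the sequence of changes in such a file easier to follow, the entries can be compared pairwise. The sketch below uses a hand-written, flattened copy of the example (only `from`, `schema`, `store`, and `object_store`) and a hypothetical helper named `describe_changes`; it does not parse the YAML file itself.

```python
def describe_changes(configs):
    """List, for each period after the first, the fields that changed."""
    changes = []
    for prev, cur in zip(configs, configs[1:]):
        keys = sorted(set(prev) | set(cur))
        diff = {
            key: (prev.get(key), cur.get(key))
            for key in keys
            if key != "from" and prev.get(key) != cur.get(key)
        }
        changes.append((cur["from"], diff))
    return changes


example = [
    {"from": "2018-08-23", "schema": "v9", "store": "gcp-columnkey"},
    {"from": "2019-02-13", "schema": "v9", "store": "gcp-columnkey", "object_store": "gcs"},
    {"from": "2019-02-24", "schema": "v9", "store": "bigtable-hashed", "object_store": "gcs"},
    {"from": "2019-03-05", "schema": "v10", "store": "bigtable-hashed", "object_store": "gcs"},
]
for day, diff in describe_changes(example):
    print(day, diff)
```

Each period in the example changes exactly one setting, which keeps every step of the migration easy to reason about.
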