This repository has been archived by the owner on Mar 27, 2021. It is now read-only.

add docs on BT schema/rowkey (#669)
* add docs on BT schema/rowkey

* color code!
sjoeboo authored Jul 8, 2020
1 parent 037acd4 commit 436e7fd
Showing 2 changed files with 98 additions and 1 deletion.
4 changes: 3 additions & 1 deletion docs/_layouts/sidebar.html
@@ -55,7 +55,9 @@
<li {% if page.title == 'Aggregations' %}class="active"{% endif %}>
<a href="{{ 'docs/aggregations' | relative_url }}">Aggregations</a>
</li>

<li {% if page.title == 'BigTable' %}class="active"{% endif %}>
<a href="{{ 'docs/bigtable' | relative_url }}">BigTable</a>
</li>
<li {% if page.title == 'Heroic Shell' %}class="active"{% endif %}>
<a href="{{ 'docs/shell' | relative_url }}">Shell</a>
</li>
95 changes: 95 additions & 0 deletions docs/content/_docs/bigtable.md
@@ -0,0 +1,95 @@
---
title: BigTable
---
# BigTable

Some useful points about how Heroic uses BigTable.

## Row Key composition/schema

Heroic stores time series in BigTable in a schema optimized for retrieving data for the purpose of dashboarding, for example retrieving and aggregating the CPU load on all machines of a backend service in a particular region. This is achieved by storing tags as part of the row keys in BigTable.

Heroic’s BigTable schema stores all data points that share the same key, tags, and resource identifiers and fall within the same period of roughly 50 days (4294967296 milliseconds, which is 2^32) in a single row. Each data point in that period is stored as a separate cell, with its offset from the start of the period as the column key (or column qualifier) and the measurement as the cell value.

This way, Heroic avoids having a single row for *all* data points in a time series from the beginning of time until the heat death of the universe.
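
As a quick sanity check of the "roughly 50 days" figure, the period length can be converted with Python's standard library. This is only an illustrative sketch: the helper `rows_touched` is a hypothetical name introduced here, not part of Heroic, and it merely counts how many row periods a time range would span for a single series under the scheme described above.

```python
from datetime import timedelta

# Row period from the text above; 4294967296 equals 2**32.
PERIOD_MS = 4294967296

period = timedelta(milliseconds=PERIOD_MS)
print(period)  # 49 days, 17:02:47.296000


def rows_touched(start_ms: int, end_ms: int) -> int:
    """Count the row periods that the inclusive range [start_ms, end_ms] spans.

    Hypothetical helper for illustration only; not a Heroic API.
    """
    return end_ms // PERIOD_MS - start_ms // PERIOD_MS + 1


one_year_ms = 365 * 24 * 60 * 60 * 1000
print(rows_touched(0, one_year_ms))  # 8
```

Under these assumptions, a year of data for one series is spread over a handful of rows rather than one ever-growing row.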


Consider the following two data points for an example time series:

```json
{
  "series": {
    "key": "system",
    "tags": {
      "site": "gew",
      "what": "cpu-idle-percentage",
      "system-component": "cpu",
      "cpu-type": "idle",
      "unit": "%"
    },
    "resource": {
      "podname": "pod-example-123-abc",
      "host": "database.example.com"
    }
  },
  "data": {
    "type": "points",
    "data": [[1300000000000, 42.0]]
  }
}
```

and

```json
{
  "series": {
    "key": "system",
    "tags": {
      "site": "gew",
      "what": "cpu-idle-percentage",
      "system-component": "cpu",
      "cpu-type": "idle",
      "unit": "%"
    },
    "resource": {
      "podname": "pod-example-123-abc",
      "host": "database.example.com"
    }
  },
  "data": {
    "type": "points",
    "data": [[1300001000000, 84.0]]
  }
}
```

To retain the exact timestamp, Heroic splits the original timestamp into a base-timestamp (used in the row key) and a delta, or offset, timestamp (used in the column key, or qualifier). At query time, adding the base-timestamp from the row key to the delta timestamp from the column qualifier yields the exact timestamp again.

To illustrate this, using the two metrics in the example above, Heroic calculates the base-timestamp and the delta timestamp as follows:



Base-timestamp first metric: <span style="background-color:powderblue; color:black">1300000000000</span> - (<span style="background-color:powderblue; color:black">1300000000000</span> % <span style="background-color:lightcyan; color:black">4294967296</span>) = <span style="background-color:blue; color:black">1297080123392</span>
Timestamp delta first metric: <span style="background-color:powderblue; color:black">1300000000000</span> - <span style="background-color:blue; color:black">1297080123392</span> = <span style="background-color:grey; color:black">2919876608</span>

Base-timestamp second metric: <span style="background-color:powderblue; color:black">1300001000000</span> - (<span style="background-color:powderblue; color:black">1300001000000</span> % <span style="background-color:lightcyan; color:black">4294967296</span>) = <span style="background-color:blue; color:black">1297080123392</span>
Timestamp delta second metric: <span style="background-color:powderblue; color:black">1300001000000</span> - <span style="background-color:blue; color:black">1297080123392</span> = <span style="background-color:grey; color:black">2920876608</span>
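
The same arithmetic can be sketched in a few lines of Python. This is a minimal illustration of the calculation above, not Heroic's actual implementation; the function names `split_timestamp` and `join_timestamp` are made up for this example.

```python
# Row period in milliseconds, as used in the calculation above.
PERIOD_MS = 4294967296


def split_timestamp(timestamp_ms: int) -> tuple[int, int]:
    """Split a timestamp into (base-timestamp, delta).

    The base goes into the row key, the delta into the column qualifier.
    Hypothetical helper for illustration only.
    """
    delta = timestamp_ms % PERIOD_MS
    base = timestamp_ms - delta
    return base, delta


def join_timestamp(base_ms: int, delta_ms: int) -> int:
    """Recover the exact timestamp from the two stored parts."""
    return base_ms + delta_ms


print(split_timestamp(1300000000000))  # (1297080123392, 2919876608)
print(split_timestamp(1300001000000))  # (1297080123392, 2920876608)
```

Both timestamps share the same base-timestamp, so they end up in the same row and differ only in their delta.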



Combined with the series key and all tags and resources, sorted lexicographically, this produces the following row key:

<span style="background-color:red; color:black">system</span>,<span style="background-color:yellow; color:black">cpu-type=idle,site=gew,system-component=cpu,unit=%,what=cpu-idle-percentage</span>,<span style="background-color:blue; color:black">1297080123392</span>,<span style="background-color:green; color:black">database.example.com,pod-example-123-abc</span>
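
The composition can be mimicked with a short Python sketch. It reproduces only the human-readable rendering shown above, not the bytes Heroic actually writes to BigTable. The function name `render_row_key` is hypothetical, and ordering the resource values by their resource key (`host` before `podname`) is an assumption that happens to match the example.

```python
def render_row_key(key: str, tags: dict, resource: dict, base_ms: int) -> str:
    """Render a row key in the comma-separated form used in this document.

    Hypothetical helper: illustrates the ordering of the parts only and
    says nothing about the real on-disk encoding.
    """
    # Tags are rendered as name=value pairs, sorted by tag name.
    tag_part = ",".join(f"{name}={value}" for name, value in sorted(tags.items()))
    # Assumption: resource values are emitted in order of their resource key.
    resource_part = ",".join(value for _, value in sorted(resource.items()))
    return ",".join([key, tag_part, str(base_ms), resource_part])


row_key = render_row_key(
    key="system",
    tags={
        "site": "gew",
        "what": "cpu-idle-percentage",
        "system-component": "cpu",
        "cpu-type": "idle",
        "unit": "%",
    },
    resource={"podname": "pod-example-123-abc", "host": "database.example.com"},
    base_ms=1297080123392,
)
print(row_key)
```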


The two metrics, sent 1,000 seconds (roughly 17 minutes) apart, are then stored in the same BigTable row like so:


| Row key | <span style="background-color:grey; color:black">2919876608</span> | <span style="background-color:grey; color:black">2920876608</span> |
|------------------------------------------------------------------------------------------------------------------------------------------- |------------ |------------ |
| <span style="background-color:red; color:black">system</span>,<span style="background-color:yellow; color:black">cpu-type=idle,site=gew,system-component=cpu,unit=%,what=cpu-idle-percentage</span>,<span style="background-color:blue; color:black">1297080123392</span>,<span style="background-color:green; color:black">database.example.com,pod-example-123-abc</span> | <span style="background-color:pink; color:black">42.0</span> | <span style="background-color:pink; color:black">84.0</span> |
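
To make the layout concrete, the following sketch models the table as nested Python dictionaries. It is an in-memory toy under stated assumptions: one row per (series, base-timestamp) pair and one column per delta. The `series_id` string stands in for the key, tags, and resource part of the row key, and none of the names are Heroic or BigTable APIs.

```python
from collections import defaultdict

PERIOD_MS = 4294967296  # row period in milliseconds (2**32)

# rows[(series_id, base_ms)][delta_ms] = value
rows = defaultdict(dict)


def write_point(series_id, timestamp_ms, value):
    """Store a value in the row for its period, under its delta column."""
    delta = timestamp_ms % PERIOD_MS
    base = timestamp_ms - delta
    rows[(series_id, base)][delta] = value


def read_points(series_id, start_ms, end_ms):
    """Return (timestamp, value) pairs in the inclusive range, oldest first."""
    points = []
    first_base = start_ms - start_ms % PERIOD_MS
    # Visit every row period that overlaps the requested range.
    for base in range(first_base, end_ms + 1, PERIOD_MS):
        cells = rows.get((series_id, base), {})
        for delta in sorted(cells):
            timestamp_ms = base + delta  # base + delta restores the timestamp
            if start_ms <= timestamp_ms <= end_ms:
                points.append((timestamp_ms, cells[delta]))
    return points


write_point("system/cpu-idle", 1300000000000, 42.0)
write_point("system/cpu-idle", 1300001000000, 84.0)

print(len(rows))  # 1
print(read_points("system/cpu-idle", 1299999999000, 1300002000000))
# [(1300000000000, 42.0), (1300001000000, 84.0)]
```

Both writes land in one row with two columns, matching the table above. A read over a longer range only needs to visit one row per period that the range overlaps.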


***
