change json format
marvin-j97 committed Mar 11, 2024
1 parent 9cad823 commit d84ba14
Showing 25 changed files with 578 additions and 512 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -25,13 +25,13 @@ Each row can have a different set of columns (schema-less). The table is sparse,

In Bigtable, stored values are byte blobs; Smoltable supports multiple data types out of the box:

- String (UTF-8 encoded string)
- Boolean (like Byte, but is unmarshalled as boolean)
- Byte (unsigned integer, 1 byte)
- I32 (signed integer, 4 bytes)
- I64 (signed integer, 8 bytes)
- F32 (floating point, 4 bytes)
- F64 (floating point, 8 bytes)
- string (UTF-8 encoded string)
- boolean (like Byte, but is unmarshalled as boolean)
- byte (unsigned integer, 1 byte)
- i32 (signed integer, 4 bytes)
- i64 (signed integer, 8 bytes)
- f32 (floating point, 4 bytes)
- f64 (floating point, 8 bytes)

Column families can be grouped into locality groups, which partition groups of column families into separate LSM-trees, increasing scan performance over those column families (e.g. OLAP-style queries over a specific column).

42 changes: 17 additions & 25 deletions docs/src/content/docs/guides/locality-groups.md
@@ -7,15 +7,15 @@ If we need to read columns of a specific column family for many rows (using a co

Consider the [`webtable` example](/smoltable/guides/wide-column-intro/#real-life-example-webtable):

If we wanted to get the language of all com.* pages, we would need to scan following column families:
If we wanted to get the language of all com.\* pages, we would need to scan following column families:

- `anchor`, which can be a very wide column family
- `language`
- `contents`, which is always huge because it stores raw HTML

`language` is just 2 bytes (alpha2 country code, e.g. **DE**, **EN**, ...), but every row may require multiple kilobytes of data to be retrieved to get just the language. This heavily decreases read throughput of OLAP-style scans of large ranges.

To combat this, we can define a *locality group*, which can house multiple column families. Each locality group is stored in its own LSM-tree (a single partition inside the storage engine), but row mutations across column families stay atomic.
To combat this, we can define a _locality group_, which can house multiple column families. Each locality group is stored in its own LSM-tree (a single partition inside the storage engine), but row mutations across column families stay atomic.

![Webtable locality groups](/smoltable/webtable-locality.png)

@@ -125,15 +125,13 @@ curl --request POST \
"cells": [
{
"column_key": "title:",
"value": {
"String": "Apache Spark™ - Unified Engine for large-scale data analytics"
}
"type": "string",
"value": "Apache Spark™ - Unified Engine for large-scale data analytics"
},
{
"column_key": "language:",
"value": {
"String": "EN"
}
"type": "string",
"value": "EN"
}
]
},
@@ -142,15 +140,13 @@ curl --request POST \
"cells": [
{
"column_key": "title:",
"value": {
"String": "Welcome to Apache Solr - Apache Solr"
}
"type": "string",
"value": "Welcome to Apache Solr - Apache Solr"
},
{
"column_key": "language:",
"value": {
"String": "EN"
}
"type": "string",
"value": "EN"
}
]
}
@@ -195,10 +191,9 @@ Smoltable returns (again, body truncated for brevity):
"title": {
"": [
{
"timestamp": 1706197595375136143,
"value": {
"String": "Apache Cassandra | Apache Cassandra Documentation"
}
"time": 1706197595375136143,
"type": "string",
"value": "Apache Cassandra | Apache Cassandra Documentation"
}
]
}
@@ -284,9 +279,7 @@ By listing our table, we can see the column families have been created, and `tit
"disk_space_in_bytes": 0,
"locality_groups": [
{
"column_families": [
"title"
],
"column_families": ["title"],
"id": "ur_pSQZ2QAYR6XsF9Xz0o"
}
],
@@ -354,10 +347,9 @@ which returns (truncated):
"title": {
"": [
{
"timestamp": 1706198298766257607,
"value": {
"String": "Apache Cassandra | Apache Cassandra Documentation"
}
"time": 1706198298766257607,
"type": "string",
"value": "Apache Cassandra | Apache Cassandra Documentation"
}
]
}
14 changes: 7 additions & 7 deletions docs/src/content/docs/guides/wide-column-intro.md
@@ -15,13 +15,13 @@ Each row’s cells are sorted by the column key (family + qualifier), and a time
which maps to some value, the `cell value`. The cell value, unlike in Bigtable, can be a certain type:

- String (UTF-8 encoded string)
- Boolean (like Byte, but is unmarshalled as boolean)
- Byte (unsigned integer, 1 byte)
- I32 (signed integer, 4 bytes)
- I64 (signed integer, 8 bytes)
- F32 (floating point, 4 bytes)
- F64 (floating point, 8 bytes)
- string (UTF-8 encoded string)
- boolean (like Byte, but is unmarshalled as boolean)
- byte (unsigned integer, 1 byte)
- i32 (signed integer, 4 bytes)
- i64 (signed integer, 8 bytes)
- f32 (floating point, 4 bytes)
- f64 (floating point, 8 bytes)

The timestamp allows storing multiple versions of the same cell.

67 changes: 32 additions & 35 deletions docs/src/content/docs/reference/json-api/ingest-data.md
@@ -11,47 +11,44 @@ POST http://smoltable:9876/v1/table/[name]/write

```json
{
"items": [
{
"row_key": "org.apache.spark",
"cells": [
{
"column_key": "title:",
"value": {
"String": "Apache Spark™ - Unified Engine for large-scale data analytics"
}
},
{
"column_key": "anchor:org.apache.hbase",
"value": {
"String": "Visit Apache Spark"
}
},
"items": [
{
"row_key": "org.apache.spark",
"cells": [
{
"column_key": "meta:size",
"value": {
"I64": 152014
}
},
]
}
]
"column_key": "title:",
"type": "string",
"value": "Apache Spark™ - Unified Engine for large-scale data analytics"
},
{
"column_key": "anchor:org.apache.hbase",
"type": "string",
"value": "Visit Apache Spark"
},
{
"column_key": "meta:size",
"type": "i64",
"value": 152014
}
]
}
]
}
```

### Example response

```json
{
"message": "Data ingestion successful",
"result": {
"items": {
"cell_count": 3,
"row_count": 1
},
"micros_per_item": 5
},
"status": 200,
"time_ms": 0
"message": "Data ingestion successful",
"result": {
"items": {
"cell_count": 3,
"row_count": 1
},
"micros_per_item": 5
},
"status": 200,
"time_ms": 0
}
```
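
The format change in this commit can be summarized as: the old write body wrapped each cell value in a single-key object named after its type (`"value": {"I64": 152014}`), while the new body carries a lowercase `"type"` field next to a plain `"value"`. The following is a minimal sketch of converting an old-style write body to the new shape. It is not part of Smoltable: `migrate_cell` and `migrate_write_body` are hypothetical helper names, and the wrapper keys other than `String` and `I64` are an assumption inferred from the type list in the README diff, since only those two appear as JSON keys in the changed examples.

```python
import json

# Old wrapper key -> new lowercase "type" tag.
# Only "String" and "I64" are visible as wrapper keys in the diff;
# the remaining entries are ASSUMED from the README type list.
OLD_TO_NEW_TYPE = {
    "String": "string",
    "Boolean": "boolean",
    "Byte": "byte",
    "I32": "i32",
    "I64": "i64",
    "F32": "f32",
    "F64": "f64",
}


def migrate_cell(cell: dict) -> dict:
    """Convert one old-format cell to the flat type/value format (hypothetical helper)."""
    value = cell.get("value")
    # Cells that already carry a "type" field, or whose value is not a
    # wrapper object, are returned as an unchanged copy.
    if "type" in cell or not isinstance(value, dict):
        return dict(cell)
    if len(value) != 1:
        raise ValueError(f"expected exactly one type wrapper key, got {list(value)}")
    (old_tag, inner), = value.items()
    if old_tag not in OLD_TO_NEW_TYPE:
        raise ValueError(f"unknown type wrapper: {old_tag!r}")
    # Keep every other field (e.g. column_key), then append type and value.
    migrated = {k: v for k, v in cell.items() if k != "value"}
    migrated["type"] = OLD_TO_NEW_TYPE[old_tag]
    migrated["value"] = inner
    return migrated


def migrate_write_body(body: dict) -> dict:
    """Convert every cell of every item in a write request body (hypothetical helper)."""
    items = [
        {**item, "cells": [migrate_cell(cell) for cell in item["cells"]]}
        for item in body["items"]
    ]
    return {**body, "items": items}


if __name__ == "__main__":
    old_cell = {"column_key": "meta:size", "value": {"I64": 152014}}
    print(json.dumps(migrate_cell(old_cell)))
    # prints {"column_key": "meta:size", "type": "i64", "value": 152014}
```

The helpers return new dictionaries instead of mutating their input, so an old request body can be kept around for comparison while clients are being updated.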
```