Skip to content
This repository has been archived by the owner on Oct 14, 2024. It is now read-only.

docs: Add crash report architecture and implementation #39

Merged
merged 10 commits into from
Jun 20, 2024
270 changes: 270 additions & 0 deletions docs/telemetry.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,270 @@
---
title: Telemetry
description: Telemetry Architecture
slug: "telemetry"
---


import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';


## Introduction
We collect anonymous usage data to enhance our product development.

:::info
We do not collect any sensitive or personal information.
:::
## Architecture
![cortex-architecture.png](/img/docs/telemetry1.png)
The telemetry system comprises several key components, each designed to ensure the Telemetry features work correctly and function accurately, guaranteeing the precision of the collected data.
1. **Telemetry Collector**: This component gathers telemetry data from the Cortex and exports it to other observability tools for further analysis and monitoring.
2. **Jan Telemetry Server**: This component ingests telemetry data from Cortex. It processes this data and forwards events to Google Logging for further handling.
3. **Google Logging**: This component receives events from the Jan Telemetry Server. It stores and manages these logs and exports them to BigQuery.
4. **Google BigQuery**: This component stores the exported logs from Google Logging.
5. **Metabase**: This component connects to Google BigQuery to perform queries and visualize the data. It provides an interface for analyzing and presenting the telemetry data.

## Collected Telemetry Metrics
The collected telemetry metrics for Cortex are divided into two main categories: `CrashReportResource` and `CrashReportPayload`.
<Tabs>
<TabItem value="1" label="`CrashReportResource`" default>
This category includes metrics related to the operating system and hardware details. It captures the following metrics:

| Metric | Description |
|-----------------|--------------------------------|
| `osName` | Name of the operating system. |
| `osVersion` | Version of the operating system.|
| `architecture` | Architecture of the operating system. |
| `appVersion` | Version of Cortex. |
| `service.name` | Name of the Cortex service. |
| `source` | Source of the report (`cli`, `cortex-server`, or `cortex-cpp`). |
| `cpu` | Model of the CPU. |
| `gpus` | Model of the GPU, vendor, VRAM, and VRAM dynamic. |
</TabItem>
<TabItem value="2" label="`CrashReportPayload`" default>
This category focuses on metrics related to specific operations within Cortex. It captures the following metrics:

| Metric | Description |
|-------------|--------------------------------|
| `modelId` | ID of the currently used model. |
| `endpoint` | Endpoint of the Cortex Server. |
| `command` | Command executed by Cortex CLI. |
</TabItem>
</Tabs>

### Crash Report Schema
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
"StringValue": {
"type": "object",
"properties": {
"stringValue": {
"type": "string"
}
},
"required": ["stringValue"]
},
"ObjectValue": {
"type": "object",
"properties": {
"kvlist_value": {
"type": "object",
"properties": {
"values": {
"type": "array",
"items": {
"type": "object",
"properties": {
"key": {
"type": "string"
},
"value": {
"$ref": "#/definitions/StringValue"
}
},
"required": ["key", "value"]
}
}
},
"required": ["values"]
}
},
"required": ["kvlist_value"]
},
"Attribute": {
"type": "object",
"properties": {
"key": {
"type": "string"
},
"value": {
"oneOf": [
{ "$ref": "#/definitions/StringValue" },
{ "$ref": "#/definitions/ObjectValue" }
]
}
},
"required": ["key", "value"]
},
"TelemetryLog": {
"type": "object",
"properties": {
"traceId": {
"type": "string"
},
"startTimeUnixNano": {
"type": "string"
},
"endTimeUnixNano": {
"type": "string"
},
"severityText": {
"type": "string"
},
"body": {
"type": "object",
"properties": {
"stringValue": {
"type": "string"
}
},
"required": ["stringValue"]
},
"attributes": {
"type": "array",
"items": { "$ref": "#/definitions/Attribute" }
}
},
"required": ["traceId", "severityText", "body", "attributes"]
},
"ScopeLog": {
"type": "object",
"properties": {
"scope": {
"type": "object"
},
"logRecords": {
"type": "array",
"items": { "$ref": "#/definitions/TelemetryLog" }
}
},
"required": ["scope", "logRecords"]
},
"Resource": {
"type": "object",
"properties": {
"attributes": {
"type": "array",
"items": { "$ref": "#/definitions/Attribute" }
}
},
"required": ["attributes"]
},
"TelemetryEvent": {
"type": "object",
"properties": {
"resource": { "$ref": "#/definitions/Resource" },
"scopeLogs": {
"type": "array",
"items": { "$ref": "#/definitions/ScopeLog" }
}
},
"required": ["resource", "scopeLogs"]
}
},
"type": "object",
"properties": {
"resourceLogs": {
"type": "array",
"items": { "$ref": "#/definitions/TelemetryEvent" }
}
},
"required": ["resourceLogs"]
}
```
## Implementation
![catch-crash.png](/img/docs/telemetry2.png)
The diagram illustrates the implementation of handling crashes in the Cortex server environment:

1. **User Interaction**
- The user sends a command or requests to the Cortex Server Main Process.
2. **Cortex Server Main Process**
- The main process handles the initial command or request.
- It forks a child process to monitor and handle errors.
- To manage unexpected errors, the main process catches any `uncaughtException` or `unhandledRejection` events.
3. **Child Process**
- A child process is forked to perform periodic checks (interval pings) to ensure the main process is not hanging or unresponsive.
- This child process can also capture and log errors.
4. **Local Directory**
- If a crash is detected, the error details and crash reports are inserted into a local directory for further analysis and record-keeping.


## Configuration
You can also opt out of the Telemetry feature or export it using any OpenTelemetry Collector for your application.
### Turn Off Telemetry
You can turn off our Telemetry feature by running the following command:
<Tabs>
<TabItem value="linux" label="Linux" default>
```bash
## 1. First Command
echo "export CORTEX_CRASH_REPORT=0" >>~/.profile
## 2. Second Command
source ~/.profile
```
</TabItem>
<TabItem value="mac" label="macOS" default>
```bash
## 1. First Command
echo "export CORTEX_CRASH_REPORT=0" >>~/.profile
## 2. Second Command
source ~/.profile
```
</TabItem>
<TabItem value="windows" label="Windows" default>
```bash
## Command Prompt
set CORTEX_CRASH_REPORT 0
## PowerShell
$env:CORTEX_CRASH_REPORT 0
## To set the env permanently
setx CORTEX_CRASH_REPORT 0
```
</TabItem>
</Tabs>
### Export to Otel Collector
You can also export our telemetry to your own Otel Collector setup by running the following command:
<Tabs>
<TabItem value="linux" label="Linux" default>
```bash
## 1. First Command
echo "export OTEL_EXPORTER_OTLP_ENDPOINT=<your_OTEL_endpoint>" >>~/.profile
## 2. Second Command
source ~/.profile
```
</TabItem>
<TabItem value="mac" label="macOS" default>
```bash
## 1. First Command
echo "export OTEL_EXPORTER_OTLP_ENDPOINT=<your_OTEL_endpoint>" >>~/.profile
## 2. Second Command
source ~/.profile
```
</TabItem>
<TabItem value="windows" label="Windows" default>
```bash
## Command Prompt
set OTEL_EXPORTER_OTLP_ENDPOINT <your_OTEL_endpoint>
## PowerShell
$env:OTEL_EXPORTER_OTLP_ENDPOINT <your_OTEL_endpoint>
## To set the env permanently
setx OTEL_EXPORTER_OTLP_ENDPOINT <your_OTEL_endpoint>
```
</TabItem>
</Tabs>

:::info
Learn more about Telemetry:
- [Telemetry CLI command](/docs/cli/telemetry).
:::
1 change: 1 addition & 0 deletions sidebars.ts
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ const sidebars: SidebarsConfig = {
{ type: "doc", id: "architecture", label: "Cortex" },
{ type: "doc", id: "cortex-cpp", label: "Cortex.cpp" },
{ type: "doc", id: "cortex-llamacpp", label: "Cortex.llamacpp" },
{ type: "doc", id: "telemetry", label: "Telemetry" },
],
cli: [
{ type: "doc", id: "cli/cortex", label: "cortex" },
Expand Down
Binary file added static/img/docs/catch-crash.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added static/img/docs/telemetry-architecture.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added static/img/docs/telemetry1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added static/img/docs/telemetry2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading