This repository has been archived by the owner on Oct 14, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #39 from janhq/feat/add-crash-report-docs
docs: Add crash report architecture and implementation
- Loading branch information
Showing
6 changed files
with
271 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,270 @@ | ||
--- | ||
title: Telemetry | ||
description: Telemetry Architecture | ||
slug: "telemetry" | ||
--- | ||
|
||
|
||
import Tabs from '@theme/Tabs'; | ||
import TabItem from '@theme/TabItem'; | ||
|
||
|
||
## Introduction | ||
We collect anonymous usage data to enhance our product development. | ||
|
||
:::info | ||
We do not collect any sensitive or personal information. | ||
::: | ||
## Architecture | ||
![cortex-architecture.png](/img/docs/telemetry1.png) | ||
The telemetry system comprises several key components, each designed to ensure the Telemetry features work correctly and function accurately, guaranteeing the precision of the collected data. | ||
1. **Telemetry Collector**: This component gathers telemetry data from the Cortex and exports it to other observability tools for further analysis and monitoring. | ||
2. **Jan Telemetry Server**: This component ingests telemetry data from Cortex. It processes this data and forwards events to Google Logging for further handling. | ||
3. **Google Logging**: This component receives events from the Jan Telemetry Server. It stores and manages these logs and exports them to BigQuery. | ||
4. **Google BigQuery**: This component stores the exported logs from Google Logging. | ||
5. **Metabase**: This component connects to Google BigQuery to perform queries and visualize the data. It provides an interface for analyzing and presenting the telemetry data. | ||
|
||
## Collected Telemetry Metrics | ||
The collected telemetry metrics for Cortex are divided into two main categories: `CrashReportResource` and `CrashReportPayload`. | ||
<Tabs> | ||
<TabItem value="1" label="`CrashReportResource`" default> | ||
This category includes metrics related to the operating system and hardware details. It captures the following metrics: | ||
|
||
| Metric | Description | | ||
|-----------------|--------------------------------| | ||
| `osName` | Name of the operating system. | | ||
| `osVersion` | Version of the operating system.| | ||
| `architecture` | Architecture of the operating system. | | ||
| `appVersion` | Version of Cortex. | | ||
| `service.name` | Name of the Cortex service. | | ||
| `source` | Source of the report (`cli`, `cortex-server`, or `cortex-cpp`). | | ||
| `cpu` | Model of the CPU. | | ||
| `gpus` | Model of the GPU, vendor, VRAM, and VRAM dynamic. | | ||
</TabItem> | ||
<TabItem value="2" label="`CrashReportPayload`" default> | ||
This category focuses on metrics related to specific operations within Cortex. It captures the following metrics: | ||
|
||
| Metric | Description | | ||
|-------------|--------------------------------| | ||
| `modelId` | ID of the currently used model. | | ||
| `endpoint` | Endpoint of the Cortex Server. | | ||
| `command` | Command executed by Cortex CLI. | | ||
</TabItem> | ||
</Tabs> | ||
|
||
### Crash Report Schema | ||
```json | ||
{ | ||
"$schema": "http://json-schema.org/draft-07/schema#", | ||
"definitions": { | ||
"StringValue": { | ||
"type": "object", | ||
"properties": { | ||
"stringValue": { | ||
"type": "string" | ||
} | ||
}, | ||
"required": ["stringValue"] | ||
}, | ||
"ObjectValue": { | ||
"type": "object", | ||
"properties": { | ||
"kvlist_value": { | ||
"type": "object", | ||
"properties": { | ||
"values": { | ||
"type": "array", | ||
"items": { | ||
"type": "object", | ||
"properties": { | ||
"key": { | ||
"type": "string" | ||
}, | ||
"value": { | ||
"$ref": "#/definitions/StringValue" | ||
} | ||
}, | ||
"required": ["key", "value"] | ||
} | ||
} | ||
}, | ||
"required": ["values"] | ||
} | ||
}, | ||
"required": ["kvlist_value"] | ||
}, | ||
"Attribute": { | ||
"type": "object", | ||
"properties": { | ||
"key": { | ||
"type": "string" | ||
}, | ||
"value": { | ||
"oneOf": [ | ||
{ "$ref": "#/definitions/StringValue" }, | ||
{ "$ref": "#/definitions/ObjectValue" } | ||
] | ||
} | ||
}, | ||
"required": ["key", "value"] | ||
}, | ||
"TelemetryLog": { | ||
"type": "object", | ||
"properties": { | ||
"traceId": { | ||
"type": "string" | ||
}, | ||
"startTimeUnixNano": { | ||
"type": "string" | ||
}, | ||
"endTimeUnixNano": { | ||
"type": "string" | ||
}, | ||
"severityText": { | ||
"type": "string" | ||
}, | ||
"body": { | ||
"type": "object", | ||
"properties": { | ||
"stringValue": { | ||
"type": "string" | ||
} | ||
}, | ||
"required": ["stringValue"] | ||
}, | ||
"attributes": { | ||
"type": "array", | ||
"items": { "$ref": "#/definitions/Attribute" } | ||
} | ||
}, | ||
"required": ["traceId", "severityText", "body", "attributes"] | ||
}, | ||
"ScopeLog": { | ||
"type": "object", | ||
"properties": { | ||
"scope": { | ||
"type": "object" | ||
}, | ||
"logRecords": { | ||
"type": "array", | ||
"items": { "$ref": "#/definitions/TelemetryLog" } | ||
} | ||
}, | ||
"required": ["scope", "logRecords"] | ||
}, | ||
"Resource": { | ||
"type": "object", | ||
"properties": { | ||
"attributes": { | ||
"type": "array", | ||
"items": { "$ref": "#/definitions/Attribute" } | ||
} | ||
}, | ||
"required": ["attributes"] | ||
}, | ||
"TelemetryEvent": { | ||
"type": "object", | ||
"properties": { | ||
"resource": { "$ref": "#/definitions/Resource" }, | ||
"scopeLogs": { | ||
"type": "array", | ||
"items": { "$ref": "#/definitions/ScopeLog" } | ||
} | ||
}, | ||
"required": ["resource", "scopeLogs"] | ||
} | ||
}, | ||
"type": "object", | ||
"properties": { | ||
"resourceLogs": { | ||
"type": "array", | ||
"items": { "$ref": "#/definitions/TelemetryEvent" } | ||
} | ||
}, | ||
"required": ["resourceLogs"] | ||
} | ||
``` | ||
## Implementation | ||
![catch-crash.png](/img/docs/telemetry2.png) | ||
The diagram illustrates the implementation of handling crashes in the Cortex server environment: | ||
|
||
1. **User Interaction** | ||
- The user sends a command or requests to the Cortex Server Main Process. | ||
2. **Cortex Server Main Process** | ||
- The main process handles the initial command or request. | ||
- It forks a child process to monitor and handle errors. | ||
- To manage unexpected errors, the main process catches any `uncaughtException` or `unhandledRejection` events. | ||
3. **Child Process** | ||
- A child process is forked to perform periodic checks (interval pings) to ensure the main process is not hanging or unresponsive. | ||
- This child process can also capture and log errors. | ||
4. **Local Directory** | ||
- If a crash is detected, the error details and crash reports are inserted into a local directory for further analysis and record-keeping. | ||
|
||
|
||
## Configuration | ||
You can also opt out of the Telemetry feature or export it using any OpenTelemetry Collector for your application. | ||
### Turn Off Telemetry | ||
You can turn off our Telemetry feature by running the following command: | ||
<Tabs> | ||
<TabItem value="linux" label="Linux" default> | ||
```bash | ||
## 1. First Command | ||
echo "export CORTEX_CRASH_REPORT=0" >>~/.profile | ||
## 2. Second Command | ||
source ~/.profile | ||
``` | ||
</TabItem> | ||
<TabItem value="mac" label="macOS" default> | ||
```bash | ||
## 1. First Command | ||
echo "export CORTEX_CRASH_REPORT=0" >>~/.profile | ||
## 2. Second Command | ||
source ~/.profile | ||
``` | ||
</TabItem> | ||
<TabItem value="windows" label="Windows" default> | ||
```bash | ||
## Command Prompt | ||
set CORTEX_CRASH_REPORT 0 | ||
## PowerShell | ||
$env:CORTEX_CRASH_REPORT 0 | ||
## To set the env permanently | ||
setx CORTEX_CRASH_REPORT 0 | ||
``` | ||
</TabItem> | ||
</Tabs> | ||
### Export to Otel Collector | ||
You can also export our telemetry to your own Otel Collector setup by running the following command: | ||
<Tabs> | ||
<TabItem value="linux" label="Linux" default> | ||
```bash | ||
## 1. First Command | ||
echo "export OTEL_EXPORTER_OTLP_ENDPOINT=<your_OTEL_endpoint>" >>~/.profile | ||
## 2. Second Command | ||
source ~/.profile | ||
``` | ||
</TabItem> | ||
<TabItem value="mac" label="macOS" default> | ||
```bash | ||
## 1. First Command | ||
echo "export OTEL_EXPORTER_OTLP_ENDPOINT=<your_OTEL_endpoint>" >>~/.profile | ||
## 2. Second Command | ||
source ~/.profile | ||
``` | ||
</TabItem> | ||
<TabItem value="windows" label="Windows" default> | ||
```bash | ||
## Command Prompt | ||
set OTEL_EXPORTER_OTLP_ENDPOINT <your_OTEL_endpoint> | ||
## PowerShell | ||
$env:OTEL_EXPORTER_OTLP_ENDPOINT <your_OTEL_endpoint> | ||
## To set the env permanently | ||
setx OTEL_EXPORTER_OTLP_ENDPOINT <your_OTEL_endpoint> | ||
``` | ||
</TabItem> | ||
</Tabs> | ||
|
||
:::info | ||
Learn more about Telemetry: | ||
- [Telemetry CLI command](/docs/cli/telemetry). | ||
::: |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.