See aws-toolkit-common/telemetry for full details about defining telemetry metrics.
- You can define new metrics during development by adding items to
telemetry/vscodeTelemetry.json.
- The
generateClients
build task generates new symbols inshared/telemetry/telemetry
, which you can import via:import { telemetry } from '/shared/telemetry/telemetry'
- When your feature is released, the "development" metrics you defined in
vscodeTelemetry.json
should be upstreamed to aws-toolkit-common.
- The
- Metrics are dropped (not posted to the service) if the extension is running in CI or other
automation tasks.
- You can always test telemetry via assertTelemetry(), regardless of the current environment.
- Use
run()
where possible. It automatically sets theresult
andreason
fields. See below for details.run()
gets thereason
value from theError.code
property of any exception that is thrown.- Your code can throw
ToolkitError
with acode
field to communicate errors, validation issues, or cancellation. See below.
- The
reason
andresult
fields are standard metric fields shared by all Toolkits (VSCode, JetBrains, VisualStudio). They should be used instead of special-purpose metrics or fields. result
allows the Toolkits team to monitor all features for potential regressions.reason
gives insight into the cause of aresult=Failed
metric.telemetry.record()
called during atelemetry.foo.run(…)
context will automatically annotate the currentfoo
metric.- For example, the cloudwatch logs feature adds
hasTimeFilter
info its metrics by calling telemetry.record().
- For example, the cloudwatch logs feature adds
User actions or other features may have multiple stages/steps, called a "workflow" or just "flow". A telemetry "trace" captures a flow as tree of "spans".
For example, setupThing()
has multiple steps until it is completed, ending with lastSetupStep()
.
function setupThing() {
setupStep1()
setupStep2()
...
lastSetupStep()
}
If we want to send a metric event, lets call it metric_setupThing
, then the code could look something like this:
function setupThing() {
try {
...
lastSetupStep()
telemetry.metric_setupThing.emit({result: 'Succeeded', ...})
}
catch (e) {
telemetry.metric_setupThing.emit({result: 'Failed', reason: 'Not Really Sure Why' ...})
}
}
Here we emitted a final metric based on the failure or success of the entire execution. Each metric is discrete and immediately gets sent to the telemetry service.
But usually code is not flat and there are many nested calls. If something goes wrong during the execution it would be useful to have more specific information at the area of failure. For that we can use run()
along with telemetry.record()
.
run()
accepts a callback, and when the callback is executed, any uses of telemetry.record()
at any nesting level during execution of that callback, will update the
attributes of the "current metric".
And at the end (that is, when run()
returns) we will emit a single metric with the last updated attributes.
Example
When an exception is thrown from a run()
context, run()
will automatically set
the reason
field based on the Error code
field. You can explicitly set code
when throwing
a ToolkitError
, for example:
throw new ToolkitError('No sso-session name found in ~/.aws/config', { code: 'NoSsoSessionName' })
Note: prefer reason codes with a format similar to existing codes (not sentences). You can find existing codes by searching the codebase:
git grep 'code: '
setupThing()
function setupThing() {
// Start the run() for metric_setupThing
telemetry.metric_setupThing.run(span => {
// Update the metric with initial attributes
span.record({sessionId: '123456'}) // now no matter where the control flow exits after this line in this method, this attribute will always be set
...
setupStep2()
...
if (userInput.CancelSelected) {
// By setting the `cancelled` attribute to true, the `result` attribute will be set to Cancelled
throw new ToolkitError("Thing has been cancelled", { cancelled: true})
}
})
// At this point the final values from the `record()` calls are used to emit a the final metric.
// If no exceptions have been thrown, the `result` attribute is automatically set to Success.
}
function setupStep2() {
try {
// Do work
}
catch (e) {
// Here we can update the metric with more specific information regarding the failure.
// Also notice we are able to use `telemetry.metric_setupThing` versus `span`.
// This is due to `metric_setupThing` being added to the "context" from the above run()
// callback argument. So when we use record() below it will update the same
// thing that span.record() does.
// Keep in mind record() must be run inside the callback argument of run() for
// the attributes of that specific metric to be updated.
telemetry.metric_setupThing.record({
workDone: // ...
})
// If this exception is allowed to propogate to the `run()`, then the `result` will be automatically set to Failed and the `reason` to the `code` set here
throw new ToolkitError(e as Error, { code: "SomethingWentWrongInStep2"})
}
}
Finally, if setupStep2()
was the thing that failed we would see a metric like:
{
"metadata.metricName": "metric_setupThing",
"sessionId": "123456",
"result": "Failed",
"reason": "SomethingWentWrongInStep2",
...
}
When errors are thrown we do not attach the stack trace in telemetry. We only know about the error itself, but not the path it took to get there. We sometimes need this stack trace to debug, and only have telemetry to get insight on what happened since we do not have access to logs.
Common example: "I have a function, thisFailsSometimes()
that is called in multiple places. The function sometimes fails, I know from telemetry, but I do not know if it is failing when it is a specific caller. If I knew the call stack/trace that it took to call my function that would help me debug."
function runsSuccessfully() {
thisFailsSometimes(1) // this succeeds
}
function thisThrows() {
thisFailsSometimes(0) // this fails
}
function failsDependingOnInput(num: number) {
return telemetry.my_Metric.run(() => {
if (number === 0) {
throw Error('Cannot be 0')
}
...
})
}
On class methods, use the @withTelemetryContext()
decorator to add context to the execution. Depending on the args set, it provides features like emitting the result, or adding it's context to errors.
NOTE: Decorators are currently only supported for methods and not functions
class MyClass {
@withTelemetryContext({ name: 'runsSuccessfully', class: 'MyClass' })
public runsSuccessfully() {
failsDependingOnInput(1)
}
@withTelemetryContext({ name: 'thisThrows', class: 'MyClass', errorCtx: true })
public thisThrows() {
failsDependingOnInput(0)
}
@withTelemetryContext({ name: 'failsDependingOnInput' class: 'MyClass', emit: true, errorCtx: true})
private failsDependingOnInput(num: number) {
if (number === 0) {
throw Error('Cannot be 0')
}
...
}
}
// Results in a metric: { source: 'MyClass#thisThrows,failsDependingOnInput', result: 'Failed' }
// Results in an error that has context about the methods that lead up to it.
new MyClass().thisThrows()
Separately if you must use a function, add a value to function
in the options of a run()
. This will result in a stack of functions identifiers that were previously called
before failsDependingOnInput()
was run. You can then retrieve the stack in the run()
of your final metric using getFunctionStack()
.
function runsSuccessfully() {
telemetry.my_Metric.run(() => failsDependingOnInput(1), { functionId: { name: 'runsSuccessfully' }})
}
function thisThrows() {
telemetry.my_Metric.run(() => failsDependingOnInput(0), { functionId: { source: 'thisThrows' }})
}
function failsDependingOnInput(num: number) {
return telemetry.my_Metric.run(() => {
telemetry.record({ theCallStack: asStringifiedStack(telemetry.getFunctionStack())})
if (number === 0) {
throw Error('Cannot be 0')
}
...
}, { functionId: { name: 'failsDependingOnInput' }})
}
// Results in a metric: { theCallStack: 'thisThrows:failsDependingOnInput', result: 'Failed' }
// { theCallStack: 'thisThrows:failsDependingOnInput' } implies 'thisThrows' was run first, then 'failsDependingOnInput'. See docstrings for more info.
thisThrows()
-
If a nested function does not use a
run()
then it will not be part of the call stack.function a() { return telemetry.my_Metric.run(() => b(), { functionId: { name: 'a' } }) } function b() { return c() } function c() { return telemetry.my_Metric.run(() => asStringifiedStack(telemetry.getFunctionStack()), { functionId: { name: 'c' }, }) } c() // result: 'a:c', note that 'b' is not included
-
If you do not want your
run()
to emit telemetry, setemit: false
in the optionsfunction a() { return telemetry.my_Metric.run(() => b(), { functionId: { name: 'a' }, emit: false }) }
-
If you want to add to the function stack, but don't have a specific metric to use, use the metric named
function_call
. Also look to setemit: false
if you do not want it to emit telemetry.function a() { return telemetry.function_call.run(() => b(), { functionId: { name: 'a' }, emit: false }) }
-
If your function name is generic, look to make it unique so there is no confusion.
function a() { return telemetry.my_Metric.run(() => b(), { functionId: { name: 'aButMoreUnique' } }) }
All telemetry events include a traceId in addition to other attributes. Traceids allow for improved tracking and correlation of related events across a single operation or user flow.
A traceId is a unique identifier that is generated for the top-level telemetry event in a flow and then propagated to all subsequent related events. This allows us to group and analyze all events associated with a particular operation.
- When a top-level telemetry event is created (e.g.,
vscode_executeCommand
), a new traceId is generated. - This traceId is then attached to all subsequent related telemetry events that occur as part of the same operation or flow.
- The traceId remains consistent for all events within the same flow
Consider a flow where vscode_executeCommand
triggers amazonq_enterFocusChat
and amazonq_openChat
. The resulting telemetry events would look like this:
vscode_executeCommand:
traceId: 'aaaaa-aaaaa-aaaaa-aaaaa-aaaaa'
amazonq_enterFocusChat
traceId: 'aaaaa-aaaaa-aaaaa-aaaaa-aaaaa'
amazonq_openChat
traceId: 'aaaaa-aaaaa-aaaaa-aaaaa-aaaaa'
allowing us to look up traceId=aaaaa-aaaaa-aaaaa-aaaaa-aaaaa
in our telemetry instance and find all the related events.
For more information visit the OpenTelemetry documentation on traces: https://opentelemetry.io/docs/concepts/signals/traces/
In certain scenarios you may need to manually instrument disjoint flows to track how a traceId
propagates through them. e.g.
- Measuring the time it takes for a message to travel from Amazon Q chat, through VS Code, and back to the customer.
- Determining the duration for Amazon Q inline to display a message to the user.
In these cases, where there isn't a direct hierarchy of function calls, manual instrumentation of the traceId
is necessary.
If you're not within an active span and you know the traceId
you want to use:
telemetry.withTraceId(() => {
// Code to be executed within this trace
}, 'myTraceId')
This method wraps the provided function with the specified traceId
If you're already executing within a span (e.g., vscode_executeCommand) and you know the traceId you want to use:
telemetry.record({
traceId: 'myTraceId',
})
This approach records the traceId for the current span and all future spans within the same execution context.