Fix #341: Optimization: Callbacks invocation (#1595)
jnpsk authored Sep 19, 2024
1 parent 9d42517 commit f2f3484
Showing 31 changed files with 1,952 additions and 191 deletions.
67 changes: 59 additions & 8 deletions docs/Configuration-Properties.md
@@ -80,11 +80,62 @@ Discuss its configuration with the [Spring Boot documentation](https://docs.spri

## Scheduled Jobs Configuration

| Property | Default | Note |
|-----------------------------------------------------------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------|
| `powerauth.service.scheduled.job.operationCleanup` | `5000` | Time delay in milliseconds between two consecutive tasks that expire long pending operations. |
| `powerauth.service.scheduled.job.expireOperationsLimit` | `100` | Number of long pending operations that will be set expired in single scheduled job run. |
| `powerauth.service.scheduled.job.activationsCleanup` | `5000` | Time delay in milliseconds between two consecutive tasks that expire abandoned activations. |
| `powerauth.service.scheduled.job.activationsCleanup.lookBackInMilliseconds` | `3600000` | Number of milliseconds to look back in the past when looking for abandoned activations. |
| `powerauth.service.scheduled.job.uniqueValueCleanup` | `60000` | Time delay in milliseconds between two consecutive tasks that delete expired unique values. |
| `powerauth.service.scheduled.job.dispatchPendingCallbackUrlEvents` | `3000` | Time delay in milliseconds between two consecutive tasks that try to send pending callback events that could not be dispatched immediately. |
| `powerauth.service.scheduled.job.rerunStaleCallbackUrlEvents` | `3000` | Time delay in milliseconds between two consecutive tasks that rerun stale callback events that got stuck during their processing. |
| `powerauth.service.scheduled.job.callbackUrlEventsCleanupCron` | `0 0 0 */1 * *` | Cron schedule triggering a task to clean completed callback events after their retention period has expired. |
| `powerauth.service.scheduled.job.fido2AuthenticatorCacheEviction` | `3600000` | Duration in milliseconds for which the internal cache holds details of FIDO2 Authenticator models. |

## Callback URL Events Configuration

PowerAuth monitors the status of operations and activations. When their status changes, the configured callbacks are
triggered. The following properties configure the maximum number of attempts and the exponential backoff algorithm
used when dispatching a callback event. The default values are chosen to match the behavior of the previous PowerAuth
version. You can override these defaults, or configure the settings of each callback individually using the
Callback URL Management API.

In certain scenarios, repeatedly attempting to dispatch callback events may be pointless due to a system failure on the
receiver's side. To address this, if multiple callback events with the same configuration fail consecutively, the
service temporarily halts further dispatch attempts and marks these events as failed without retrying. The number of
consecutive failures allowed before dispatch stops is defined by the `failureThreshold` property, while the halt
period is configurable via the `resetTimeout` property. After this period, a callback dispatch is attempted again
to check the receiver's availability.
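
The interplay of `failureThreshold` and `resetTimeout` can be sketched as a small failure tracker. This is a minimal illustration, not the PowerAuth implementation: the class `FailureThresholdSketch` and its methods are hypothetical, and the sketch assumes that dispatch halts once the failure count reaches the threshold and resumes once `resetTimeout` has elapsed since the last failure.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical failure tracker for one callback configuration; not the PowerAuth implementation. */
final class FailureThresholdSketch {

    private final int failureThreshold;
    private final Duration resetTimeout;
    private int failureCount;      // mirrors the idea of the failure_count column
    private Instant lastFailure;   // mirrors the idea of the timestamp_last_failure column

    FailureThresholdSketch(int failureThreshold, Duration resetTimeout) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be >= 1");
        }
        this.failureThreshold = failureThreshold;
        this.resetTimeout = resetTimeout;
    }

    /** Dispatch is allowed below the threshold, or once the halt period has elapsed. */
    boolean isDispatchAllowed(Instant now) {
        if (failureCount < failureThreshold) {
            return true;
        }
        // Assumption: the halt period is measured from the most recent failure.
        return !now.isBefore(lastFailure.plus(resetTimeout));
    }

    /** Records one more consecutive failure. */
    void recordFailure(Instant now) {
        failureCount++;
        lastFailure = now;
    }

    /** A successful dispatch ends the streak of consecutive failures. */
    void recordSuccess() {
        failureCount = 0;
        lastFailure = null;
    }
}
```

With the documented defaults (`failureThreshold` of `200`, `resetTimeout` of `60s`), such a tracker would block dispatch after 200 consecutive failures and allow a probing attempt 60 seconds after the last one.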

PowerAuth dispatches a callback as soon as a change in operation or activation status is detected. Each newly created
callback is passed to a configurable thread pool executor for dispatch. Even if the thread pool's queue is full, the
callback will eventually be dispatched. Keep in mind that dispatching a callback involves database operations.
Imbalanced settings of the thread pool size and database connection pool size can lead to system disruptions.
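
The thread pool behavior described above can be illustrated with a plain `java.util.concurrent.ThreadPoolExecutor`. This is only a sketch under stated assumptions: the class `CallbackExecutorSketch` is hypothetical, and the in-memory `pending` queue merely stands in for however the service keeps an event pending until the `dispatchPendingCallbackUrlEvents` job picks it up.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical dispatcher; not the PowerAuth implementation. */
final class CallbackExecutorSketch {

    private final ThreadPoolExecutor executor;
    // Assumption: stands in for events left in a pending state for a later scheduled run.
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();

    CallbackExecutorSketch(int coreSize, int maxSize, int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                coreSize, maxSize, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    /** Returns true if the pool accepted the callback, false if it was parked as pending. */
    boolean dispatch(Runnable callback) {
        try {
            executor.execute(callback);
            return true;
        } catch (RejectedExecutionException e) {
            // Queue full and all threads busy: keep the callback for a later attempt.
            pending.add(callback);
            return false;
        }
    }

    int pendingCount() {
        return pending.size();
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Because each dispatched callback also touches the database, the maximum pool size in such a setup should stay comfortably below the number of available database connections.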

Callback events are periodically monitored to detect stale events that became stuck during processing due to rare
circumstances. When a callback event that is being processed exceeds the defined `forceRerunPeriod` without
completing, it is automatically scheduled to be rerun. By default, the force rerun period is calculated as the
sum of the HTTP connection timeout, the HTTP response timeout, and an additional ten-second delay. This does not apply
to callback events with max attempts set to 1; such callback events are never scheduled to be rerun.
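
The default calculation and the staleness check can be sketched as follows. The class `ForceRerunSketch`, its method names, and the timeout values used in the usage note are hypothetical; only the formula (connection timeout plus response timeout plus ten seconds) and the exception for a single attempt come from the description above.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; not the PowerAuth implementation. */
final class ForceRerunSketch {

    /** The additional ten-second delay mentioned in the documentation. */
    static final Duration ADDITIONAL_DELAY = Duration.ofSeconds(10);

    private ForceRerunSketch() {
    }

    /** Default period: HTTP connection timeout + HTTP response timeout + 10 seconds. */
    static Duration defaultForceRerunPeriod(Duration connectionTimeout, Duration responseTimeout) {
        return connectionTimeout.plus(responseTimeout).plus(ADDITIONAL_DELAY);
    }

    /** An event in processing is stale once the period has passed, unless it has a single attempt. */
    static boolean isStale(Instant processingStarted, Duration forceRerunPeriod, int maxAttempts, Instant now) {
        if (maxAttempts <= 1) {
            // Events with max attempts set to 1 are never scheduled to be rerun.
            return false;
        }
        return now.isAfter(processingStarted.plus(forceRerunPeriod));
    }
}
```

For example, assuming a 5-second connection timeout and a 60-second response timeout, the default period would be 75 seconds.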

| Property | Default | Note |
|---------------------------------------------------------------------|---------|--------------------------------------------------------------------------------------------------------------------|
| `powerauth.service.callbacks.defaultMaxAttempts` | `1` | Default maximum number of dispatch attempts for a callback event. |
| `powerauth.service.callbacks.defaultRetentionPeriod` | `30d` | Default retention period of a completed callback event before deleting its record from the database table. |
| `powerauth.service.callbacks.defaultInitialBackoff` | `2s` | Default initial backoff after an unsuccessful attempt to dispatch a callback event. |
| `powerauth.service.callbacks.maxBackoff` | `32s` | The maximum allowable backoff period between successive attempts to dispatch a callback event. |
| `powerauth.service.callbacks.backoffMultiplier` | `1.5` | The multiplier used to calculate the backoff period. |
| `powerauth.service.callbacks.pendingCallbackUrlEventsDispatchLimit` | `100` | Maximum number of pending callback events that will be dispatched in a single scheduled job run. |
| `powerauth.service.callbacks.threadPoolCoreSize` | `1` | Number of core threads in the thread pool used by the executor. |
| `powerauth.service.callbacks.threadPoolMaxSize` | `2` | Maximum number of threads in the thread pool used by the executor. |
| `powerauth.service.callbacks.threadPoolQueueCapacity` | `1000` | Queue capacity of the thread pool used by the executor. |
| `powerauth.service.callbacks.forceRerunPeriod` | | Time period after which a currently processed callback event is considered stale and should be scheduled to rerun. |
| `powerauth.service.callbacks.failureThreshold` | `200` | The number of consecutive failures allowed for callback events with the same configuration. |
| `powerauth.service.callbacks.resetTimeout` | `60s` | Time period after which a Callback URL Event will be dispatched, even if the failure threshold has been reached. |

The backoff period after the `N-th` attempt is calculated as follows:

```
exponentialBackoff = initialBackoff * backoffMultiplier^(N-1)
backoffPeriod = min(exponentialBackoff, maxBackoff)
```
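
As a worked illustration of the formula, the following sketch computes the backoff period in Java. The class `BackoffSketch` and its method are hypothetical and only mirror the two lines above; this is not the service's own code.

```java
import java.time.Duration;

/** Hypothetical backoff calculator; not the PowerAuth implementation. */
final class BackoffSketch {

    private BackoffSketch() {
    }

    /**
     * Backoff after the N-th attempt (N starts at 1):
     * min(initialBackoff * backoffMultiplier^(N-1), maxBackoff).
     */
    static Duration backoffPeriod(int attempt, Duration initialBackoff,
                                  double backoffMultiplier, Duration maxBackoff) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        double exponentialMillis = initialBackoff.toMillis() * Math.pow(backoffMultiplier, attempt - 1);
        // Compare as doubles first so a very large exponent cannot overflow the long conversion.
        if (exponentialMillis >= maxBackoff.toMillis()) {
            return maxBackoff;
        }
        return Duration.ofMillis(Math.round(exponentialMillis));
    }
}
```

With the default values (`2s` initial backoff, multiplier `1.5`, `32s` maximum), the first three attempts are followed by backoffs of 2 s, 3 s, and 4.5 s, and the cap of 32 s applies from the eighth attempt onward.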
26 changes: 26 additions & 0 deletions docs/Database-Structure.md
@@ -182,6 +182,11 @@ Stores callback URLs - per-application endpoints that are notified whenever an a
| attributes | TEXT | - | Callback attributes as a key-value map, serialized into JSON. |
| authentication | TEXT | - | Callback HTTP request authentication configuration, serialized into JSON. |
| encryption_mode | VARCHAR(255) | DEFAULT 'NO_ENCRYPTION' NOT NULL | Encryption of authentication values: `NO_ENCRYPTION` means plaintext, `AES_HMAC` for AES encryption with HMAC-based index. |
| max_attempts | INTEGER | - | Maximum number of attempts to dispatch a callback. |
| initial_backoff | VARCHAR(64) | - | Initial backoff period before the next send attempt, stored as an ISO 8601 string. |
| retention_period | VARCHAR(64) | - | Minimum duration for which a completed callback event is persisted, stored as an ISO 8601 string. |
| timestamp_last_failure | DATETIME | - | The timestamp of the most recent failed callback event associated with this configuration. |
| failure_count | INTEGER | - | The number of consecutive failed callback events associated with this configuration. |
<!-- end -->

<!-- begin database table pa_token -->
@@ -369,4 +374,25 @@ Table stores details about temporary key pairs used for data encryption.
| private_key_base64 | varchar(255) | - | Temporary private key encoded as Base64. |
| public_key_base64 | varchar(255) | - | Temporary public key encoded as Base64. |
| timestamp_expires | timestamp | index | Timestamp of when the temporary key pair expires. |

<!-- begin database table pa_application_callback_event -->
### Callback URL Events

Table stores Callback URL Events to monitor processing of the callbacks.

#### Columns

| Name | Type | Info | Note |
|-------------------------|-------------|-------------------------------------------|------------------------------------------------------------------------------------|
| id | bigint | primary key | Identifier of the Callback URL Event. |
| application_callback_id | varchar(37) | foreign key: pa\_application\_callback.id | Reference to configuration of the Callback URL Event. |
| callback_data | text | - | Data payload of the Callback URL Event. |
| status | varchar(32) | - | Current status of the Callback URL Event. |
| timestamp_created | timestamp | - | Timestamp of the Callback URL Event creation. |
| timestamp_last_call | timestamp | - | Timestamp of the last attempt to send the Callback URL Event. |
| timestamp_next_call | timestamp | - | Timestamp of the next scheduled time to send the Callback URL Event. |
| timestamp_delete_after | timestamp | - | Timestamp after which the Callback URL Event record can be deleted from the table. |
| timestamp_rerun_after | timestamp | - | Timestamp after which the Callback URL Event in processing state will be rerun. |
| attempts | integer | - | Number of dispatch attempts made for the Callback URL Event. |
| idempotency_key | varchar(36) | - | Idempotency key associated with the Callback URL Event. |
<!-- end -->
50 changes: 50 additions & 0 deletions docs/PowerAuth-Server-1.9.0.md
@@ -24,6 +24,44 @@ To facilitate a new feature of temporary keys, we added a new `pa_temporary_key`
* A new column `encryption_mode` has been added to the `pa_application_config` table to enable encryption of configuration values.
* A new column `encryption_mode` has been added to the `pa_application_callback` table to enable encryption of authentication values.

### New Database Table for Callback Events Monitoring

A new `pa_application_callback_event` table has been created to monitor Callback URL Events. This change introduces
the additional benefit of setting a retry strategy for individual Callback URL Events and monitoring the state of each
dispatched event. The table contains the following columns:
- `id` - Event identifier, generated using sequence `pa_app_callback_event_seq`.
- `application_callback_id` - Reference to the corresponding Callback URL record in the `pa_application_callback` table.
- `callback_data` - Data payload of the Callback URL Event.
- `status` - Current state of the Callback URL Event.
- `timestamp_created` - Creation timestamp of the Callback URL Event.
- `timestamp_last_call` - Timestamp of the last attempt to send the Callback URL Event.
- `timestamp_next_call` - Timestamp of the next scheduled time to send the Callback URL Event.
- `timestamp_delete_after` - Timestamp after which the Callback URL Event record should be deleted from the table.
- `timestamp_rerun_after` - Timestamp after which the Callback URL Event record in processing state should be rerun.
- `attempts` - Number of dispatch attempts made for the Callback URL Event.
- `idempotency_key` - UUID used as the `Idempotency-Key`.

The `pa_application_callback_event` table comes with the following indexes:
- `pa_app_cb_event_status_idx` on `(status)`,
- `pa_app_cb_event_ts_del_idx` on `(timestamp_delete_after)`.

### Add Columns to Configure Callback Retry Strategy

New columns have been added to the `pa_application_callback` table. These columns provide additional configuration
options for the retry strategy with an exponential backoff algorithm, namely:
- `max_attempts` to set the maximum number of attempts to dispatch a callback,
- `initial_backoff` to set the initial backoff period before the next send attempt, and
- `retention_period` to set the duration for which the callback event is stored.

Settings at the individual callback level override the global default settings at the application level.
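
The override rule can be sketched as a simple fallback: a value stored with the callback wins, otherwise the global default applies. The class `CallbackSettingsSketch` and its methods are hypothetical. The sketch assumes that an unset column is read as `null` and that durations are stored as ISO 8601 strings such as `PT10S`, which `java.time.Duration.parse` accepts.

```java
import java.time.Duration;

/** Hypothetical resolver of effective callback settings; not the PowerAuth implementation. */
final class CallbackSettingsSketch {

    private CallbackSettingsSketch() {
    }

    /** Uses the per-callback max_attempts when set, otherwise the global default. */
    static int effectiveMaxAttempts(Integer callbackMaxAttempts, int defaultMaxAttempts) {
        return callbackMaxAttempts != null ? callbackMaxAttempts : defaultMaxAttempts;
    }

    /** Parses the per-callback ISO 8601 duration when set, otherwise returns the global default. */
    static Duration effectiveDuration(String callbackValueIso8601, Duration defaultValue) {
        return callbackValueIso8601 != null ? Duration.parse(callbackValueIso8601) : defaultValue;
    }
}
```

The same duration helper would serve both `initial_backoff` and `retention_period`, since both are durations.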

### Add Columns to Enable Callback Failures Monitoring

The following columns have been added to the `pa_application_callback` table to enable monitoring of callback dispatch
failures:
- `failure_count` to hold the number of consecutive failed callbacks of the same configuration, and
- `timestamp_last_failure` to store the timestamp of the most recent failed callback attempt.


## REST API Changes

@@ -44,3 +82,15 @@ Use the `commitPhase` parameter for specifying when the activation should be com

The method `POST /rest/v3/signature/ecdsa/verify` now supports validation of ECDSA signature in JOSE format, thanks to added `signatureFormat` request attribute (`DER` as a default value, or `JOSE`).

## Other Changes

### New Configuration Properties for Callback Events Monitoring

New configuration options have been added to modify the Callback URL Events monitoring and retry policy.
See the [Callback URL Events Configuration section](./Configuration-Properties.md#callback-url-events-configuration)
for further details.

### Idempotency-Key of Callback URL Events

Callback URL Events now include an `Idempotency-Key` in the HTTP request header. It is a unique UUIDv4 key to recognize
retries of the same request.
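
On the receiving side, the `Idempotency-Key` header can be used to ignore repeated deliveries of the same event. The following is a minimal sketch of that idea, not part of PowerAuth: the class `IdempotentReceiverSketch` is hypothetical, and it keeps the seen keys in memory, whereas a real receiver would likely persist them.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical receiver-side deduplication by Idempotency-Key. */
final class IdempotentReceiverSketch {

    // Assumption: an in-memory set is enough for illustration purposes.
    private final Set<UUID> seenKeys = ConcurrentHashMap.newKeySet();

    /**
     * Returns true when the key is seen for the first time and the event should be processed,
     * false when the request is a retry of an event that was already received.
     */
    boolean markIfFirstDelivery(String idempotencyKeyHeader) {
        return seenKeys.add(UUID.fromString(idempotencyKeyHeader));
    }
}
```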