From dee1bc854cb93b2c9daf8a70d40ae05239560e72 Mon Sep 17 00:00:00 2001 From: Zachary McNellis Date: Wed, 30 Aug 2023 11:08:06 -0700 Subject: [PATCH] docs(observe): DataHub Operation freshness assertion guide (#8749) Co-authored-by: John Joyce --- .../observe/freshness-assertions.md | 41 +++++++++++++++---- 1 file changed, 34 insertions(+), 7 deletions(-) diff --git a/docs/managed-datahub/observe/freshness-assertions.md b/docs/managed-datahub/observe/freshness-assertions.md index d10d1f18eba5c9..c5d4ca9081b43d 100644 --- a/docs/managed-datahub/observe/freshness-assertions.md +++ b/docs/managed-datahub/observe/freshness-assertions.md @@ -122,8 +122,12 @@ Change Source types vary by the platform, but generally fall into these categori is higher than the previously observed value, in order to determine whether the Table has been changed within a given period of time. Note that this approach is only supported if the Change Window does not use a fixed interval. - Using the final 2 approaches - column value queries - to determine whether a Table has changed useful because it can be customized to determine whether - specific types of important changes have been made to a given Table. + - **DataHub Operation**: A DataHub "Operation" aspect contains timeseries information used to describe changes made to an entity. Using this + option avoids contacting your data platform, and instead uses the DataHub Operation metadata to evaluate Freshness Assertions. + This relies on Operations being reported to DataHub, either via ingestion or via use of the DataHub APIs (see [Report Operation via API](#reporting-operations-via-api)). + Note if you have not configured an ingestion source through DataHub, then this may be the only option available. + + Using either of the column value approaches (**Last Modified Column** or **High Watermark Column**) to determine whether a Table has changed can be useful because it can be customized to determine whether specific types of important changes have been made to a given Table. Because it does not involve system warehouse tables, it is also easily portable across Data Warehouse and Data Lake providers. Freshness Assertions also have an off switch: they can be started or stopped at any time with the click of button. @@ -178,7 +182,7 @@ _Check whether the table has changed in a specific window of time_ 7. (Optional) Click **Advanced** to customize the evaluation **source**. This is the mechanism that will be used to evaluate -the check. Each Data Platform supports different options including Audit Log, Information Schema, Last Modified Column, and High Watermark Column. +the check. Each Data Platform supports different options including Audit Log, Information Schema, Last Modified Column, High Watermark Column, and DataHub Operation.

@@ -189,11 +193,12 @@ the check. Each Data Platform supports different options including Audit Log, In - **Last Modified Column**: Check for the presence of rows using a "Last Modified Time" column, which should reflect the time at which a given row was last changed in the table, to determine whether the table changed within the evaluation period. - **High Watermark Column**: Monitor changes to a continuously-increasing "high watermark" column value to determine whether a table - has been changed. This option is particularly useful for tables that grow consistently with time, for example fact or event (e.g. click-strea) tables. It is not available + has been changed. This option is particularly useful for tables that grow consistently with time, for example fact or event (e.g. click-stream) tables. It is not available when using a fixed lookback period. +- **DataHub Operation**: Use DataHub Operations to determine whether the table changed within the evaluation period. -8. Click **Next** -9. Configure actions that should be taken when the Freshness Assertion passes or fails +1. Click **Next** +2. Configure actions that should be taken when the Freshness Assertion passes or fails

@@ -280,7 +285,7 @@ Note that to create or delete Assertions and Monitors for a specific entity on D In order to create a Freshness Assertion that is being monitored on a specific **Evaluation Schedule**, you'll need to use 2 GraphQL mutation queries to create a Freshness Assertion entity and create an Assertion Monitor entity responsible for evaluating it. -Start by creating the Freshness Assertion entity using the `createFreshnessAssertion` query and hang on to the 'urn' field of the Assertion entit y +Start by creating the Freshness Assertion entity using the `createFreshnessAssertion` query and hang on to the 'urn' field of the Assertion entity you get back. Then continue by creating a Monitor entity using the `createAssertionMonitor`. ##### Examples @@ -337,6 +342,28 @@ After creating the monitor, the new assertion will start to be evaluated every 8 You can delete assertions along with their monitors using GraphQL mutations: `deleteAssertion` and `deleteMonitor`. +### Reporting Operations via API + +DataHub Operations can be used to capture changes made to entities. This is useful for cases where the underlying data platform does not provide a mechanism +to capture changes, or where the data platform's mechanism is not reliable. In order to report an operation, you can use the `reportOperation` GraphQL mutation. + + +##### Examples +```json +mutation reportOperation { + reportOperation( + input: { + urn: "", + operationType: INSERT, + sourceType: DATA_PLATFORM, + timestampMillis: 1693252366489 + } + ) +} +``` + +Use the `timestampMillis` field to specify the time at which the operation occurred. If no value is provided, the current time will be used. + ### Tips :::info