Commit

docs(observe): Dataset Profile volume assertion guide (datahub-projec…
zmcnellis authored Aug 31, 2023
1 parent d78a790 commit ee06a65
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion docs/managed-datahub/observe/volume-assertions.md
@@ -123,6 +123,10 @@ source types vary by the platform, but generally fall into these categories:
- **Query**: A `COUNT(*)` query is used to retrieve the latest row count for a table, with optional SQL filters applied (depending on platform).
Depending on the size of the table, this can be less efficient to check. Because it does not rely on system warehouse tables,
this approach is also easily portable across Data Warehouse and Data Lake providers.

- **DataHub Dataset Profile**: The DataHub Dataset Profile aspect is used to retrieve the latest row count information for a table.
Using this option avoids contacting your data platform; instead, the DataHub Dataset Profile metadata is used to evaluate Volume Assertions.
Note: if you have not configured an ingestion source through DataHub, this may be the only option available.
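
As a rough sketch of what the **Query** source does, the snippet below counts rows with `COUNT(*)` and an optional SQL filter. It uses Python's built-in `sqlite3` module as a stand-in for a real warehouse; the `row_count` helper, the `orders` table, and the filter string are hypothetical and are not part of DataHub's implementation.

```python
import sqlite3
from typing import Optional


def row_count(conn: sqlite3.Connection, table: str, where: Optional[str] = None) -> int:
    """Return the row count of `table`, optionally restricted by a SQL filter.

    Hypothetical helper: `table` and `where` are interpolated directly into the
    SQL text, so they must come from trusted configuration, never user input.
    """
    sql = f"SELECT COUNT(*) FROM {table}"
    if where:
        sql += f" WHERE {where}"
    # COUNT(*) always returns exactly one row with one column.
    (count,) = conn.execute(sql).fetchone()
    return count


# An in-memory database standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "shipped"), (2, "pending"), (3, "shipped")],
)
print(row_count(conn, "orders"))                        # 3
print(row_count(conn, "orders", "status = 'shipped'"))  # 2
```

Because the count is computed by querying the table itself, the cost grows with the table size, which is why the Query source can be less efficient on large tables.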

Volume Assertions also have an off switch: they can be started or stopped at any time with the click of a button.

@@ -166,14 +170,15 @@ Once these are in place, you're ready to create your Volume Assertions!
</p>

7. (Optional) Click **Advanced** to customize the volume **source**. This is the mechanism that will be used to obtain the table
-row count metric. Each Data Platform supports different options including Information Schema and Query.
+row count metric. Each Data Platform supports different options including Information Schema, Query, and DataHub Dataset Profile.

<p align="center">
<img width="45%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/volume/assertion-builder-volume-select-source-type.png"/>
</p>

- **Information Schema**: Check the Data Platform system metadata tables to determine the table row count.
- **Query**: Issue a `COUNT(*)` query against the table to determine the row count.
- **DataHub Dataset Profile**: Use the DataHub Dataset Profile metadata to determine the row count.

8. Click **Next**
9. Configure actions that should be taken when the Volume Assertion passes or fails
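
The **DataHub Dataset Profile** option in step 7 reads a row count that has already been recorded in DataHub instead of querying the table. The sketch below illustrates the idea under simplifying assumptions: `ProfileRecord` mirrors only the `timestampMillis` and `rowCount` fields of the Dataset Profile aspect, `latest_row_count` is a hypothetical helper, and profiles without a row count are assumed to be skipped. How profiles are fetched from DataHub is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ProfileRecord:
    """Simplified, hypothetical stand-in for a DataHub Dataset Profile aspect.

    Field names mirror the aspect: `timestampMillis` is when the profile was
    taken, and `rowCount` may be missing if row counts were not profiled.
    """
    timestampMillis: int
    rowCount: Optional[int] = None


def latest_row_count(profiles: Iterable[ProfileRecord]) -> Optional[int]:
    """Return the row count from the most recent profile that has one."""
    # Assumption: profiles without a row count are ignored.
    with_counts = [p for p in profiles if p.rowCount is not None]
    if not with_counts:
        return None  # No usable profile; the caller decides how to handle this.
    latest = max(with_counts, key=lambda p: p.timestampMillis)
    return latest.rowCount


profiles = [
    ProfileRecord(timestampMillis=1693440000000, rowCount=1200),
    ProfileRecord(timestampMillis=1693526400000, rowCount=1350),
    ProfileRecord(timestampMillis=1693612800000),  # newest, but no row count
]
print(latest_row_count(profiles))  # 1350
```

The only input is metadata already stored in DataHub, which is why this option does not need to contact your data platform.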