Initial Open Source release of Data Quality Profiler and Rules Engine

Latest

danielsmith-eu released this 26 Jul 16:22

13664fc

This release provides the following:

Data Profilers for large volume data profiling in Spark
Assertion rule definitions and checking
Reference data loading and joining
Excel and CSV reference data parsing
JSON output enriched with data quality markers/profilers
Metrics and summary dataframe output
Dimensional tagging of profiler outputs (additional identifiers)
JSON flattener
JSON and CSV loader, extensible to other formats
Custom key pre-processor and custom parquet row reader functionality
Comprehensive, extensible set of built-in assertion rule modules
Built-in set of field-level profile masks
Compound assertion rule definition (i.e. a set of sub-rules that must all pass)
Human-readable Data Quality and Assertion Rule Compliance report output
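
To make a few of the items above concrete, here is a minimal, library-independent sketch in plain Python of three of the listed ideas: a JSON flattener, a field-level profile mask, and a compound assertion rule whose sub-rules must all pass. It is not this project's API. The names flatten_json, profile_mask and all_of, the dotted-path key format, and the A/a/9 masking convention are all assumptions chosen for illustration, and the real engine runs on Spark rather than on in-memory dictionaries.

```python
import re
from typing import Any, Callable, Dict

# Hypothetical helper names; none of these come from the released library.
Rule = Callable[[Dict[str, Any]], bool]


def flatten_json(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten_json(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten_json(child, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat


def profile_mask(text: str) -> str:
    """Assumed mask convention: uppercase -> A, lowercase -> a, digit -> 9."""
    masked = re.sub(r"[A-Z]", "A", text)
    masked = re.sub(r"[a-z]", "a", masked)
    return re.sub(r"[0-9]", "9", masked)


def all_of(*rules: Rule) -> Rule:
    """Compound rule: passes only if every sub-rule passes."""
    return lambda record: all(rule(record) for rule in rules)


record = {"customer": {"id": "AB123", "emails": ["a@example.com"]}}
flat = flatten_json(record)
# flat == {"customer.id": "AB123", "customer.emails[0]": "a@example.com"}

id_is_valid = all_of(
    lambda r: "customer.id" in r,                          # field is present
    lambda r: profile_mask(r["customer.id"]) == "AA999",   # field has expected shape
)
print(id_is_valid(flat))  # True
```

Masking values to their character classes lets a profiler count distinct value shapes per field instead of distinct values, which is one common way to spot malformed entries in large data sets.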