feat(pipeline): Add a first model to monitor data quality #259

Merged 6 commits on Aug 6, 2024
6 changes: 5 additions & 1 deletion pipeline/dags/import_data_inclusion_api.py
@@ -49,7 +49,11 @@ def import_data_inclusion_api():
     " --no-owner"
     " --no-privileges"
     " --table api__requests"
-    f" --file {tmp_file.name}"
+    # services & structures have foreign keys towards communes
+    " --table api__communes"
+    " --table api__services"
+    " --table api__structures"
+    f" --file {tmp_file.name}",
 )
 print(command)
 subprocess.run(command, shell=True, check=True, capture_output=True)
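As a side note on the hunk above: `pg_dump` accepts `--table` several times, which is what lets the dump carry `api__communes` along with the two tables that reference it. A minimal sketch of the same selection built as an argument list follows. The helper name `build_pg_dump_args` and the output path are hypothetical, and the connection options that the real command presumably sets earlier are not visible in this hunk, so they are left out.

```python
from typing import List, Sequence


def build_pg_dump_args(tables: Sequence[str], out_file: str) -> List[str]:
    """Return a pg_dump argument list that dumps only the given tables.

    Hypothetical helper for illustration; connection options are omitted.
    """
    args = ["pg_dump", "--no-owner", "--no-privileges"]
    for table in tables:
        # --table may be repeated to select several tables
        args += ["--table", table]
    args += ["--file", out_file]
    return args


args = build_pg_dump_args(
    # communes is listed because services and structures
    # have foreign keys towards it
    ["api__requests", "api__communes", "api__services", "api__structures"],
    "/tmp/dump.sql",  # placeholder path, not the DAG's temporary file
)
print(" ".join(args))
```

An argument list like this can be passed to `subprocess.run` without `shell=True`, which avoids quoting issues. The DAG itself keeps the shell string form.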
79 changes: 77 additions & 2 deletions pipeline/dbt/models/_sources.yml
@@ -1,18 +1,24 @@
 version: 2

 sources:
-  - name: data_inclusion
+  - name: internal
     schema: public
     tables:
       - name: extra__geocoded_results

-  - name: data_inclusion_extra
+  - name: data_inclusion
     schema: data_inclusion
+    meta:
+      is_provider: true
     tables:
       - name: structures
         description: Entered by the data.inclusion team.
+        meta:
+          kind: structure
       - name: services
         description: Entered by the data.inclusion team.
+        meta:
+          kind: service

   - name: insee
     schema: insee
@@ -26,15 +32,27 @@ sources:

   - name: dora
     schema: dora
+    meta:
+      is_provider: true
     tables:
       - name: structures
+        meta:
+          kind: structure
       - name: services
+        meta:
+          kind: service

   - name: france_travail
     schema: france_travail
+    meta:
+      is_provider: true
     tables:
       - name: agences
+        meta:
+          kind: structure
       - name: services
+        meta:
+          kind: service

   - name: finess
     schema: finess
@@ -54,9 +72,15 @@ sources:

   - name: mes_aides
     schema: mes_aides
+    meta:
+      is_provider: true
     tables:
       - name: garages
+        meta:
+          kind: structure
       - name: aides
+        meta:
+          kind: service

   - name: annuaire_du_service_public
     schema: annuaire_du_service_public
@@ -65,29 +89,53 @@ sources:

   - name: cd35
     schema: cd35
+    meta:
+      is_provider: true
     tables:
       - name: organisations
+        meta:
+          kind: structure

   - name: cd72
     schema: cd72
+    meta:
+      is_provider: true
     tables:
       - name: structures
+        meta:
+          kind: structure
       - name: services
+        meta:
+          kind: service

   - name: emplois_de_linclusion
     schema: emplois_de_linclusion
+    meta:
+      is_provider: true
     tables:
       - name: siaes
+        meta:
+          kind: structure
       - name: organisations
+        meta:
+          kind: structure

   - name: mediation_numerique
     schema: mediation_numerique
+    meta:
+      is_provider: true
     tables:
       - name: structures
+        meta:
+          kind: structure
       - name: services
+        meta:
+          kind: service

   - name: odspep
     schema: odspep
+    meta:
+      is_provider: true
     tables:
       - name: DD009_ACTIONs_DEMARCHES
       - name: DD009_ADRESSE
@@ -110,27 +158,48 @@ sources:
       - name: DD009_REGION_RESSOURCE_2
       - name: DD009_REGION_SUGGESTION
       - name: DD009_RES_PARTENARIALE
+        meta:
+          kind: service
+          staging_name: res_partenariales

   - name: soliguide
     schema: soliguide
+    meta:
+      is_provider: true
     tables:
       - name: lieux
+        meta:
+          kind: structure

   - name: monenfant
     schema: monenfant
+    meta:
+      is_provider: true
     tables:
       - name: creches
+        meta:
+          kind: structure

   - name: agefiph
     schema: agefiph
+    meta:
+      is_provider: true
     tables:
       - name: services
+        meta:
+          kind: service

   - name: reseau_alpha
     schema: reseau_alpha
+    meta:
+      is_provider: true
     tables:
       - name: structures
+        meta:
+          kind: structure
       - name: formations
+        meta:
+          kind: service

   - name: brevo
     schema: brevo
@@ -144,6 +213,12 @@ sources:

   - name: action_logement
     schema: action_logement
+    meta:
+      is_provider: true
     tables:
       - name: structures
+        meta:
+          kind: structure
       - name: services
+        meta:
+          kind: service
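The `meta` keys added in `_sources.yml` (`is_provider`, `kind`, `staging_name`) make it possible to enumerate provider streams programmatically. How the quality model actually consumes them is not shown in this diff, so the following is only a sketch under assumptions: the sources are represented as plain dictionaries mirroring a few entries of the YAML, `provider_streams` is a hypothetical helper, and falling back to the table name when `staging_name` is absent is a guess.

```python
# A few entries mirroring the structure of _sources.yml (not the full file).
SOURCES = [
    {
        "name": "internal",
        "meta": {},
        "tables": [{"name": "extra__geocoded_results", "meta": {}}],
    },
    {
        "name": "dora",
        "meta": {"is_provider": True},
        "tables": [
            {"name": "structures", "meta": {"kind": "structure"}},
            {"name": "services", "meta": {"kind": "service"}},
        ],
    },
    {
        "name": "odspep",
        "meta": {"is_provider": True},
        "tables": [
            {"name": "DD009_ADRESSE", "meta": {}},
            {
                "name": "DD009_RES_PARTENARIALE",
                "meta": {"kind": "service", "staging_name": "res_partenariales"},
            },
        ],
    },
]


def provider_streams(sources):
    """Yield (source, stream, kind, staging_name) for tagged provider tables."""
    for source in sources:
        # Skip sources that are not flagged as data providers.
        if not source.get("meta", {}).get("is_provider"):
            continue
        for table in source["tables"]:
            meta = table.get("meta", {})
            # Only tables tagged with a kind count as streams.
            if "kind" not in meta:
                continue
            # Assumption: the staging name defaults to the table name.
            staging_name = meta.get("staging_name", table["name"])
            yield source["name"], table["name"], meta["kind"], staging_name


streams = list(provider_streams(SOURCES))
for row in streams:
    print(row)
```

In this sketch `internal` is ignored because it is not a provider, and `DD009_ADRESSE` is ignored because it carries no `kind`.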
@@ -1,4 +1,4 @@
-{% set source_model = source('data_inclusion', 'extra__geocoded_results') %}
+{% set source_model = source('internal', 'extra__geocoded_results') %}

 {% set table_exists = adapter.get_relation(database=source_model.database, schema=source_model.schema, identifier=source_model.name) is not none %}

91 changes: 91 additions & 0 deletions pipeline/dbt/models/intermediate/quality/_quality_models.yml
@@ -0,0 +1,91 @@
+version: 2
+
+models:
+  - name: int_quality__stats
+    data_tests:
+      - dbt_utils.unique_combination_of_columns:
+          combination_of_columns:
+            - source
+            - stream
+
+    columns:
+      - name: source
+        data_tests:
+          - not_null
+          - dbt_utils.not_constant
+          - accepted_values:
+              values:
+                - action_logement
+                - agefiph
+                - cd35
+                - cd72
+                - data_inclusion
+                - dora
+                - emplois_de_linclusion
+                - france_travail
+                - mediation_numerique
+                - mes_aides
+                - monenfant
+                - odspep
+                - reseau_alpha
+                - soliguide
+
+      - name: stream
+        data_tests:
+          - not_null
+          - dbt_utils.not_constant
+          - accepted_values:
+              values:
+                - agences
+                - aides
+                - creches
+                - DD009_RES_PARTENARIALE
+                - formations
+                - garages
+                - lieux
+                - organisations
+                - services
+                - siaes
+                - structures
+
+      - name: count_raw
+        data_tests:
+          - dbt_utils.accepted_range:
+              min_value: 0
+              inclusive: false
+
+      - name: count_stg
+        data_tests:
+          - dbt_utils.accepted_range:
+              min_value: 0
+              inclusive: false
+
+      - name: count_int
+        data_tests:
+          - dbt_utils.accepted_range:
+              min_value: 0
+              inclusive: false
+
+      - name: count_marts
+        data_tests:
+          - dbt_utils.accepted_range:
+              min_value: 0
+              inclusive: false
+
+      - name: count_api
+        data_tests:
+          - dbt_utils.accepted_range:
+              min_value: 0
+              inclusive: false
+
+      - name: count_contacts
+        data_tests:
+          - dbt_utils.accepted_range:
+              min_value: 0
+              inclusive: true
+
+      - name: count_addresses
+        data_tests:
+          - dbt_utils.accepted_range:
+              min_value: 0
+              inclusive: true
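The difference between `inclusive: false` and `inclusive: true` in the tests above is what separates the pipeline counts, which must be strictly positive, from `count_contacts` and `count_addresses`, which may be zero. A minimal Python sketch of that bound check follows. The function `in_accepted_range` is a hypothetical stand-in written for illustration, not the dbt_utils implementation, and it ignores null handling.

```python
from typing import Optional


def in_accepted_range(
    value: float,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
    inclusive: bool = True,
) -> bool:
    """Return True when value lies within the configured bounds.

    With inclusive=False the bounds themselves are rejected.
    """
    if min_value is not None:
        if value < min_value or (not inclusive and value == min_value):
            return False
    if max_value is not None:
        if value > max_value or (not inclusive and value == max_value):
            return False
    return True


# count_raw style check: zero rows for a stream fails the test.
print(in_accepted_range(0, min_value=0, inclusive=False))
# count_contacts style check: zero contacts is acceptable.
print(in_accepted_range(0, min_value=0, inclusive=True))
```

Under this reading, a stream that yields no rows at any stage (raw, staging, intermediate, marts, API) is flagged, while a stream with rows but no contacts or addresses still passes.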