fix(pipeline): Make the Soliguide addresses a bit more correct
Soliguide addresses always contain the postcode and city name, which
unfortunately makes geocoding them with the BAN (Base Adresse Nationale) API less reliable :'(

Example:

    curl -G https://api-adresse.data.gouv.fr/search/ --data-urlencode "q=311 Av lou Gabian, 83600 Fréjus 83600 Fréjus"
    result_score=0.54

    curl -G https://api-adresse.data.gouv.fr/search/ --data-urlencode "q=311 Av lou Gabian, 83600 Fréjus"
    result_score=0.73
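The two requests above can also be built from Python with a properly percent-encoded query. This is a minimal sketch using only the standard library. `ban_search_url` and `best_score` are hypothetical helpers, not part of the pipeline. `best_score` assumes the JSON `/search/` endpoint returns a GeoJSON FeatureCollection ranked best-first, with each score in `properties["score"]`. The CSV geocoding endpoint exposes the same value as a `result_score` column, which is presumably where the name above comes from.

```python
from urllib.parse import quote, urlencode

# Search endpoint used in the curl examples above.
BAN_SEARCH_URL = "https://api-adresse.data.gouv.fr/search/"


def ban_search_url(query):
    """Build a GET URL for the BAN search endpoint with a percent-encoded `q`."""
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use '+').
    return BAN_SEARCH_URL + "?" + urlencode({"q": query}, quote_via=quote)


def best_score(feature_collection):
    """Return the score of the first result of a decoded JSON response, or None.

    Assumption: the response is a GeoJSON FeatureCollection whose features are
    ranked best-first, each carrying its score in properties["score"].
    """
    features = feature_collection.get("features") or []
    if not features:
        return None
    return features[0]["properties"].get("score")


# The two queries compared in the commit message.
for q in ("311 Av lou Gabian, 83600 Fréjus 83600 Fréjus", "311 Av lou Gabian, 83600 Fréjus"):
    print(ban_search_url(q))
```

To reproduce the comparison, fetch either URL (for instance with `urllib.request.urlopen` followed by `json.load`) and pass the decoded body to `best_score`.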

Looking more closely at the source data confirms it: Soliguide addresses
always include the postcode and city. A regexp to fix that!

    position__address                                                 | after_regexp
    11 Rue Louis Apffel, 67000 Strasbourg                             | 11 Rue Louis Apffel
    9 Rue Déserte, 67000 Strasbourg                                   | 9 Rue Déserte
    107 Ave Parmentier, 75011 Paris                                   | 107 Ave Parmentier
    2 Rue Bartisch, 67100 Strasbourg                                  | 2 Rue Bartisch
    1 Rue du Rempart, 67000 Strasbourg                                | 1 Rue du Rempart
    Hôpital Civil de Strasbourg, 1 Pl. de l'Hôpital, 67000 Strasbourg | Hôpital Civil de Strasbourg, 1 Pl. de l'Hôpital
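The regexp can be tried outside the warehouse with Python's `re` module. This is a minimal sketch that assumes `re` behaves like the PostgreSQL pattern for inputs like these. `re.DOTALL` and `re.ASCII` are set to bring it closer to PostgreSQL's behaviour. `strip_postcode_and_city` is a hypothetical name, not something defined in the pipeline.

```python
import re

# ', ' then a five-digit postcode, then everything up to the end of the string.
# re.DOTALL lets '.' match newlines, as it does by default in PostgreSQL regexps;
# re.ASCII restricts \d to 0-9.
POSTCODE_AND_CITY = re.compile(r", \d{5}.*$", re.DOTALL | re.ASCII)


def strip_postcode_and_city(address):
    """Remove the trailing ', <postcode> <city>' part of an address."""
    # count=1 mirrors REGEXP_REPLACE without the 'g' flag (first match only).
    return POSTCODE_AND_CITY.sub("", address, count=1)


for address in (
    "11 Rue Louis Apffel, 67000 Strasbourg",
    "107 Ave Parmentier, 75011 Paris",
    "Hôpital Civil de Strasbourg, 1 Pl. de l'Hôpital, 67000 Strasbourg",
):
    print(f"{address} -> {strip_postcode_and_city(address)}")
```

The hospital row shows why the pattern requires five digits right after the comma: the earlier `, 1 Pl.` is left untouched.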

Introduces the first unit test:

    13:36:11  19 of 19 START unit_test int_soliguide__adresses::test_address_without_postal_code_and_city  [RUN]
    13:36:11  19 of 19 PASS int_soliguide__adresses::test_address_without_postal_code_and_city  [PASS in 0.17s]
vperron committed Aug 12, 2024
1 parent 4149089 commit 580b8ef
Showing 2 changed files with 30 additions and 10 deletions.
@@ -12,7 +12,6 @@ models:
           - unique
           - not_null
           - dbt_utils.not_empty_string
-
   - name: int_soliguide__structures
     data_tests:
       - check_structure:
@@ -54,3 +53,23 @@ models:
           - relationships:
               to: ref('int_soliguide__adresses')
               field: id
+
+unit_tests:
+  - name: test_address_without_postal_code_and_city
+    model: int_soliguide__adresses
+    given:
+      - input: ref('stg_soliguide__lieux')
+        rows:
+          - {position__address: '22 rue Sainte-Marthe, 75010 Paris'}
+          - {position__address: '3 Rpe des Mobiles, 16300 Barbezieux-Saint-Hilaire'}
+          - {position__address: ',,, 49610 Mozé-sur-Louet'}
+          - {position__address: null}
+          - {position__address: '36 Rte de Toulon'}
+    expect:
+      rows:
+        - {adresse: '22 rue Sainte-Marthe'}
+        - {adresse: '3 Rpe des Mobiles'}
+        - {adresse: null}
+        - {adresse: null}
+        - {adresse: '36 Rte de Toulon'}
+
@@ -4,17 +4,18 @@ WITH lieux AS (
 
 final AS (
     SELECT
-        lieu_id                          AS "id",
-        _di_source_id                    AS "source",
-        position__coordinates__x         AS "longitude",
-        position__coordinates__y         AS "latitude",
-        position__additional_information AS "complement_adresse",
-        position__city                   AS "commune",
-        position__address                AS "adresse",
-        position__postal_code            AS "code_postal",
+        lieu_id                                                                          AS "id",
+        _di_source_id                                                                    AS "source",
+        position__coordinates__x                                                         AS "longitude",
+        position__coordinates__y                                                         AS "latitude",
+        position__additional_information                                                 AS "complement_adresse",
+        position__city                                                                   AS "commune",
+        NULLIF(BTRIM(REGEXP_REPLACE(position__address, ', \d\d\d\d\d.*$', ''), ','), '') AS "adresse",
+        position__postal_code                                                            AS "code_postal",
         -- TODO: use position__city_code
         -- currently the field contains a majority of postal codes...
-        NULL                             AS "code_insee"
+        -- update(2024-08-07) : this is still the case.
+        NULL                                                                             AS "code_insee"
     FROM lieux
     ORDER BY 1
 )
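The new `adresse` expression nests three functions, and the unit test's edge cases (a null address, and an address made only of commas plus a postcode and city) depend on all three. Below is a Python mirror of that expression, written as a sketch for illustration. It assumes PostgreSQL semantics for `REGEXP_REPLACE`, `BTRIM` and `NULLIF`; the model itself runs as SQL in the warehouse. `clean_adresse` is a hypothetical name. The rows checked at the end are the given/expect rows of the unit test.

```python
import re
from typing import Optional

# Same pattern as the model's ', \d\d\d\d\d.*$'.
POSTCODE_AND_CITY = re.compile(r", \d{5}.*$", re.DOTALL | re.ASCII)


def clean_adresse(position__address: Optional[str]) -> Optional[str]:
    r"""Mirror NULLIF(BTRIM(REGEXP_REPLACE(position__address, ', \d\d\d\d\d.*$', ''), ','), '')."""
    if position__address is None:
        return None  # each SQL function returns NULL for a NULL input
    replaced = POSTCODE_AND_CITY.sub("", position__address, count=1)  # REGEXP_REPLACE
    trimmed = replaced.strip(",")  # BTRIM(..., ','): strip commas (only) at both ends
    return trimmed or None  # NULLIF(..., ''): empty string becomes NULL


# given/expect rows of the test_address_without_postal_code_and_city unit test
ROWS = [
    ("22 rue Sainte-Marthe, 75010 Paris", "22 rue Sainte-Marthe"),
    ("3 Rpe des Mobiles, 16300 Barbezieux-Saint-Hilaire", "3 Rpe des Mobiles"),
    (",,, 49610 Mozé-sur-Louet", None),
    (None, None),
    ("36 Rte de Toulon", "36 Rte de Toulon"),
]

for given, expected in ROWS:
    assert clean_adresse(given) == expected, (given, clean_adresse(given))
print("all rows match")
```

The `,,, 49610 Mozé-sur-Louet` row is the one that exercises every step. The regexp leaves `,,`, then BTRIM reduces that to an empty string, and NULLIF turns the empty string into NULL.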