fix(pipeline): Make the Soliguide addresses a bit more correct
Soliguide addresses always contain the postcode and city name. These then
appear twice in the geocoding query, which unfortunately makes the BAN less
reliable :'(

Example:

    curl -G https://api-adresse.data.gouv.fr/search/ --data-urlencode 'q=311 Av lou Gabian, 83600 Fréjus 83600 Fréjus'
    result_score=0.54

    curl -G https://api-adresse.data.gouv.fr/search/ --data-urlencode 'q=311 Av lou Gabian, 83600 Fréjus'
    result_score=0.73

Looking more closely at the sources, Soliguide addresses always contain
the postcode and the city. A regexp to fix that!

    position__address                                                 | after_regexp
    ------------------------------------------------------------------+------------------------------------------------
    11 Rue Louis Apffel, 67000 Strasbourg                             | 11 Rue Louis Apffel
    9 Rue Déserte, 67000 Strasbourg                                   | 9 Rue Déserte
    107 Ave Parmentier, 75011 Paris                                   | 107 Ave Parmentier
    2 Rue Bartisch, 67100 Strasbourg                                  | 2 Rue Bartisch
    1 Rue du Rempart, 67000 Strasbourg                                | 1 Rue du Rempart
    Hôpital Civil de Strasbourg, 1 Pl. de l'Hôpital, 67000 Strasbourg | Hôpital Civil de Strasbourg, 1 Pl. de l'Hôpital
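To check the pattern outside the database, here is a minimal Python sketch. It ports the SQL expression to Python's `re` module and runs it on the addresses from the table. It assumes the two regex flavours behave the same for this simple pattern. `strip_postcode_and_city` and `SAMPLES` are hypothetical names, not part of the pipeline.

```python
import re

# Hypothetical Python port of the expression from the SQL model
# (assumption: Python's `re` and the database regex agree on this pattern):
#   REGEXP_REPLACE(position__address, ', \d\d\d\d\d.*$', '')
# It removes everything from the first ", " followed by five digits
# (the postcode) to the end of the string, i.e. ", <postcode> <city>".
POSTCODE_SUFFIX = re.compile(r", \d\d\d\d\d.*$")


def strip_postcode_and_city(address: str) -> str:
    """Hypothetical helper: drop the trailing ', <postcode> <city>' part."""
    return POSTCODE_SUFFIX.sub("", address, count=1)


# Addresses taken from the table in the commit message.
SAMPLES = [
    "11 Rue Louis Apffel, 67000 Strasbourg",
    "9 Rue Déserte, 67000 Strasbourg",
    "107 Ave Parmentier, 75011 Paris",
    "2 Rue Bartisch, 67100 Strasbourg",
    "1 Rue du Rempart, 67000 Strasbourg",
    "Hôpital Civil de Strasbourg, 1 Pl. de l'Hôpital, 67000 Strasbourg",
]

if __name__ == "__main__":
    for address in SAMPLES:
        print(f"{address}  ->  {strip_postcode_and_city(address)}")
```

The last sample shows why the pattern requires five digits after the comma. The earlier ", 1 Pl." does not match, so only the ", 67000 Strasbourg" suffix is removed.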
vperron committed Aug 7, 2024
1 parent 635f016 commit 1b4b98a
Showing 1 changed file with 10 additions and 9 deletions.
    @@ -4,17 +4,18 @@ WITH lieux AS (

     final AS (
         SELECT
    -        lieu_id AS "id",
    -        _di_source_id AS "source",
    -        position__coordinates__x AS "longitude",
    -        position__coordinates__y AS "latitude",
    -        position__additional_information AS "complement_adresse",
    -        position__city AS "commune",
    -        position__address AS "adresse",
    -        position__postal_code AS "code_postal",
    +        lieu_id AS "id",
    +        _di_source_id AS "source",
    +        position__coordinates__x AS "longitude",
    +        position__coordinates__y AS "latitude",
    +        position__additional_information AS "complement_adresse",
    +        position__city AS "commune",
    +        REGEXP_REPLACE(position__address, ', \d\d\d\d\d.*$', '') AS "adresse",
    +        position__postal_code AS "code_postal",
             -- TODO: use position__city_code
             -- currently the field contains a majority of postal codes...
    -        NULL AS "code_insee"
    +        -- update(2024-08-07) : this is still the case.
    +        NULL AS "code_insee"
         FROM lieux
         ORDER BY 1
     )
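A second sketch shows what the new `adresse` expression leaves untouched. Like the first, it is a hypothetical Python port rather than pipeline code, and `clean_adresse` is an illustrative name. Two inputs pass through unchanged. The first is an address with no ", NNNNN" suffix. The second is a made-up address where the postcode follows the comma without a space, which the pattern does not match. A NULL input stays NULL: assuming the model runs on PostgreSQL, `REGEXP_REPLACE` returns NULL for NULL, and the sketch mirrors this with `None`.

```python
import re
from typing import Optional

# Same hypothetical Python port of ', \d\d\d\d\d.*$', standing in for the SQL.
POSTCODE_SUFFIX = re.compile(r", \d\d\d\d\d.*$")


def clean_adresse(address: Optional[str]) -> Optional[str]:
    """Hypothetical mirror of the new "adresse" column expression."""
    # REGEXP_REPLACE returns NULL for a NULL input; None plays that role here.
    if address is None:
        return None
    return POSTCODE_SUFFIX.sub("", address, count=1)


if __name__ == "__main__":
    # Already clean: nothing matches, so the address is returned as-is.
    print(clean_adresse("311 Av lou Gabian"))
    # Made-up input: no space after the comma, so the pattern does not match
    # and the postcode and city are kept.
    print(clean_adresse("11 Rue Louis Apffel,67000 Strasbourg"))
    # NULL stays NULL.
    print(clean_adresse(None))
```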