Alternative names to specify language #1

sebastian-meckovski · 2024-10-05T13:05:50Z

Hi.

The script returns a reliable dataset and would be useful for my project.

Would it be possible to reformat the dataset so that each alternative name specifies the language it is in? So instead of this:

FR,France,,,Paris,"Baariis,Bahliz,Baris,Lungsod ng Paris,Lutece ..... "

To return something like

FR,France,,,Paris,"af: Baariis, za: Bahliz, tr: Baris, ..... "

Without the language, I don't know how to use these alternative names.

joelacus · 2024-11-30T13:43:24Z

Hi! Sorry for the late response. This is a good point. The current source just lists the alternative names. I'll see if I can find a source for the alternative place names with the language they belong to and update the script.

sebastian-meckovski · 2024-12-02T10:58:33Z

I have written a script that may solve this:
https://github.com/sebastian-meckovski/geo-data-generator/blob/master/countries_data.py

It does many things. For example, it drops the administrative area names of each location unless there are two or more cities with the same name in the same country.

But most importantly, it creates a join between these two datasets:

global_cities_url = 'http://download.geonames.org/export/dump/allCountries.zip'
alternate_names_url = 'http://download.geonames.org/export/dump/alternateNamesV2.zip'

Then, given the languages as a comma-separated string, it fetches the alternate names for all the languages needed. Feel free to use this as an example.
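
A join like the one described above could be sketched roughly as follows. This is not the linked script: it is a minimal standard-library illustration, the function name `join_alternate_names` is hypothetical, the column positions assume the tab-separated, header-less layout described in the GeoNames dump readme, and the sample rows use placeholder IDs rather than real GeoNames IDs.

```python
import csv
from collections import defaultdict

# Assumed column positions (tab-separated, no header row).
CITY_ID, CITY_NAME, CITY_COUNTRY = 0, 1, 8      # allCountries.txt
ALT_GEONAME_ID, ALT_LANG, ALT_NAME = 1, 2, 3    # alternateNamesV2.txt


def join_alternate_names(city_lines, alt_lines, languages):
    """Return {geonameid: {"name", "country", "alt": {lang: [names]}}}."""
    wanted = {code.strip() for code in languages.split(",") if code.strip()}

    # Index the places by geonameid so each alternate name is a dict lookup.
    cities = {}
    for row in csv.reader(city_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        cities[row[CITY_ID]] = {
            "name": row[CITY_NAME],
            "country": row[CITY_COUNTRY],
            "alt": defaultdict(list),
        }

    # Attach every alternate name whose language was requested.
    for row in csv.reader(alt_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        city = cities.get(row[ALT_GEONAME_ID])
        if city is not None and row[ALT_LANG] in wanted:
            city["alt"][row[ALT_LANG]].append(row[ALT_NAME])
    return cities


# Placeholder sample rows in the assumed layout.
city_lines = ["\t".join(["1", "Paris", "Paris", "", "48.85", "2.35", "P", "PPLC", "FR"])]
alt_lines = [
    "\t".join(["10", "1", "nl", "Parijs", "", "", "", ""]),
    "\t".join(["11", "1", "pl", "Paryż", "", "", "", ""]),
    "\t".join(["12", "1", "link", "https://example.org/paris", "", "", "", ""]),
]
result = join_alternate_names(city_lines, alt_lines, "nl, pl")
alt_names = dict(result["1"]["alt"])
print(alt_names)
```

Rows whose language field is not in the requested list (such as the `link` row in the sample) are skipped, so only named languages end up in the output.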

joelacus · 2024-12-02T22:14:51Z

Ah nice, that's cool. I'm glad you figured something out.

I found the alternateNamesV2 dataset as well and have managed to implement it into the script, but now I'm trying to optimise it somehow as 18 million lines is a lot to process...
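
One possible way to reduce the cost, sketched under assumptions: read the file straight from the zip one line at a time and discard rows early, instead of loading everything first. The helper name `iter_alternate_names` is hypothetical, the member name inside the real archive is assumed to be passed in by the caller, and the demo builds a tiny in-memory zip with placeholder rows rather than touching the real download.

```python
import io
import zipfile


def iter_alternate_names(zip_file, member, wanted_ids, languages):
    """Yield (geonameid, language, name) for matching rows, one line at a time."""
    with zipfile.ZipFile(zip_file) as archive:
        with archive.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 4:
                    continue  # skip malformed or empty lines
                geoname_id, lang, name = fields[1], fields[2], fields[3]
                # Cheap set lookups keep memory flat regardless of file size.
                if lang in languages and geoname_id in wanted_ids:
                    yield geoname_id, lang, name


# Demo with a tiny in-memory archive (placeholder IDs and rows).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "sample.txt",
        "10\t1\tnl\tParijs\n11\t1\tpl\tParyż\n12\t2\tnl\tLonden\n",
    )
buffer.seek(0)
rows = list(iter_alternate_names(buffer, "sample.txt", {"1"}, {"nl"}))
print(rows)
```

Because the function is a generator, only the rows that pass both filters are ever held in memory; whether this is fast enough for the full dataset would need to be measured.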

sebastian-meckovski · 2024-12-05T22:37:18Z

Yeah, the alternate names dataset is huge, and just loading it into RAM takes a minute or two. There's another problem: many places have several alternate names per language, so the script also needs to take that into account. I only needed one alternate name per language, so my script selects only one.
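
Selecting a single name per language could look something like the sketch below. The selection rule here is an assumption, not necessarily what the script above does: it prefers a row flagged as preferred (the dataset has an `isPreferredName` column) and otherwise keeps the first name seen. The function name `pick_one_per_language` and the sample rows are hypothetical.

```python
def pick_one_per_language(rows):
    """rows: iterable of (geonameid, language, name, is_preferred) tuples.

    Returns {(geonameid, language): name}, keeping one name per pair.
    """
    chosen = {}
    for geoname_id, lang, name, is_preferred in rows:
        key = (geoname_id, lang)
        current = chosen.get(key)
        # Keep the first name seen, unless a preferred one shows up later.
        if current is None or (is_preferred and not current[1]):
            chosen[key] = (name, is_preferred)
    return {key: name for key, (name, _) in chosen.items()}


# Placeholder rows: two candidate names for each of two languages.
sample = [
    ("1", "xx", "Name A", False),
    ("1", "xx", "Name B", True),
    ("1", "yy", "Name C", False),
    ("1", "yy", "Name D", False),
]
picked = pick_one_per_language(sample)

# Render in the "lang: name" style requested at the top of the thread.
formatted = ", ".join(f"{lang}: {name}" for (_, lang), name in picked.items())
print(formatted)
```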
