New code: ITU-T E.164 #229

Open
NilsEnevoldsen opened this issue May 13, 2020 · 7 comments

Comments

@NilsEnevoldsen
Contributor

Add ITU-T E.164 codes, better known as "country calling codes". There may be a lot of edge cases here, and I don't know how much demand there is for these codes, but it's a pretty common form of country code in the real world. Feel free to 👍 or 👎 this issue to express your views.

@cjyetman
Collaborator

I swear I looked into this but couldn't find an easily scrapeable original source. Maybe I'm mixing this up with TLDs (top-level domains). Some countries share the same calling code (e.g. USA = +1, Canada = +1, etc.), but I think it's useful as a destination-only code (and could also be useful as an origin code when/where a code maps to only one country).

@NilsEnevoldsen
Contributor Author

We would have to decide how to handle the North American Numbering Plan (https://en.m.wikipedia.org/wiki/North_American_Numbering_Plan). A lot of small Caribbean states have a single unique area code, like Dominica +1 767, but some have more, like the Dominican Republic with +1 809, 829, and 849. So do they all simply get +1? Any way you slice it, it's many to one, one to many, or many to many.
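To make the trade-off concrete, here is a minimal sketch in base R of the two options described above. The data frame `nanp` and its column names are hypothetical, and it contains only the area codes quoted in this comment, not a complete list.

```r
# Hypothetical, hand-typed subset: only the area codes mentioned in the comment above.
nanp <- data.frame(
  country   = c("Dominica", "Dominican Republic", "Dominican Republic", "Dominican Republic"),
  e164      = c("1", "1", "1", "1"),
  area_code = c("767", "809", "829", "849"),
  stringsAsFactors = FALSE
)

# Option 1: keep only the E.164 country code, so many countries map to "1".
by_code <- unique(nanp[, c("country", "e164")])

# Option 2: keep country code + area code as a prefix, so one country
# can map to several prefixes.
nanp$prefix <- paste0(nanp$e164, nanp$area_code)
by_prefix <- split(nanp$prefix, nanp$country)

by_code
by_prefix
```

Option 1 yields one row per country (all with code 1), while option 2 gives the Dominican Republic three prefixes, which is the one-to-many case.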

@vincentarelbundock
Owner

I'm not sure about this one. The +1? seems a bit hackish, but it could be OK.

I don't have the bandwidth right now to deal with the many-to-one issue in a more principled way.

I guess what I'm saying is that I don't have strong views and would be on board with whatever you two think is best.

@cjyetman
Collaborator

I would not suggest using the +. Sorry, I introduced that to the conversation.

I think it's probably a fairly common thing for people to have data like...

international_code  phone_number
1                   212 867 5309
49                  157 7523 6343

and they'd like to do something like...

data %>%
  mutate(country = countrycode(international_code, 'ITU-T_E.164', 'country.name'))

to get...

international_code  phone_number   country
1                   212 867 5309   United States
49                  157 7523 6343  Germany

Of course, that example is never going to work, because 1 matches more than just the United States, but you get the point.
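One way the origin direction could still be useful is to return a country only when the code is unambiguous and `NA` otherwise. The following is a rough base R sketch of that idea, not how countrycode works internally; `e164_dict` and `e164_to_country()` are hypothetical names, and the table is a tiny hand-typed subset.

```r
# Hypothetical lookup table (small hand-typed subset, not the full ITU list).
e164_dict <- data.frame(
  e164    = c("1", "1", "49"),
  country = c("United States", "Canada", "Germany"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a calling code to a country name only when
# exactly one country uses that code; ambiguous or unknown codes give NA.
e164_to_country <- function(code, dict = e164_dict) {
  code <- as.character(code)
  n_matches <- table(dict$e164)                      # number of countries per code
  unique_codes <- names(n_matches)[n_matches == 1]   # codes used by one country
  unambiguous <- dict[dict$e164 %in% unique_codes, ]
  unambiguous$country[match(code, unambiguous$e164)]
}

result <- e164_to_country(c(1, 49))
result
#> [1] NA        "Germany"
```

Returning `NA` for shared codes keeps the result honest at the cost of losing every country that shares a code with another.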

@vincentarelbundock changed the title from "Add ITU-T E.164 codes" to "New code: ITU-T E.164" on May 14, 2020
@cjyetman
Collaborator

My take on the North American Numbering Plan is that it is a sub-international (E.164) numbering plan, so technically the international calling code (E.164) is "1" for all of the countries it encompasses. That doesn't ease the issue of E.164 being a one-to-many numbering system, but it does mean there's a technically correct way of creating a lookup for E.164 (e.g. I believe the E.164 code for Dominican Republic should be "1").
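Under that reading, the destination direction (country to code) is an ordinary many-to-one lookup. A minimal base R sketch, assuming a hypothetical named vector `country_to_e164` with a few hand-typed entries:

```r
# Hypothetical destination-only lookup: every NANP member gets "1".
country_to_e164 <- c(
  "United States"      = "1",
  "Canada"             = "1",
  "Dominican Republic" = "1",
  "Germany"            = "49"
)

# Names that are not in the lookup come back as NA.
codes <- unname(country_to_e164[c("Dominican Republic", "Germany", "Atlantis")])
codes
#> [1] "1"  "49" NA
```

Because each country has exactly one code here, the lookup never has to choose between candidates; the ambiguity only appears when going from code to country.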

@cjyetman
Collaborator

Here's a start...

library(tabulizer)
library(dplyr)
library(countrycode)

extract_tables('https://www.itu.int/dms_pub/itu-t/opb/sp/T-SP-E.164D-11-2011-PDF-E.pdf',
               pages = 3:9, output = "data.frame") %>%
  bind_rows() %>% 
  as_tibble() %>% 
  rename(code = 1, country = 2) %>%
  filter(!is.na(code)) %>% 
  filter(!country %in% c('Reserved', 'Spare code')) %>% 
  filter(!grepl('^Reserved', country)) %>% 
  filter(!grepl('^International', country)) %>% 
  mutate(country.name = countrycode(country, 'country.name', 'country.name', warn = FALSE)) %>% 
  filter(!is.na(country.name)) %>% 
  select(e.164 = code, country.name)
#> # A tibble: 228 x 2
#>    e.164 country.name          
#>    <int> <chr>                 
#>  1     1 American Samoa        
#>  2     1 Anguilla              
#>  3     1 Antigua & Barbuda     
#>  4     1 Bahamas               
#>  5     1 Barbados              
#>  6     1 Bermuda               
#>  7     1 British Virgin Islands
#>  8     1 Canada                
#>  9     1 Cayman Islands        
#> 10     1 Dominica              
#> # … with 218 more rows

but it doesn't catch a few weird things, like "Greenland (Denmark)"...

library(tabulizer)
library(dplyr)
library(countrycode)

extract_tables('https://www.itu.int/dms_pub/itu-t/opb/sp/T-SP-E.164D-11-2011-PDF-E.pdf', 
               pages = 3:9, output = "data.frame") %>% 
  bind_rows() %>% 
  as_tibble() %>% 
  rename(code = 1, country = 2) %>%
  filter(!is.na(code)) %>% 
  filter(!country %in% c('Reserved', 'Spare code')) %>% 
  filter(!grepl('^Reserved', country)) %>% 
  filter(!grepl('^International', country)) %>% 
  mutate(country.name = countrycode(country, 'country.name', 'country.name', warn = FALSE)) %>% 
  filter(is.na(country.name))
#> # A tibble: 10 x 4
#>     code country                                              Note  country.name
#>    <int> <chr>                                                <chr> <chr>       
#>  1   246 Diego Garcia                                         ""    <NA>        
#>  2   252 Somali Democratic Republic                           ""    <NA>        
#>  3   262 French Departments and Territories in the Indian Oc… "j"   <NA>        
#>  4   299 Greenland (Denmark)                                  ""    <NA>        
#>  5   388 Group of countries, shared code                      ""    <NA>        
#>  6   870 Inmarsat SNAC                                        ""    <NA>        
#>  7   878 Universal Personal Telecommunication Service (UPT)   "e"   <NA>        
#>  8   881 Global Mobile Satellite System (GMSS), shared code   "n"   <NA>        
#>  9   888 Telecommunications for Disaster Relief (TDR)         "k"   <NA>        
#> 10   991 Trial of a proposed new international telecommunica… ""    <NA>

... and it requires tabulizer, which requires Java, which can be a pain depending on the system. Haven't tried it in a Linux Docker yet.
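Some of the unmatched names, such as "Greenland (Denmark)", carry a trailing parenthetical qualifier. A possible preprocessing step, sketched in base R below, is to strip that qualifier before matching. Whether the cleaned name is then recognized by the country-name matcher is an assumption that is not checked here, and the vector `unmatched` is a hand-copied subset of the output above.

```r
# Hand-copied subset of the unmatched names from the output above.
unmatched <- c("Greenland (Denmark)", "Diego Garcia", "Somali Democratic Republic")

# Remove a trailing "(...)" qualifier and any whitespace around it.
cleaned <- sub("\\s*\\([^)]*\\)\\s*$", "", unmatched)
cleaned
#> [1] "Greenland"                  "Diego Garcia"
#> [3] "Somali Democratic Republic"
```

Names without a parenthetical pass through unchanged, so entries like "Diego Garcia" would still need to be handled separately.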

@cjyetman
Collaborator

Here's a better version using the 2016 version of codes found here. Unfortunately, they only have PDF or DOCX versions... neither of which is ideal for our purposes.

library(tabulizer)
library(dplyr)
library(countrycode)
library(assertr)
library(readr)

url <- 'https://www.itu.int/dms_pub/itu-t/opb/sp/T-SP-E.164D-2016-PDF-E.pdf'
temp_pdf <- tempfile(fileext = '.pdf')
download.file(url, temp_pdf, mode = 'wb')  # binary mode so the PDF is not corrupted on Windows

extract_tables(temp_pdf, pages = 3:9, guess = FALSE,
               area = list(c(63, 22, 792, 588)), output = 'data.frame') %>%
  bind_rows() %>%
  as_tibble() %>%
  rename(itue164 = 1, country_orig = 2, notes = 3) %>%
  filter(!is.na(itue164)) %>%
  filter(!grepl('^Reserved', country_orig)) %>%
  filter(!grepl('^Spare code$', country_orig)) %>%
  filter(!grepl('^International', country_orig)) %>%
  filter(notes != 'f') %>%  # f = "Reserved for future use" (currently one of Vatican City)
  filter(country_orig != 'Australian External Territories') %>%
  # Ascension uses country code +247, and Saint Helena and Tristan da Cunha use country code +290.
  mutate(country_orig = if_else(itue164 == 247, 'Ascension', country_orig)) %>%
  mutate(country = countrycode(country_orig, 'country.name', 'country.name', warn = FALSE)) %>%
  filter(!is.na(country)) %>%
  select(country, itue164) %>%
  assert(not_na, country, itue164) %>%
  assert(is_uniq, country) %>%
  write_csv('dictionary/data_itue164.csv', na = '')

unlink(temp_pdf)
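The two `assert()` steps in the pipeline check that no value is missing and that each country appears only once, while repeated codes are allowed. For readers without assertr, roughly the same checks can be sketched in base R. The data frame `dict` below is a hypothetical, hand-typed stand-in for the generated dictionary.

```r
# Hypothetical stand-in for the dictionary written by the script above.
dict <- data.frame(
  country = c("Germany", "Canada", "United States"),
  itue164 = c(49L, 1L, 1L),
  stringsAsFactors = FALSE
)

# No missing values in either column.
stopifnot(!anyNA(dict$country), !anyNA(dict$itue164))

# Each country appears at most once (country -> code is many-to-one).
stopifnot(anyDuplicated(dict$country) == 0)

# Repeated codes are expected, e.g. the shared code 1.
n_shared <- sum(duplicated(dict$itue164))
n_shared
#> [1] 1
```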
