TÜRKİYE a valid spelling? #347
Comments
Do we not support that? We definitely should. It's just the capital form of "i" in Turkish. |
Does our regex engine have an equivalent of |
Oh, that's neat. I had to google it because I had never heard of this flag. Also funny that the C# docs use exactly our problematic case as their main example. Maybe I missed something, but I don't think |
We currently rely on Lines 280 to 281 in 848e9ce
Should we consider casting all strings-to-match with
turkey <- "TÜRKİYE"
countrycode::countrycode(turkey, "country.name", "country.name")
#> Warning: Some values were not matched unambiguously: TÜRKİYE
#> [1] NA
tolower(turkey)
#> [1] "türkiye"
countrycode::countrycode(tolower(turkey), "country.name", "country.name")
#> [1] "Turkey"
turkey.en.regex <- countrycode::codelist$country.name.en.regex[match("TUR", countrycode::codelist$iso3c)]
grepl(x = turkey, pattern = turkey.en.regex, perl = TRUE, ignore.case = TRUE)
#> [1] FALSE
grepl(x = tolower(turkey), pattern = turkey.en.regex, perl = TRUE, ignore.case = TRUE)
#> [1] TRUE
grepl(x = tolower(turkey), pattern = turkey.en.regex, perl = TRUE, ignore.case = FALSE)
#> [1] TRUE |
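To narrow down which character is responsible for the failed case-insensitive match in the reprex above, one could compare each character of the input with the lowercase form the regex expects. This is only a rough sketch: `chars`, `targets`, and `matched` are illustrative names that do not appear in countrycode, and it assumes a UTF-8 session.

```r
# Sketch: check each character of "TÜRKİYE" against its expected lowercase form.
# Assumes a UTF-8 session; all variable names here are hypothetical.
turkey <- "T\u00dcRK\u0130YE"                          # "TÜRKİYE", written with escapes
chars <- strsplit(turkey, "")[[1]]                     # one element per character
targets <- c("t", "\u00fc", "r", "k", "i", "y", "e")   # what the regex expects to see

# Pair each input character with its target and test it case-insensitively
matched <- mapply(
  function(ch, target) grepl(x = ch, pattern = target, perl = TRUE, ignore.case = TRUE),
  chars, targets,
  USE.NAMES = FALSE
)

# Show the code point next to each result; the dotted capital İ is U+0130 (304)
data.frame(
  char = chars,
  codepoint = vapply(chars, utf8ToInt, integer(1), USE.NAMES = FALSE),
  matched = matched
)
```

Given the results in the reprex, the row for İ (U+0130) is the one to look at: it is a separate code point from the ASCII "I", which is presumably why `ignore.case = TRUE` alone does not pair it with "i".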
FWIW, I don't have a view and not a real good sense of whether this can cause problems (probably not?). |
It's not the most precise solution, but might capture similar issues with other extended characters, while seemingly not changing the logic at all. A more precise solution could be to add
turkey.en.regex <- countrycode::codelist$country.name.en.regex[match("TUR", countrycode::codelist$iso3c)]
grepl(x = c("Turkey", "Turkiye", "Türkiye", "TÜRKİYE"), pattern = turkey.en.regex, perl = TRUE, ignore.case = TRUE)
#> [1] TRUE TRUE TRUE FALSE
turkey.en.regex
#> [1] "turkey|t(ü|u)rkiye"
turkey.en.regex_mod <- "turkey|t(ü|u)rk(i|İ)ye"
grepl(x = c("Turkey", "Turkiye", "Türkiye", "TÜRKİYE"), pattern = turkey.en.regex_mod, perl = TRUE, ignore.case = TRUE)
#> [1] TRUE TRUE TRUE TRUE
side note: I don't know why we use
turkey.en.regex_mod <- "turkey|t[ü|u]rk[i|İ]ye"
grepl(x = c("Turkey", "Turkiye", "Türkiye", "TÜRKİYE"), pattern = turkey.en.regex_mod, perl = TRUE, ignore.case = TRUE)
#> [1] TRUE TRUE TRUE TRUE |
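One caveat on the bracket form in the side note above: inside a character class, `|` is a literal character rather than alternation, so `[ü|u]` also accepts a pipe. The sketch below compares the two forms from this comment with a third, `strict_form`, which is only an assumption about the intended pattern and is not taken from the codelist. It assumes a UTF-8 session.

```r
# Inside [...] the pipe is a literal character, not alternation.
group_form   <- "turkey|t(ü|u)rk(i|İ)ye"   # grouped alternation, as proposed above
bracket_form <- "turkey|t[ü|u]rk[i|İ]ye"   # as written in the side note
strict_form  <- "turkey|t[üu]rk[iİ]ye"     # assumed intent: class without the pipe

# The last element is a deliberately bogus spelling containing a literal pipe
x <- c("Türkiye", "TÜRKİYE", "t|rkiye")

grepl(x = x, pattern = group_form,   perl = TRUE, ignore.case = TRUE)
grepl(x = x, pattern = bracket_form, perl = TRUE, ignore.case = TRUE)
grepl(x = x, pattern = strict_form,  perl = TRUE, ignore.case = TRUE)
```

Only `bracket_form` should accept the bogus `"t|rkiye"`. For real country names the difference is unlikely to matter, but dropping the pipe from the class keeps the pattern from matching more than intended.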
Let's just go with the more precise option, then, as there are fewer unknowns. What's the benefit of |
👍🏻
I have no idea. Honestly, I'm somewhat surprised that it works as expected using |
weird. I can see it for lookaheads and such... oh well! |
The list-one.xls that we use from https://www.six-group.com/en/products-services/financial-information/data-standards.html to get the ISO4217 codes in get_iso4217.R (or rather, will use once the URL is updated in PR #348) uses TÜRKİYE (note the İ). Is this a valid spelling we should support?