Micronesia regexes #354
Comments
This is a common issue for me as well, and I work around it by using a custom match from "Micronesia" to "Micronesia (Federated States of)" in all my applications. If it doesn't create problems in other situations, the suggestion by @mattkerlogue would be an improvement for my own use of the package (and probably for many others).
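A minimal sketch of one way to express a similar override, using the `custom_match` argument of `countrycode()`. The input vector is illustrative, and mapping the bare name straight to the ISO code `FSM` is an assumption about what the caller wants, not the commenter's exact setup.

```r
library(countrycode)

# Names as they might appear in a source table (illustrative values).
countries <- c("Micronesia", "Federated States of Micronesia", "Fiji")

# custom_match takes precedence over the regex lookup for the exact
# strings named here, so a bare "Micronesia" resolves to the country.
codes <- countrycode(
  countries,
  origin = "country.name",
  destination = "iso3c",
  custom_match = c("Micronesia" = "FSM")
)
```

Because the override is an exact string match, variants such as "FS Micronesia" would each need their own entry.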
I would consider this a "bug" in the non-English regexes and try to fix that. I realize that solution would likely not be very satisfying to the OP, but at least the behavior would be consistent between languages. I would also suggest using the |
I don't have a strong opinion. Happy to defer to @cjyetman's opinion. What similar situations do we have in English?
> countrycode::countrycode("Korea", "country.name", "iso3c")
[1] "KOR"
> countrycode::countrycode("Sudan", "country.name", "iso3c")
[1] "SDN"
> countrycode::countrycode("America", "country.name", "iso3c")
[1] NA
Warning message:
Some values were not matched unambiguously: America
> countrycode::countrycode("Congo", "country.name", "iso3c")
[1] "COG"
> countrycode::countrycode("Macedonia", "country.name", "iso3c")
[1] "MKD"
> countrycode::countrycode("Cyprus", "country.name", "iso3c")
[1] "CYP"
None of these are exactly the same situation. Maybe "America" is a weakly similar example. FWIW, the UNGEGN official short name is |
One alternate suggestion: we could put in custom error messages for a couple of the uniquely troublesome cases, i.e. a conversion from |
Sorry for the delayed response. I don't have a super strong view, but I guess I'd lean toward being stricter. I personally like wordy error messages, and would be very happy to include that in a future version. For transparency though, I'm not sure I'll get to it myself soon. But I'd be happy to review and merge a Pull Request if someone wants to implement. |
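As a rough illustration of the "wordy message" idea, here is a sketch in base R. The lookup table `ambiguous_hints` and the helper `warn_if_ambiguous()` are hypothetical names made up for this example; nothing like them is confirmed to exist in countrycode.

```r
# Hypothetical table of ambiguous inputs and the hint to show for each.
ambiguous_hints <- c(
  micronesia = paste(
    "'Micronesia' may refer to the subregion or to the",
    "Federated States of Micronesia (FSM)."
  )
)

# Emit one descriptive warning per ambiguous input found in x and
# invisibly return the normalized inputs that triggered a warning.
warn_if_ambiguous <- function(x) {
  key <- tolower(trimws(x))
  hits <- unique(key[key %in% names(ambiguous_hints)])
  for (h in hits) {
    warning(ambiguous_hints[[h]], call. = FALSE)
  }
  invisible(hits)
}
```

A caller could run `warn_if_ambiguous(x)` before `countrycode(x, ...)` so that a strict `NA` result comes with an explanation of why the name was not matched.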
Related to #289, I've recently been working with a table that has Micronesia (the country) listed solely as "Micronesia" rather than "Federated States of Micronesia", and thus countrycode returns an NA value.

I noticed in the discussion at #289 a reference to making a distinction between the subregion and the country. However, on further inspecting the codelist dataset, this seems to be applied only in the English regex, while the French, German and Italian regexes test only for the name of the subregion.

I've certainly seen datasets where the country is just referred to as Micronesia, but I've also seen it abbreviated as "FS Micronesia" or "F.S. Micronesia", which the current English regex would also miss. Moreover, country.name.de is simply a reference to the subregion "Mikronesien" rather than the full country name (e.g. "Mikronesien (Föderierten Staaten von)").

In my personal experience it's rare that I've come across lists or situations that include continents or continental subregions alongside countries, and when they do I'd ordinarily remove those entries before trying to use countrycode() on the list. So it did surprise me that "Micronesia" didn't return a country code.

Given that "Micronesia" is the only geographic term that can so closely be attributed to either a country or a region, my expectation would be that it returns the country code rather than an NA.
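To make the variants mentioned above concrete, here is a sketch of a pattern that would accept the bare name and the "FS" abbreviations. It uses only base R; `micronesia_pattern` and `is_fsm()` are hypothetical and are not the regex shipped in the codelist dataset.

```r
# Hypothetical pattern: the bare name (optionally prefixed by "FS" or
# "F.S."), or any string mentioning both "federated" and "micronesia".
micronesia_pattern <- paste0(
  "^(f\\.?\\s*s\\.?\\s*)?micronesia$",
  "|federated.*micronesia",
  "|micronesia.*federated"
)

# TRUE where x looks like a reference to the country.
is_fsm <- function(x) {
  grepl(micronesia_pattern, trimws(x), ignore.case = TRUE, perl = TRUE)
}

is_fsm(c("Micronesia", "FS Micronesia", "F.S. Micronesia", "Polynesia"))
```

Treating the bare name as the country is exactly the trade-off under discussion: a list that really does mean the subregion would then be matched to FSM.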