Micronesia regexes #354

mattkerlogue · 2024-02-15T16:16:38Z

Related to #289: I've recently been working with a table that lists Micronesia (the country) solely as "Micronesia" rather than "Federated States of Micronesia", so countrycode returns NA for it.

I noticed in the discussion at #289 a reference to making a distinction between the subregion and the country. However, on inspecting the codelist dataset, this distinction seems to be applied only in the English regex; the French, German and Italian regexes test only for the name of the subregion.

I've certainly seen datasets where the country is just referred to as Micronesia, but I've also seen it abbreviated as "FS Micronesia" or "F.S. Micronesia", which the current English regex would also miss. Moreover, country.name.de is simply the subregion name "Mikronesien" rather than the full country name (e.g. "Mikronesien (Föderierten Staaten von)").

countrycode::codelist |>
  dplyr::filter(iso3c == "FSM") |>
  dplyr::select(
    country.name.en, country.name.fr, country.name.de, country.name.it,
    country.name.en.regex, country.name.fr.regex,
    country.name.de.regex, country.name.it.regex) |>
  dplyr::glimpse()

#>  Rows: 1
#>  Columns: 8
#>  $ country.name.en       <chr> "Micronesia (Federated States of)"
#>  $ country.name.fr       <chr> "Micronésie (États fédérés de)"
#>  $ country.name.de       <chr> "Mikronesien"
#>  $ country.name.it       <chr> NA
#>  $ country.name.en.regex <chr> "fed.*micronesia|micronesia.*fed"
#>  $ country.name.fr.regex <chr> "micron(é|e)sie"
#>  $ country.name.de.regex <chr> "mikronesien"
#>  $ country.name.it.regex <chr> "micronesia"
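To make the mismatch concrete, the patterns printed above can be tried against a few spellings in base R. This is only a sketch: it assumes the patterns are applied as case-insensitive, Perl-style matches, and the `matches()` helper and the list of spellings are made up for illustration rather than taken from countrycode.

```r
# Regexes copied from the codelist output above
en_regex <- "fed.*micronesia|micronesia.*fed"
de_regex <- "mikronesien"
it_regex <- "micronesia"

# Hypothetical helper: case-insensitive, Perl-style match (an assumption
# about how the patterns are applied, not countrycode's own code)
matches <- function(pattern, x) {
  grepl(pattern, x, ignore.case = TRUE, perl = TRUE)
}

spellings <- c("Micronesia", "FS Micronesia", "F.S. Micronesia",
               "Federated States of Micronesia")

en_hits <- matches(en_regex, spellings)      # needs "fed" somewhere in the name
it_hits <- matches(it_regex, spellings)      # bare name is enough
de_hits <- matches(de_regex, "Mikronesien")  # bare subregion name is enough

print(data.frame(spelling = spellings, en = en_hits, it = it_hits))
```

Under those assumptions, the English pattern accepts only the spelling that contains "Federated", while the Italian and German patterns accept the bare name.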

In my personal experience it's rare that I've come across lists/situations which include continents/continental subregions alongside countries, and if they do I'd ordinarily remove those from a list before trying to use countrycode() on it. So it did surprise me that "Micronesia" didn't return a country code.

Given that "Micronesia" is the only geographic term that can so closely be attributed to either a country or a region, my expectation would be that it returns the country code rather than NA.


stefgehrig · 2024-03-28T14:44:45Z

This is a common issue for me as well, and I work around it by using a custom match from "Micronesia" to "Micronesia (Federated States of)" in all my applications. If it doesn't create problems in other situations, the suggestion by @mattkerlogue would be an improvement for my own use of the package (and probably for many others).

cjyetman · 2024-03-28T14:54:14Z

I would consider this a "bug" in the non-English regexes and try to fix that. I realize that solution would likely not be very satisfying to the OP, but at least the behavior would be consistent between languages.

I would also suggest using the custom_match arg to work around this.
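For anyone landing here, a minimal sketch of that workaround might look like the following. It assumes the countrycode package is installed; the input vector is made up, and "FSM" is the iso3c value used to filter the codelist earlier in this thread.

```r
library(countrycode)

# Made-up input where the country appears only as "Micronesia"
countries <- c("Fiji", "Micronesia", "Palau")

# custom_match takes a named vector: the names are source values to
# look for, the values are the codes to return for them
iso <- countrycode(
  countries,
  origin = "country.name",
  destination = "iso3c",
  custom_match = c("Micronesia" = "FSM")
)
print(iso)
```

The names in custom_match are compared against the input values as given, so other spellings (e.g. "FS Micronesia") would each need their own entry.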

NilsEnevoldsen · 2024-03-28T15:05:35Z

I don't have a strong opinion. Happy to defer to @cjyetman's opinion.

What similar situations do we have in English?

> countrycode::countrycode("Korea", "country.name", "iso3c")
[1] "KOR"
> countrycode::countrycode("Sudan", "country.name", "iso3c")
[1] "SDN"
> countrycode::countrycode("America", "country.name", "iso3c")
[1] NA
Warning message:
Some values were not matched unambiguously: America 
> countrycode::countrycode("Congo", "country.name", "iso3c")
[1] "COG"
> countrycode::countrycode("Macedonia", "country.name", "iso3c")
[1] "MKD"
> countrycode::countrycode("Cyprus", "country.name", "iso3c")
[1] "CYP"

None of these are exactly the same situation. Maybe "America" is a weakly similar example.

FWIW, the UNGEGN official short name is Federated States of Micronesia (the), same as the formal name.

NilsEnevoldsen · 2024-03-28T15:10:03Z

One alternate suggestion: we could put in custom error messages for a couple of the uniquely troublesome cases, i.e. a conversion from Micronesia as a country.name to anything else would return NA but also a suggestion to use the custom_match argument. I know some people don't like wordy error messages, but I think they can improve accessibility.
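Roughly what I have in mind, as a base-R sketch rather than a patch. `ambiguous_hints` and `hint_ambiguous()` are hypothetical names, not anything that exists in countrycode, and the wording of the message is only a placeholder.

```r
# Hypothetical lookup of troublesome names and the hint to show for each
ambiguous_hints <- c(
  micronesia = paste(
    "'Micronesia' may refer to the subregion or to the",
    "Federated States of Micronesia; use the custom_match argument",
    "if you mean the country."
  )
)

# Hypothetical helper: warn once per troublesome name found in the input
# and invisibly return which elements triggered a hint
hint_ambiguous <- function(sourcevar) {
  key <- tolower(trimws(sourcevar))
  flagged <- key %in% names(ambiguous_hints)
  for (k in unique(key[flagged])) {
    warning(ambiguous_hints[[k]], call. = FALSE)
  }
  invisible(flagged)
}

# Usage: collect the warnings instead of printing them
msgs <- character(0)
flagged <- withCallingHandlers(
  hint_ambiguous(c("Fiji", "Micronesia")),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(msgs)
```

Keeping the hints in a small named vector would make it cheap to add another troublesome name later without touching the matching logic.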

vincentarelbundock · 2024-04-02T20:39:43Z

Sorry for the delayed response.

I don't have a super strong view, but I guess I'd lean toward being stricter.

I personally like wordy error messages, and would be very happy to include that in a future version.

For transparency though, I'm not sure I'll get to it myself soon. But I'd be happy to review and merge a Pull Request if someone wants to implement.
