First version of countries


The first development version of the countries package is now available on GitHub. This post looks at how the package can be used to work with country names.


Author: Francesco S. Bellelli

Published: Feb. 26, 2022

Citation: Bellelli, 2022


countries is an R package designed to quickly wrangle, merge and explore country data. This package will contain functions to easily identify and convert country names, pull country info and datasets, merge country data from different sources, and make quick world maps.

I recently released the first development version of the package, which is now available on my GitHub page. The package also has a website documenting its usage.

In this article, we will look at how countries can be used to work with country names. In particular, we will look in detail at the function country_name(), which can be used to convert country names to different naming conventions or to translate them into different languages. country_name() can identify countries even when they are provided in mixed formats or in different languages. It is robust to small misspellings and recognises many alternative country names and disused official names.

Installing and loading the package

Since the package is not yet on CRAN, the development version needs to be installed directly from the GitHub repository. This can be done with the devtools package.

# Install and load devtools
install.packages("devtools")
library(devtools)

# Install countries
devtools::install_github("fbellelli/countries", build_vignettes = TRUE)

The package can then be loaded as usual:

library(countries)

Dealing with country names

The function country_name() can be used to convert country names to different naming conventions or to translate them to different languages.

example <- c("United States","DR Congo", "Morocco")

# Getting 3-letters ISO code
country_name(x= example, to="ISO3")
[1] "USA" "COD" "MAR"
# Translating to Spanish
country_name(x= example, to="name_es")
[1] "Estados Unidos"                 
[2] "República Democrática del Congo"
[3] "Marruecos"                      

If multiple values are passed to the to argument, the function will output a data.frame object with one column for each requested naming convention.

# Requesting translation to French and 2-letter and 3-letter ISO codes
country_name(x= example, to=c("name_fr","ISO2","ISO3"))
                           name_fr ISO2 ISO3
1                       États-Unis   US  USA
2 République démocratique du Congo   CD  COD
3                            Maroc   MA  MAR

The to argument supports all the following naming conventions:

CODE DESCRIPTION
simple A simple English version of the name containing only ASCII characters. This nomenclature is available for all countries.
ISO3 3-letter country codes as defined in ISO standard 3166-1 alpha-3. This nomenclature is available only for the territories in the standard (currently 249 territories).
ISO2 2-letter country codes as defined in ISO standard 3166-1 alpha-2. This nomenclature is available only for the territories in the standard (currently 249 territories).
ISO_code Numeric country codes as defined in ISO standard 3166-1 numeric. This country code is the same as the UN’s country number (M49 standard). This nomenclature is available only for the territories in the ISO standard (currently 249 territories).
UN_xx Official UN name in the 6 official UN languages: Arabic (UN_ar), Chinese (UN_zh), English (UN_en), French (UN_fr), Spanish (UN_es), Russian (UN_ru). This nomenclature is available only for countries in the M49 standard (currently 249 territories).
WTO_xx Official WTO name in the 3 official WTO languages: English (WTO_en), French (WTO_fr), Spanish (WTO_es). This nomenclature is available only for WTO members and observers (currently 189 entities).
name_xx Translation of ISO country names in 28 different languages: Arabic (name_ar), Bulgarian (name_bg), Czech (name_cs), Danish (name_da), German (name_de), Greek (name_el), English (name_en), Spanish (name_es), Estonian (name_et), Basque (name_eu), Finnish (name_fi), French (name_fr), Hungarian (name_hu), Italian (name_it), Japanese (name_ja), Korean (name_ko), Lithuanian (name_lt), Dutch (name_nl), Norwegian (name_no), Polish (name_po), Portuguese (name_pt), Romanian (name_ro), Russian (name_ru), Slovak (name_sk), Swedish (name_sv), Thai (name_th), Ukrainian (name_uk), Chinese simplified (name_zh), Chinese traditional (name_zh-tw).
GTAP GTAP country and region codes.
all Converts to all the nomenclatures and languages in this table

Further options and warning messages

country_name() can identify countries even when they are provided in mixed formats or in different languages. It is robust to small misspellings and recognises many alternative country names and old nomenclatures.

fuzzy_example <- c("US","C@ète d^Ivoire","Zaire","FYROM","Estados Unidos","ITA")

country_name(x= fuzzy_example, to=c("UN_en"))
Multiple country IDs have been matched to the same country name
Set - verbose - to TRUE for more details
[1] "United States of America"        
[2] "Côte d’Ivoire"                   
[3] "Democratic Republic of the Congo"
[4] "North Macedonia"                 
[5] "United States of America"        
[6] "Italy"                           

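More information on the country matching process can be obtained by setting verbose=TRUE. The function will then print the number of unique country identifiers found, how many were matched exactly and how many by fuzzy matching, a summary of the fuzzy matching distances, and any inputs that were matched to the same country: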

country_name(x= fuzzy_example, to=c("UN_en"), verbose=TRUE)

In total 6 unique country identifiers have been found
5/6 have been matched with EXACT matching
1/6 have been matched with FUZZY matching

Fuzzy matching DISTANCE summary:
 | Average:  3
 | Min: 3 
 | Q1: 3 
 | Median: 3 
 | Q3: 3 
 | Max: 3

Multiple arguments have been matched to the same country name:
  - Estados Unidos : United States of America 
  - US : United States of America
[1] "United States of America"        
[2] "Côte d’Ivoire"                   
[3] "Democratic Republic of the Congo"
[4] "North Macedonia"                 
[5] "United States of America"        
[6] "Italy"                           
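The verbose summary reports a fuzzy matching distance, but this post does not say which metric the package uses. To give an intuition for what such a distance measures, here is a minimal sketch based on the generalized Levenshtein distance from base R's utils::adist(). The helper closest_name() is hypothetical and is not part of countries.

```r
# Return the candidate with the smallest edit distance to `x`.
# `closest_name` is a hypothetical helper, not a function of the countries package.
closest_name <- function(x, candidates, ignore_case = TRUE) {
  # adist() returns a matrix with one row per element of `x`
  d <- utils::adist(x, candidates, ignore.case = ignore_case)
  best <- which.min(d[1, ])
  list(match = candidates[best], distance = d[1, best])
}

reference <- c("Morocco", "Monaco", "Mexico")
result <- closest_name("Moroco", reference)
result$match     # "Morocco"
result$distance  # 1: one missing letter
```

A small distance means few character edits separate the input from the reference name, which is why short typos can be resolved while random strings produce low-confidence matches.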

Setting verbose=TRUE also prints additional information on the specific warnings that the function normally gives:

country_name(x= c("Taiwan","lsajdèd"), to=c("UN_en"), verbose=FALSE)
Some country IDs have no match in one or more country naming conventions
There is low confidence on the matching of some country names
Set - verbose - to TRUE for more details
[1] NA                                
[2] "Lao People's Democratic Republic"
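Since names without an equivalent in the requested convention come back as NA, it can be useful to list them before moving on. This is a small sketch using only base R on top of country_name(); the input vector is made up for illustration.

```r
library(countries)

raw_names <- c("Taiwan", "Morocco")
converted <- country_name(x = raw_names, to = "UN_en")

# Inputs that have no name in the requested convention
unmatched <- raw_names[is.na(converted)]
unmatched
```

Note that this only catches missing values. A low-confidence match such as the second element in the example above still returns a name, so it is worth checking the verbose output as well.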

All the information from verbose mode can be accessed by setting simplify=FALSE. This will return a list object containing this information instead of a plain vector of names.
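The element names of the returned list are not reproduced in this post, so the safest way to explore it is with generic inspection tools. A minimal sketch, assuming only that the result is a list:

```r
library(countries)

res <- country_name(x = c("Taiwan", "lsajdèd"), to = "UN_en", simplify = FALSE)

# List the top-level elements without assuming their names
names(res)
str(res, max.level = 1)
```

Once the structure is known, individual elements can be extracted with the usual $ or [[ ]] operators.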

Using custom conversion tables

In some cases, the user might be unhappy with the naming conversion or no valid conversion might exist for the provided territory. In these cases, it might be useful to tweak the conversion table. The package contains a utility function called match_table(), which can be used to generate conversion tables for small adjustments.

example_custom <- c("Siam","Burma","H#@°)Koe2")

# Suppose we are unhappy with how "H#@°)Koe2" is interpreted by the function
country_name(x = example_custom, to = "name_en")
There is low confidence on the matching of some country names
Set - verbose - to TRUE for more details
[1] "Thailand"           "Myanmar"            "Korea, Republic of"
# match_table() can be used to generate a table for small adjustments
tab <- match_table(x = example_custom, to = "name_en")
There is low confidence on the matching of some country names
tab$name_en[2] <- "Hong Kong"

# The adjusted table can then be used for conversion
country_name(x = example_custom, to = "name_en", custom_table = tab)
[1] "Thailand"  "Myanmar"   "Hong Kong"
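Editing the table by row number depends on the order in which match_table() lists the inputs. A slightly more defensive variant looks the row up by its input string. This sketch assumes that the first column of the table holds the original input strings, which is worth confirming with head(tab) first.

```r
library(countries)

example_custom <- c("Siam", "Burma", "H#@°)Koe2")
tab <- match_table(x = example_custom, to = "name_en")

# Assumption: the first column of `tab` holds the original input strings.
target <- "H#@°)Koe2"
row <- which(tab[[1]] == target)
stopifnot(length(row) == 1)  # stop early if the assumption does not hold

# Overwrite only the entry for the problematic input
tab$name_en[row] <- "Hong Kong"
country_name(x = example_custom, to = "name_en", custom_table = tab)
```

Looking the row up by value keeps the adjustment correct even if the inputs are reordered or extended later.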

Work in progress

I am still working on the package. In the near future, the following items will be added to the package:


    Citation

    For attribution, please cite this work as

    Bellelli (2022, Feb. 27). F.S.Bellelli: First version of countries. Retrieved from https://fbellelli.com/posts/2022-02-27-first-version-of-countries/

    BibTeX citation

    @misc{bellelli2022first,
      author = {Bellelli, Francesco S.},
      title = {F.S.Bellelli: First version of countries},
      url = {https://fbellelli.com/posts/2022-02-27-first-version-of-countries/},
      year = {2022}
    }