A small utility that infers gender from first names using genderize. It uses the Python client for the genderize web service (https://github.com/SteelPangolin/genderize).
Create virtual environment:
$ python3 -m venv .venv
Activate virtual environment:
$ source .venv/bin/activate
Install required dependencies:
$ pip3 install -r requirements.txt
Run the analysis:
$ python -m g names.txt
Important: the input file should have the following format:
unique_id name
The unique_id is simply a number that identifies each row that goes into the analysis. For example:
unique_id indiv_referees
1 Eva-Maria Mandelkow
2 Jurgen Gotz
3 Karen Avraham
4 Stephen High
5 Ramanujan Hegde
6 Alfred Goldberg
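To make the expected input format concrete, here is a minimal Python sketch of how such a file could be read. It is not the utility's own parser: the function name `read_first_names` is hypothetical, and it assumes that columns are separated by whitespace, that any row whose first column is not a number (such as the header) can be skipped, and that the first name is the first word after the id.

```python
import io


def read_first_names(handle):
    """Return (unique_id, first_name) pairs from 'unique_id name' lines.

    Assumption: rows whose first column is not a number (e.g. the header
    line) are skipped, and the first name is the first word after the id.
    """
    rows = []
    for line in handle:
        parts = line.split(None, 1)  # split the id from the rest of the line
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # header, blank line, or row without a name
        unique_id, full_name = parts
        first_name = full_name.split()[0]
        rows.append((int(unique_id), first_name))
    return rows


sample = io.StringIO(
    "unique_id indiv_referees\n"
    "1 Eva-Maria Mandelkow\n"
    "2 Jurgen Gotz\n"
)
print(read_first_names(sample))  # [(1, 'Eva-Maria'), (2, 'Jurgen')]
```

With a real file, the same function could be called as `read_first_names(open("names.txt"))`.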
Warning: There is a limit of 1000 requests per day!
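One way to stay under the daily limit is to look up each distinct first name only once and to check the total before sending anything. The sketch below only prepares the names and does not call the web service. The helper names `unique_in_order` and `chunked` are hypothetical, and the batch size of 10 names per request is an assumption about the service, not something this utility documents.

```python
DAILY_LIMIT = 1000  # limit stated above
BATCH_SIZE = 10     # assumption: names that fit into one request


def unique_in_order(names):
    """Drop repeated names (case-insensitive), keeping first occurrences."""
    seen = set()
    result = []
    for name in names:
        key = name.lower()
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result


def chunked(items, size=BATCH_SIZE):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


names = ["Karen", "Stephen", "karen", "Alfred", "Stephen"]
unique = unique_in_order(names)
batches = chunked(unique)

if len(unique) > DAILY_LIMIT:
    print("Too many distinct names for one day; split the input file.")
else:
    print(f"{len(unique)} distinct names in {len(batches)} batch(es)")
```

Whether the limit counts individual names or requests is worth checking against the service's current terms before relying on batching.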
Output sample:
$ python -m g names.txt
name:Eva-Maria gender:female probability:1.0 count:6
name:Jurgen gender:male probability:0.97 count:36
name:Karen gender:female probability:1.0 count:5462
name:Stephen gender:male probability:1.0 count:2608
name:Ramanujan gender:None
name:Alfred gender:male probability:1.0 count:230
name:Helene gender:female probability:1.0 count:255
name:Silvio gender:male probability:1.0 count:183
name:Junmin gender:male probability:1.0 count:2
name:Mark gender:male probability:1.0 count:6176
name:Henning gender:male probability:1.0 count:77
name:Ted gender:male probability:1.0 count:376
name:Olga gender:female probability:1.0 count:898
name:Mala gender:female probability:0.97 count:33
name:Thilo gender:male probability:1.0 count:11
results written to names_genders.txt
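For further processing, result lines of the form shown above can be turned into records with a few lines of Python. This is a sketch, not part of the utility: the function name `parse_result_line` is hypothetical, and it assumes each line consists of space-separated `key:value` fields exactly as printed in the sample, with `probability` and `count` absent when no gender was found.

```python
def parse_result_line(line):
    """Parse one 'name:... gender:... probability:... count:...' line."""
    fields = {}
    for field in line.split():
        key, _, value = field.partition(":")
        fields[key] = value

    gender = fields.get("gender")
    return {
        "name": fields.get("name"),
        # The literal text 'None' marks a name without a known gender.
        "gender": None if gender in (None, "None") else gender,
        "probability": float(fields["probability"]) if "probability" in fields else None,
        "count": int(fields["count"]) if "count" in fields else None,
    }


print(parse_result_line("name:Jurgen gender:male probability:0.97 count:36"))
print(parse_result_line("name:Ramanujan gender:None"))
```

Returning `None` for missing fields keeps unresolved names such as Ramanujan in the data set, so they can be counted or reviewed by hand rather than silently dropped.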
The R script gender_parse.R requires the libraries dplyr, readr, mefa, argparse and XML.
$ Rscript gender_parse.R -i names.txt -nlines 'number of lines that should be skipped when reading in the file'
Arguments:
-i input file
-nlines number of lines to skip when reading in the eJP report file
To run the web app, change to the 'webapp/' directory and start it:
$ python gender_webapp.py
Then open localhost:5001 in a browser.
Upload an eJP track report file. Results are automatically downloaded as a txt file.