This repository contains the code for the paper "Inferring Gender from Names on the Web: A Comparative Evaluation of Gender Detection Methods" by Karimi et al., WWW Conf. 2016: http://dl.acm.org/citation.cfm?doid=2872518.2889385
The code was written by Fariba Karimi, Florian Lemmerich and Stefan Vujovic.
To perform gender inference following the approach described in the paper, the following files can be used:
This is the first step in our "pipeline": the genderize API is used to infer gender from a person's first name.
- Pandas
- Genderize Client
- Python 3.x
- Genderize API key
Before running the script, prepare a file (CSV or JSON) with first names. The script generates an output file with the inferred gender, the confidence of the gender assignment, and the frequency of each name in the database.
To run the script, specify two command line arguments: the path to the input file with names, and the path where the output file should be saved.
name |
---|
Peter |
Fariba |
Jovan |
name | count | gender | confidence |
---|---|---|---|
Peter | 3658 | male | .99 |
Fariba | 465 | female | .96 |
Jovan | 60 | male | .98 |
python genderize_query.py [inputfile.csv] [outputfile.csv]
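The mapping from API records to the output table can be sketched as follows. This is not the script itself, only a minimal illustration. It assumes each record is a dict with the keys `name`, `gender`, `probability` and `count`, which is our understanding of what the Genderize client returns. The function `write_gender_table` and the stand-in `fake_lookup` are hypothetical names, and the sketch uses the standard `csv` module instead of Pandas to stay self-contained.

```python
import csv
import io


def write_gender_table(names, lookup, out):
    """Write one row per name: name, count, gender, confidence."""
    writer = csv.writer(out)
    writer.writerow(["name", "count", "gender", "confidence"])
    for record in lookup(names):
        writer.writerow([
            record["name"],
            record["count"],
            record["gender"],
            record["probability"],  # assumed key; written to the "confidence" column
        ])


def fake_lookup(names):
    # Stand-in for the real API call so the sketch runs offline.
    # The values are copied from the example output table above.
    table = {
        "Peter": ("male", 0.99, 3658),
        "Fariba": ("female", 0.96, 465),
    }
    return [
        {"name": n, "gender": table[n][0],
         "probability": table[n][1], "count": table[n][2]}
        for n in names
    ]


buffer = io.StringIO()
write_gender_table(["Peter", "Fariba"], fake_lookup, buffer)
print(buffer.getvalue())
```

With a real API key, `fake_lookup` would be replaced by a call to the Genderize client; check the client's documentation for the exact call and the returned keys.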
For the next step we need to retrieve images for the specified names from Google Image results using getGoogleImages, and then use the Face++ API to infer gender from these images.
Before running this file, prepare a file containing first and last names. The format of one name is FIRST_NAME+LAST_NAME, for example: fariba+karimi. Running the script generates a file containing 5 URLs of images retrieved for each specified name, for example: fariba+karimi,[url1,url2,url3,url4,url5]
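The two formats described above can be handled with a few lines of Python. This is a minimal sketch: the helper names `to_query_name` and `parse_result_line` are hypothetical, and the parser assumes that a result line looks exactly like the bracketed example and that no URL contains a comma.

```python
def to_query_name(first_name, last_name):
    """Join first and last name with '+', e.g. 'fariba+karimi'."""
    return f"{first_name.strip()}+{last_name.strip()}"


def parse_result_line(line):
    """Split 'name,[url1,url2,...]' into (name, [url, ...])."""
    # Everything before the first comma is the name.
    name, _, rest = line.strip().partition(",")
    # Drop the surrounding brackets, then split the URL list.
    urls = [u.strip() for u in rest.strip("[]").split(",") if u.strip()]
    return name, urls


print(to_query_name("fariba", "karimi"))
print(parse_result_line("fariba+karimi,[url1,url2,url3,url4,url5]"))
```

If real URLs may contain commas, a stricter parser is needed, for example one based on the `csv` module.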
- Java
Peter+Smith |
---|
Fariba+Karimi |
Jovan+Jovanovic |
Peter+Smith | http://.. | http://.. | http://.. | http://.. | http://.. |
---|---|---|---|---|---|
Fariba+Karimi | http://.. | http://.. | http://.. | http://.. | http://.. |
Jovan+Jovanovic | http://.. | http://.. | http://.. | http://.. | http://.. |
The script can be run this way:
java -jar getGoogleImagesv0.31.jar [inputfile.txt] [output.csv]
Using the file generated by getGoogleImages as input, we can retrieve gender for specified images using Face++.
- Python 2
- urllib2
- Face++ API key
To run the script, specify two command line arguments: the path to the input file with names and URLs, and the path where the output file (JSON) should be saved.
Peter+Smith | http://.. | http://.. | http://.. | http://.. | http://.. |
---|---|---|---|---|---|
Fariba+Karimi | http://.. | http://.. | http://.. | http://.. | http://.. |
Jovan+Jovanovic | http://.. | http://.. | http://.. | http://.. | http://.. |
This file contains the results for each image passed to the API: the URL, the image size, and face attributes such as gender, age, pose, race, smiling, mouth_left, mouth_right, nose, and eye_right.
python faceplus_query.py [inputfile.csv] [outputfile.json]
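The overall flow of this step (read each name with its image URLs, query every URL, collect the responses) can be sketched as below. This is an illustration under assumptions, not the script itself. It assumes a comma-separated input with the name in the first column. The function `query_images`, the placeholder `fake_detect`, and the layout of the collected results are all hypothetical, and the real Face++ request is deliberately left out.

```python
import csv
import io
import json


def query_images(rows, detect):
    """Call detect(url) for every image URL and group the results by name."""
    results = {}
    for row in rows:
        if not row:
            continue  # skip empty lines
        name, urls = row[0], [u for u in row[1:] if u]
        results[name] = [{"url": url, "response": detect(url)} for url in urls]
    return results


def fake_detect(url):
    # Placeholder so the sketch runs offline; a real implementation
    # would send the URL to the Face++ API with an API key here.
    return {"gender": "unknown"}


sample = io.StringIO(
    "Fariba+Karimi,http://a.example/1.jpg,http://a.example/2.jpg\n"
)
results = query_images(csv.reader(sample), fake_detect)
print(json.dumps(results, indent=2))
```

Passing the API call in as `detect` keeps the file handling testable without network access or credentials.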
After generating the JSON file with Face++, it needs to be processed. This can be done using faceplus_processing.py, which also follows the approach specified in the referenced paper.
The JSON file described above.
name | gender |
---|---|
Fariba+Karimi | female |
Jovan+Jovanovic | male |
python faceplus_processing.py [inputfile.json] [outputfile.csv]
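One simple way to reduce several per-image labels to a single gender per name is a majority vote. The sketch below assumes that rule purely for illustration; the aggregation actually used by the processing script and the paper may differ. The function `majority_gender` is a hypothetical name, and the per-image labels in `per_name` are toy values chosen for the example, not real API output.

```python
from collections import Counter


def majority_gender(genders):
    """Return the most frequent label, or None if there is no label or a tie."""
    counts = Counter(g for g in genders if g)
    if not counts:
        return None
    ranked = counts.most_common(2)
    # A tie between the two most frequent labels gives no clear answer.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Toy per-image labels (illustrative only).
per_name = {
    "Fariba+Karimi": ["female", "female", "male"],
    "Jovan+Jovanovic": ["male", "male"],
}
summary = {name: majority_gender(labels) for name, labels in per_name.items()}
print(summary)
```

Returning `None` on ties or empty input keeps unresolved names visible instead of silently assigning a gender.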