Detecting Hateful Speech in Social Media Comments

In this project, we apply machine learning to unstructured data to detect hate speech in comments from the Civil Comments dataset, with labeling informed by the Online Hate Index Research Project at D-Lab, University of California, Berkeley.

Goal

Our goal is to classify comments as hateful or not hateful. Historically, classifiers built for similar tasks have tended to flag a comment as hateful merely because it mentions an identity group that could be targeted by hate speech. We hope to develop more nuanced models that correctly categorize both hateful speech and non-hateful identity references.
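One way to make this goal measurable is to track the false positive rate on non-hateful comments that mention an identity group, alongside the overall false positive rate. The sketch below is illustrative only: the `IDENTITY_TERMS` list and all function names are hypothetical and are not taken from this repository, and labels are assumed to be 1 for hateful and 0 for not hateful.

```python
import re

# Hypothetical identity terms, for illustration only (not from the project).
IDENTITY_TERMS = {"muslim", "christian", "jewish", "gay", "women"}


def mentions_identity(comment, terms=IDENTITY_TERMS):
    """Return True if the comment contains any of the identity terms."""
    tokens = set(re.findall(r"[a-z]+", comment.lower()))
    return bool(tokens & set(terms))


def false_positive_rate(y_true, y_pred):
    """Share of truly non-hateful comments (0) that were predicted hateful (1)."""
    preds_for_negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not preds_for_negatives:
        return float("nan")
    return sum(preds_for_negatives) / len(preds_for_negatives)


def identity_subgroup_fpr(comments, y_true, y_pred):
    """False positive rate restricted to comments that mention an identity term."""
    rows = [(t, p) for c, t, p in zip(comments, y_true, y_pred)
            if mentions_identity(c)]
    return false_positive_rate([t for t, _ in rows], [p for _, p in rows])


comments = [
    "I am a gay man and proud of it",
    "Nice weather today",
    "Our Muslim neighbors hosted dinner",
    "Thanks for the article",
]
y_true = [0, 0, 0, 0]  # none of these comments is hateful
y_pred = [1, 0, 0, 0]  # a model that wrongly flags the first comment

overall_fpr = false_positive_rate(y_true, y_pred)
subgroup_fpr = identity_subgroup_fpr(comments, y_true, y_pred)
```

A model that over-triggers on identity mentions would show a subgroup rate well above its overall rate, which is the failure mode described above.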

Team Members

Technologies

Python:

Amazon Web Services:

Google Cloud Services:

Files & Notebooks

Final Models

Feature Generation

  • feature_generation_functions.py: Contains modules and functions used to generate text and numerical features for the models. (273 lines)
  • feature_generation.ipynb: Python 3 notebook that runs the functions from feature_generation_functions.py and pickle_functions.py. It generates the features, pickles the resulting data frames, and uploads them to an S3 bucket. (160 lines)
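As a rough illustration of what numerical features for a comment can look like, here is a minimal sketch using only the standard library. The function name and the specific features are assumptions made for this example; they are not necessarily the ones implemented in feature_generation_functions.py.

```python
import re


def numeric_text_features(comment):
    """Compute a few simple numerical features for one comment (hypothetical set)."""
    words = re.findall(r"\w+", comment)
    letters = [ch for ch in comment if ch.isalpha()]
    return {
        "char_count": len(comment),
        "word_count": len(words),
        # Guard against empty comments to avoid dividing by zero.
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "uppercase_ratio": (
            sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
        ),
        "exclamation_count": comment.count("!"),
    }


features = numeric_text_features("STOP that now!")
```

Dictionaries like this can be collected into a data frame, one row per comment, before being pickled.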

Helper Functions

  • model_functions.py: Contains modules and functions used to build and test Naive Bayes and SVM models and to compute metrics on them. (226 lines)
  • pickle_functions.py: Contains modules and functions used to read and write pickle files hosted in an AWS S3 bucket. (60 lines)
  • exploration/exploration_functions.py: Contains modules and functions used to explore the dataset. (103 lines)
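The pickle round trip through S3 could look roughly like the sketch below. It is not the code in pickle_functions.py: the function names are hypothetical, and the `s3_client` argument is assumed to be a boto3 S3 client (for example, one created with `boto3.client("s3")`), whose `put_object` and `get_object` methods are the only calls used.

```python
import pickle


def write_pickle_to_s3(obj, s3_client, bucket, key):
    """Serialize obj with pickle and store the bytes at s3://bucket/key."""
    body = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)


def read_pickle_from_s3(s3_client, bucket, key):
    """Fetch the object at s3://bucket/key and unpickle it."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # get_object returns a dict whose "Body" is a stream with a read() method.
    return pickle.loads(response["Body"].read())
```

Because unpickling can execute arbitrary code, functions like these should only read from buckets whose contents are trusted.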

Intermediate Models

If there are any issues opening a notebook, please enter the link into the renderer at the following site: https://nbviewer.jupyter.org/