Update README.md

advaithsrao · Dec 28, 2023 · acc79cb
1 parent c6479f8
Showing 1 changed file with 4 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -111,17 +111,17 @@ These are the training label splits before annotation
 
 ***
 
-## DATA ANNOTATION
+## Data Annotation
 
-### 1. AUTOMATED ML LABELING
+### 1. Automated ML Labeling
 
 The following heuristics are used to label the Enron email data using the other two data sources:
 1. Phishing Model Annotation: We annotate emails from the Enron dataset using a high-precision model trained on the Phishing emails dataset.
 2. Social Engineering Model Annotation: We annotate emails from the Enron dataset using a high-precision model trained on the Social Engineering emails dataset.
 
 Both ML annotator models embed the input text with Term Frequency-Inverse Document Frequency (TF-IDF) and classify it with a Support Vector Machine (SVM) using a Gaussian (RBF) kernel.
 
-### 2. EMAIL SIGNALS
+### 2. Email Signals
 
 Email-signal-based heuristics are used to filter for suspicious emails and target them for fraud labeling. The signals used are:
 Person Of Interest: There is a publicly available list of email addresses of employees who were implicated in the Enron fraud. Their mailboxes are more likely to contain high-quality fraud emails.
@@ -143,7 +143,7 @@ The below table represents the distribution of the length of email bodies in ter
 | 75% | 4 |
 | max | 5486 |
 
-### 3. MANUAL INSPECTION
+### 3. Manual Inspection
 
 To ensure high-quality labels, we manually inspect the mismatched examples from ML annotation and use them to relabel the Enron dataset.
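The ML annotation step in the README pairs TF-IDF features with a Gaussian-kernel SVM. The sketch below illustrates only those two ingredients in plain Python and is not the project's code. It assumes whitespace tokenization, raw term counts for TF, and an unsmoothed `idf = log(N / df)`; libraries such as scikit-learn's `TfidfVectorizer` use different defaults, and the SVM itself would normally be trained with a library classifier such as `SVC(kernel="rbf")`.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumptions (illustrative only): lowercase whitespace tokenization,
    raw term counts for TF, and unsmoothed idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        # L2-normalize so that long emails do not dominate kernel distances.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def gaussian_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2) on sparse vectors."""
    terms = set(u) | set(v)
    sq_dist = sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms)
    return math.exp(-gamma * sq_dist)


docs = [
    "verify your account password now",
    "meeting notes for the gas trading desk",
    "urgent verify your password",
]
vecs = tfidf_vectors(docs)
print(gaussian_kernel(vecs[0], vecs[0]))  # identical vectors -> 1.0
```

Emails that share weighted terms (the first and third above) get a higher kernel value than emails with no terms in common. The `gamma=1.0` default is a hypothetical choice; in a real SVM it would be tuned alongside the regularization strength.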
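The Person Of Interest signal in the README amounts to keeping emails whose sender or recipients appear on the POI address list. A minimal sketch follows; the email schema (a dict with a `"from"` address and a `"to"` list) and the function name `filter_poi_emails` are hypothetical, since the README does not show the project's data format.

```python
def filter_poi_emails(emails, poi_addresses):
    """Keep emails sent or received by a Person Of Interest (POI).

    Assumption: each email is a dict with a "from" address and an
    optional "to" list of addresses (hypothetical schema).
    """
    # Normalize once so that address comparisons are case-insensitive.
    pois = {addr.strip().lower() for addr in poi_addresses}
    selected = []
    for email in emails:
        participants = {email["from"].strip().lower()}
        participants.update(addr.strip().lower() for addr in email.get("to", []))
        # Keep the email if any participant is on the POI list.
        if not participants.isdisjoint(pois):
            selected.append(email)
    return selected


emails = [
    {"from": "A@example.com", "to": ["x@example.com"]},
    {"from": "y@example.com", "to": ["b@example.com"]},
    {"from": "z@example.com", "to": []},
]
poi_list = ["a@example.com", "b@example.com"]  # hypothetical addresses
print(len(filter_poi_emails(emails, poi_list)))
```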
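The Manual Inspection step in the README does not define what a "mismatch" is. The sketch below assumes it means an email on which two label sources disagree (for example, the two ML annotators); it selects those emails for review and then merges the reviewed labels back. The function names and the index-to-label mapping are hypothetical.

```python
def mismatched_indices(labels_a, labels_b):
    """Return indices of emails where two aligned label lists disagree.

    Assumption: both lists label the same emails in the same order.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must be aligned to the same emails")
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


def apply_manual_labels(labels, manual_labels):
    """Return a copy of labels with manually reviewed ones applied.

    manual_labels maps an email index to its reviewed label.
    """
    updated = list(labels)
    for idx, label in manual_labels.items():
        updated[idx] = label
    return updated


annotator_a = [1, 0, 1, 0]
annotator_b = [1, 1, 0, 0]
to_review = mismatched_indices(annotator_a, annotator_b)
print(to_review)  # [1, 2]
```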