Data: The Adult dataset contains over 32,000 instances, each describing an individual with 14 demographic and employment features. The target variable is whether the individual earns more than $50K per year. Naive Bayes assumes that the features are conditionally independent given the class; under that assumption, the dataset is a good fit for Naive Bayes classification. Link: https://archive.ics.uci.edu/ml/datasets/Adult
Task 1: Data Preprocessing
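The brief does not spell out the preprocessing steps, so the following is only one possible sketch. It assumes the comma-separated layout of the Adult data file, where missing values are marked with "?", and it drops incomplete rows and buckets the numeric age into decades so that it can be treated as a categorical feature. The sample rows, the shortened column set, and the helper names load_rows and bin_age are made up for illustration.

```python
import csv
import io

# Made-up rows with fewer columns than the real file; the label is the last field.
SAMPLE = """39, State-gov, Bachelors, Never-married, <=50K
52, ?, HS-grad, Married-civ-spouse, >50K
45, Private, Masters, Married-civ-spouse, >50K
"""


def load_rows(text):
    """Parse comma-separated text into (feature rows, labels), skipping incomplete rows."""
    rows, labels = [], []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for record in reader:
        if not record:
            continue  # skip blank lines
        record = [field.strip() for field in record]
        if "?" in record:
            continue  # assumption: rows with missing values are simply dropped
        rows.append(tuple(record[:-1]))
        labels.append(record[-1])
    return rows, labels


def bin_age(age, width=10):
    """Map a numeric age to a categorical bucket such as '30-39'."""
    low = (int(age) // width) * width
    return f"{low}-{low + width - 1}"


rows, labels = load_rows(SAMPLE)
```

Dropping rows is the simplest choice; replacing "?" with the most frequent value of the column is a common alternative that keeps more training data.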
Task 2: Naive Bayes Classifier Implementation
- Implementing a function to calculate the prior probability of each class (<=50K and >50K) in the training set.
- Implementing a function to calculate the conditional probability of each feature value given each class in the training set.
- Implementing a function to predict the class of a given instance using the Naive Bayes algorithm.
- Implementing a function to calculate the accuracy of the Naive Bayes classifier on the testing set.
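The four functions listed above could be sketched roughly as follows. This is a minimal illustration that assumes every feature is categorical (numeric features would need binning or a Gaussian likelihood); the function names and the four-row toy training set are hypothetical and not part of the assignment.

```python
import math
from collections import Counter, defaultdict


def class_priors(labels):
    """Estimate P(class) as the relative frequency of each label."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


def conditional_probs(rows, labels):
    """Estimate P(feature j = value | class) as cond[class][j][value]."""
    counts = defaultdict(lambda: defaultdict(Counter))
    class_counts = Counter(labels)
    for row, c in zip(rows, labels):
        for j, value in enumerate(row):
            counts[c][j][value] += 1
    return {
        c: {j: {v: n / class_counts[c] for v, n in vals.items()}
            for j, vals in feats.items()}
        for c, feats in counts.items()
    }


def predict(row, priors, cond):
    """Return the class with the highest log-posterior score."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for j, value in enumerate(row):
            p = cond[c][j].get(value, 0.0)
            # Without smoothing, a value never seen with this class rules the class out.
            score += math.log(p) if p > 0 else -math.inf
        if best_class is None or score > best_score:
            best_class, best_score = c, score
    return best_class


def accuracy(rows, labels, priors, cond):
    """Fraction of instances whose predicted class matches the true label."""
    correct = sum(predict(r, priors, cond) == y for r, y in zip(rows, labels))
    return correct / len(labels)


# Toy data: (workclass, education) pairs, made up for illustration.
train_rows = [("Private", "Bachelors"), ("Private", "HS-grad"),
              ("Self-emp", "Bachelors"), ("Private", "HS-grad")]
train_labels = [">50K", "<=50K", ">50K", "<=50K"]

priors = class_priors(train_labels)
cond = conditional_probs(train_rows, train_labels)
```

Log-probabilities are summed instead of multiplying raw probabilities so that products over 14 features do not underflow. A feature value that never occurs with a class gets probability zero here, which is the problem Laplace smoothing in Task 3 addresses.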
Task 3: Evaluation and Improvement
- Evaluating the performance of the Naive Bayes classifier using accuracy, precision, recall, and F1-score.
- Experimenting with Laplace smoothing to improve the performance of the classifier.
- Comparing the performance of the Naive Bayes classifier with other classification algorithms, such as logistic regression and k-nearest neighbors.
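As a rough sketch of the evaluation metrics and of Laplace smoothing, the following treats ">50K" as the positive class, which is an assumption the brief does not state. The function names and the five example predictions are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=">50K"):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def smoothed_prob(value_count, class_count, n_values, alpha=1.0):
    """Laplace-smoothed estimate of P(value | class).

    value_count: training rows of the class that have this feature value
    class_count: training rows of the class
    n_values:    number of distinct values the feature can take
    alpha:       smoothing strength (alpha=1 is classic Laplace smoothing)
    """
    return (value_count + alpha) / (class_count + alpha * n_values)


# Made-up labels and predictions for illustration.
y_true = [">50K", ">50K", "<=50K", "<=50K", ">50K"]
y_pred = [">50K", "<=50K", "<=50K", ">50K", ">50K"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

# A value never seen with a class still gets a small non-zero probability.
unseen = smoothed_prob(0, class_count=2, n_values=3)
```

Because most individuals in the data fall in the <=50K class, accuracy alone can look good for a classifier that rarely predicts >50K; precision and recall on the >50K class make that weakness visible.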