In this assignment, you'll implement a classifier using logistic regression, optimized with gradient descent.
In class, we went over an implementation of linear regression using gradient descent. For this homework, you will be implementing a logistic regression model using the same framework. Logistic regression is useful for binary classification because the sigmoid function outputs a value between 0 and 1.
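As a quick illustration (separate from the assignment code itself), the sigmoid squashes any real-valued score into the interval (0, 1), which is what lets us interpret the model's output as a class probability:

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score (or array of scores) to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A score of 0 sits exactly at the decision boundary:
print(sigmoid(0.0))                          # 0.5
print(sigmoid(np.array([-4.0, 0.0, 4.0])))   # values near 0 and 1 at the extremes
```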
In this repository, you are given a set of simulated medical record data from patients with small cell and non-small cell lung cancers. Your goal is to apply a logistic regression classifier to this dataset, predicting whether a patient has small cell or non-small cell lung cancer based on features of their medical record prior to diagnosis.
As stated above, logistic regression involves using a sigmoid function to model the data. Just like in linear regression, we will define a loss function to keep track of how well the model performs. But instead of mean-squared error, you will implement the binary cross entropy loss function. This function minimizes the error when the predicted y is close to an expected value of 1 or 0. Here are some resources to get you started: [1], [2], [3].
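For reference, one common vectorized form of binary cross entropy looks like the sketch below (the function name and signature here are illustrative and won't necessarily match the starter code's `loss_function` method):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average binary cross entropy loss.

    y_true: array of 0/1 labels; y_pred: predicted probabilities in (0, 1).
    eps clips predictions away from exactly 0 or 1 to avoid log(0).
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y = np.array([1, 0, 1])
# Confident, correct predictions give a small loss...
print(binary_cross_entropy(y, np.array([0.99, 0.01, 0.95])))
# ...while confident, wrong predictions are penalized heavily.
print(binary_cross_entropy(y, np.array([0.01, 0.99, 0.05])))
```

Note the `eps` clipping: without it, a prediction of exactly 0 or 1 on a mismatched label sends the loss to infinity.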
You will find the full dataset in `data/nsclc.csv`. Class labels are encoded in the `NSCLC` column of the dataset, with 1 = NSCLC and 0 = small cell. A set of features has been pre-selected for you to use in your model during testing (see `main.py`), but you are encouraged to submit unit tests that look at different features. The full list of features can be found in `logreg/utils.py`.
- [TODO] Complete the logistic regression implementation. (5 points)
  - complete the `make_prediction` method
  - complete the `loss_function` method
  - complete the `calculate_gradient` method
  - readable code with clear comments and method descriptions
- [TODO] Write appropriate unit tests for each implemented function and for the overall training procedure. See `test/test_logreg.py` for some suggested tests. (3 points)
- [TODO] Package as a module using `pyproject.toml` and set up GitHub Actions to install your module and run your unit tests. Add a status badge to this README. (2 points)
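For the packaging TODO, a minimal setup might look like the following sketches (the package name, dependency list, Python version, and workflow filename are placeholders; adapt them to your repository). A `pyproject.toml`:

```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "logreg"
version = "0.1.0"
dependencies = ["numpy", "pandas"]
```

and a workflow such as `.github/workflows/test.yml` that installs the module and runs the tests on every push:

```yaml
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install . pytest
      - run: pytest
```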
Fork this repository to your own GitHub account. Work on the codebase locally and commit changes to your forked repository.
You will need the following packages:
Try tuning the hyperparameters if you find that your model doesn't converge. Too high of a learning rate or too large of a batch size can sometimes cause the model to be unstable (e.g. loss function goes to infinity). If you're interested, scikit-learn also has some built-in toy datasets that you can use for testing.
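To see the kind of instability described above, here is a toy example (plain NumPy, independent of the assignment code) of gradient descent on the quadratic loss `f(w) = w**2`, where a learning rate that is too large makes the iterates diverge instead of converging to the minimum at 0:

```python
import numpy as np

def descend(lr, steps=20, w0=1.0):
    """Run gradient descent on f(w) = w**2 (gradient 2*w) and return the final w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(descend(lr=0.1))   # shrinks toward the minimum at 0
print(descend(lr=1.5))   # |w| doubles each step: the run blows up
```

For this loss each update multiplies `w` by `(1 - 2 * lr)`, so any `lr > 1` makes the magnitude of that factor exceed 1 and the loss grows without bound, which is the same "loss goes to infinity" failure mode you may see with a badly tuned learning rate.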
We're applying a pretty simple model to a relatively complex problem here, so you should expect your classifier to perform decently but not amazingly. It's also possible for a given optimization run to get stuck in a local minimum depending on the initialization. With that said, if your implementation is correct and you find reasonable hyperparameters, you should almost always do at least better than chance.