
All of my data science capstone projects, including online course certificates offered by Coursera, edX, Udemy, and Udacity.


aghoshpro/myProjects


Data Science Projects

Local Environment Setup

Using Docker 🐳

  • Navigate to the env_docker directory in cmd or a terminal and start the container:

    cd env_docker
    docker compose -f docker-compose.yml up
  • Put the desired notebook in the notebook directory and its data in the data folder.

Using Conda 🐍

  • Open your favourite terminal or cmd and create an environment with the dependencies listed in envALL.yml:

    conda env create -f envALL.yml

11. Sparkify - Detection of User Churn using PySpark

Forecasting churn (attrition) is a complex and common challenge that data scientists and analysts face in customer-oriented businesses. The ability to handle large datasets with Spark is among the most sought-after skills in the data field. The project also involves communicating the findings to company stakeholders in a way they can understand.
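Before churn can be predicted, each user needs a churn label. A minimal plain-Python sketch of one common labelling rule, assuming each event-log record has a `userId` and a `page` field and that a visit to a `"Cancellation Confirmation"` page marks a churned user (both are assumptions about the log schema, not taken from this project):

```python
from typing import Iterable, Mapping

# Assumed churn marker: the real event name depends on the log schema.
CHURN_PAGE = "Cancellation Confirmation"


def label_churn(events: Iterable[Mapping[str, str]]) -> dict[str, int]:
    """Map each userId to 1 if the user churned, else 0."""
    labels: dict[str, int] = {}
    for event in events:
        user = event["userId"]
        # Every user seen in the log starts out as "not churned".
        labels.setdefault(user, 0)
        if event["page"] == CHURN_PAGE:
            labels[user] = 1
    return labels


# Hypothetical toy log, for illustration only.
events = [
    {"userId": "u1", "page": "NextSong"},
    {"userId": "u2", "page": "NextSong"},
    {"userId": "u2", "page": "Cancellation Confirmation"},
    {"userId": "u3", "page": "Thumbs Up"},
]
labels = label_churn(events)
churn_rate = sum(labels.values()) / len(labels)
print(labels)  # {'u1': 0, 'u2': 1, 'u3': 0}
print(churn_rate)
```

In PySpark the same rule would usually be written as a `groupBy("userId")` aggregation over a 0/1 flag column built with `pyspark.sql.functions.when`, so that it scales to the full event log.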

10. Personalized Real Estate Agent

Envision yourself as a skilled developer at "Future Homes Realty," an innovative real estate firm. In an industry where personalisation is crucial for customer satisfaction, your company aims to transform how clients interact with real estate listings. Your task is to build HomeMatch, a novel application that uses large language models (LLMs) and vector databases to turn conventional real estate listings into customised narratives that match the distinct preferences and requirements of prospective buyers.
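The vector-database half of HomeMatch boils down to ranking listings by how close their embeddings are to an embedding of the buyer's preferences. A minimal sketch using cosine similarity over an in-memory dictionary; the 3-dimensional vectors are hypothetical stand-ins for real LLM embeddings, and no specific vector database API is assumed:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query: list[float], listings: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k listings most similar to the buyer's preference vector."""
    ranked = sorted(listings, key=lambda lid: cosine(query, listings[lid]), reverse=True)
    return ranked[:k]


# Hypothetical 3-d "embeddings" ([urban, garden, size]); a real system would get
# these from an embedding model and store them in a vector database.
listings = {
    "city_loft": [0.9, 0.1, 0.3],
    "family_house": [0.2, 0.9, 0.8],
    "suburban_villa": [0.1, 0.8, 0.9],
}
buyer = [0.1, 0.9, 0.7]  # wants a garden and space, not the city centre
print(top_matches(buyer, listings))  # ['family_house', 'suburban_villa']
```

The retrieved listings would then be handed to an LLM to rewrite as a personalised description; that generation step is not shown here.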

9. ChatBOT using Retrieval Augmented Generation (RAG)

This Custom Chatbot project builds a fashion-focused chat interface on top of a fashion dataset. The dataset documents the complex changes in modern fashion, including popular colour schemes, fabric choices, and other key fashion insights observed in 2023. It is well suited to building a chatbot that meets the specific needs of fashion enthusiasts and industry professionals.
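In retrieval augmented generation, the chatbot first retrieves the dataset entries most relevant to a question and then places them in the prompt sent to the LLM. A minimal sketch of those two steps, using plain word-overlap scoring in place of embeddings (a deliberate simplification); the fashion snippets are hypothetical and the LLM call itself is omitted:

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-cased word set; a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the question and keep the top k."""
    q = tokenize(question)
    return sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with the retrieved context before calling an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical fashion snippets, for illustration only.
docs = [
    "Pastel colour schemes were popular in spring collections.",
    "Linen fabric choices grew for summer wear.",
    "Oversized blazers stayed a key trend.",
]
question = "Which fabric choices were popular for summer?"
context = retrieve(question, docs, k=1)
print(build_prompt(question, context))
```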

8. Landmark Classification (using CNN) & Tagging for Social Media

7. Classification of Handwritten Digit using MNIST Data

The task is to classify a given image of a handwritten digit into one of 10 classes representing the integer values 0 to 9, inclusive.

MNIST is a standard dataset in computer vision and deep learning; the acronym stands for Modified National Institute of Standards and Technology. It contains 60,000 training images and 10,000 test images, each a small, square 28×28-pixel grayscale image of a single handwritten digit between 0 and 9.
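A typical preprocessing step for this 10-class problem flattens each 28×28 image into 784 features, scales the 0-255 pixel values to [0, 1], and one-hot encodes the digit label. A minimal plain-Python sketch, assuming those common choices (the project's exact preprocessing is not stated); the single-pixel image is synthetic:

```python
IMG_SIZE = 28     # MNIST images are 28x28 pixels
NUM_CLASSES = 10  # digits 0-9


def preprocess(image: list[list[int]]) -> list[float]:
    """Flatten a 28x28 grayscale image and scale 0-255 pixel values to [0, 1]."""
    if len(image) != IMG_SIZE or any(len(row) != IMG_SIZE for row in image):
        raise ValueError("expected a 28x28 image")
    return [px / 255.0 for row in image for px in row]


def one_hot(label: int) -> list[int]:
    """Encode a digit label as a length-10 indicator vector."""
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label must be in 0..{NUM_CLASSES - 1}, got {label}")
    return [1 if i == label else 0 for i in range(NUM_CLASSES)]


# Synthetic blank image with one white pixel, for illustration only.
image = [[0] * IMG_SIZE for _ in range(IMG_SIZE)]
image[14][14] = 255
features = preprocess(image)
print(len(features))  # 784
print(one_hot(7))     # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
```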

6. Disaster Response System

I applied my data engineering skills to analyze disaster data and build a model for an API that classifies disaster messages. I created an ML pipeline that categorizes real messages sent during disaster events so that they can be routed to the appropriate disaster relief agency. The project includes a web app where an emergency worker can input a new message and get classification results across several categories. The web app also displays visualizations of the data.
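Because one message can belong to several categories at once, this is a multi-label problem. A minimal plain-Python sketch of one way to handle it, assuming a binary-relevance design (one independent yes/no classifier per category) with multinomial Naive Bayes as the per-category model; the category names and training messages are hypothetical. A production pipeline would more likely use scikit-learn (for example `Pipeline` with `MultiOutputClassifier`):

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case a message and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class BinaryNB:
    """Multinomial Naive Bayes for one yes/no category (Laplace smoothing)."""

    def fit(self, docs: list[list[str]], labels: list[int]) -> "BinaryNB":
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = Counter(labels)
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, tokens: list[str]) -> int:
        total = sum(self.n_docs.values())
        scores = {}
        for y in (0, 1):
            # Smoothed log prior, so a class with no examples does not give log(0).
            score = math.log((self.n_docs[y] + 1) / (total + 2))
            denom = sum(self.counts[y].values()) + len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((self.counts[y][t] + 1) / denom)
            scores[y] = score
        return int(scores[1] > scores[0])


def fit_multilabel(messages, targets, categories):
    """Binary relevance: train one independent classifier per category."""
    docs = [tokenize(m) for m in messages]
    return {c: BinaryNB().fit(docs, [t[c] for t in targets]) for c in categories}


def classify(models, message):
    tokens = tokenize(message)
    return {c: model.predict(tokens) for c, model in models.items()}


# Hypothetical training messages and categories, for illustration only.
messages = [
    "we need water and food",
    "please send drinking water",
    "the bridge collapsed people injured",
    "need medical help injured people",
]
targets = [
    {"water": 1, "medical": 0},
    {"water": 1, "medical": 0},
    {"water": 0, "medical": 1},
    {"water": 0, "medical": 1},
]
models = fit_multilabel(messages, targets, ["water", "medical"])
print(classify(models, "send water please"))
```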

5. Real World Object Detection using COCO dataset

Detects objects with a 65% confidence threshold using a pre-trained MobileNet-SSD v3 model and 183 labels (classes) from the COCO 2017 dataset. Users can also detect objects in their surroundings with a webcam by running objectDetectionWebCam.py.
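Treating the 65% figure as a minimum confidence, post-processing keeps only detections at or above that score and maps each class id to its label. A minimal sketch, assuming detections arrive as (class id, confidence, box) triples and that class ids are 1-based, as in many COCO label files; the `Detection` type is hypothetical and no particular OpenCV call is implied:

```python
from typing import NamedTuple

CONF_THRESHOLD = 0.65  # 65% minimum confidence


class Detection(NamedTuple):
    class_id: int
    confidence: float
    box: tuple[int, int, int, int]  # (x, y, width, height) in pixels


def filter_detections(dets: list[Detection], labels: list[str],
                      threshold: float = CONF_THRESHOLD) -> list[tuple[str, float]]:
    """Keep detections at or above the threshold and attach their class names."""
    kept = []
    for det in dets:
        if det.confidence >= threshold:
            # Assumes 1-based class ids; drop the "- 1" for a 0-based label file.
            kept.append((labels[det.class_id - 1], det.confidence))
    return kept


labels = ["person", "bicycle", "car"]  # first entries of a COCO label list
dets = [
    Detection(1, 0.91, (10, 20, 50, 120)),
    Detection(3, 0.40, (200, 80, 60, 40)),
    Detection(2, 0.65, (5, 5, 30, 30)),
]
print(filter_detections(dets, labels))  # [('person', 0.91), ('bicycle', 0.65)]
```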

4. Sip & Script

I ran an exploratory data analysis on the Wine Reviews dataset from Kaggle, which contains roughly 130k Wine Enthusiast reviews. I used the project as an opportunity to analyse the data and explain my results in a Medium blog post that answers the questions posed.
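A typical first question in this kind of EDA is how review scores vary by country. A minimal standard-library sketch of that group-and-average step, assuming the data has `country` and `points` columns; the rows below are made-up examples, not figures from the dataset:

```python
import csv
import io
from collections import defaultdict
from statistics import fmean

# Hypothetical rows; the "country" and "points" column names are assumed.
SAMPLE = """country,points
Italy,87
France,90
Italy,91
US,88
,85
"""


def mean_points_by_country(rows: list[dict[str, str]]) -> dict[str, float]:
    """Average review points per country, skipping rows with no country."""
    groups: dict[str, list[int]] = defaultdict(list)
    for row in rows:
        if row["country"]:
            groups[row["country"]].append(int(row["points"]))
    return {country: fmean(pts) for country, pts in sorted(groups.items())}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(mean_points_by_country(rows))  # {'France': 90.0, 'Italy': 89.0, 'US': 88.0}
```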

3. Multi-Class Dog Breed Classification

The aim is to build a classifier that predicts a dog's breed from a photo. In a real-world scenario, someone who photographs a dog and wants to know its breed could use this model to find out. The dataset contains 20,000+ images of dogs from 120 breeds (120 classes).
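Predicting a breed means picking the most probable of the 120 classes from the classifier's output. A minimal sketch of that final step, assuming the model emits one raw score (logit) per breed and the app reports the top 3 candidates with softmax probabilities; the breed names and scores are hypothetical and cover only 4 of the 120 classes for readability:

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Turn raw model scores into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_breeds(logits: list[float], breeds: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most likely breeds with their probabilities."""
    probs = softmax(logits)
    ranked = sorted(zip(breeds, probs), key=lambda bp: bp[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for 4 of the 120 breeds, for illustration only.
breeds = ["beagle", "pug", "boxer", "whippet"]
logits = [2.0, 0.5, 1.0, -1.0]
for breed, p in top_k_breeds(logits, breeds):
    print(f"{breed}: {p:.2f}")
```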

2. Multi-Class Image Classifier

Data: CIFAR-10

1. Heart Disease Prediction using Regression

Question: Can the presence of heart disease in a patient be predicted from their clinical parameters?

The dataset contains 76 attributes, but all published experiments use a subset of 14 of them. It comes from the Cleveland database in the UCI Machine Learning Repository, which ML researchers still use today.
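Since the target is presence or absence of heart disease, the natural "regression" model here is logistic regression, which maps a weighted sum of clinical parameters to a probability. A from-scratch sketch trained with batch gradient descent; the two features, their standardized values, the learning rate, and the epoch count are all hypothetical, and the project may well have used a library implementation such as scikit-learn's `LogisticRegression` instead:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: list[float], x: list[float]) -> float:
    """P(disease) for one patient; w[0] is the bias, w[1:] the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(linear score)
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical standardized features for six patients, e.g. [age, cholesterol].
X = [[-1.0, -0.5], [-0.8, -1.0], [-0.5, -0.2], [0.6, 0.4], [0.9, 1.1], [1.2, 0.3]]
y = [0, 0, 0, 1, 1, 1]  # 1 = heart disease present
w = train_logistic(X, y)
print(round(predict_proba(w, [1.0, 1.0]), 3))
```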


Additional Data
