Examples of Python scripts used in a car insurance study project, with the findings presented on Tableau.

Senja-P/Python-Customer-Insurance

Python-Customer-Insurance

Project Overview

  • The purpose of this study project was to analyse the variables impacting monthly car insurance payments and use the findings to plan a customer retention program.

  • I carried out an exploratory analysis with Python to identify the relationships between numeric variables. The strongest correlation was between monthly payments and total claim amounts, which led me to form a hypothesis and test it with a linear regression analysis. As the result indicated that claims explained only approximately 40% of the variation in monthly payments, I conducted a cluster analysis to investigate whether the data could uncover new patterns that would help explain the customers’ behaviours. The cluster analysis generated three distinct groups: customers were grouped into high, medium and low clusters based on their claim amounts and average monthly payments.

  • Car insurance pricing is known to be based on risk. No recommendations were made because of data limitations: some key risk factors and claim data were missing, and the data did not represent the entire year well.

  • The presentation of findings and a summary of the data analysis process are available on Tableau.
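
The correlation and regression steps described in the overview can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual script: the data are synthetic, the column names `total_claim_amount` and `monthly_payment` are hypothetical, and the numbers it produces say nothing about the Kaggle dataset. It shows how a correlation matrix points to a candidate predictor and how the R² of a one-variable linear regression measures the share of variation explained.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (NOT the Kaggle dataset); column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
claims = rng.uniform(0, 1000, n)
df = pd.DataFrame({
    "total_claim_amount": claims,
    "monthly_payment": 60 + 0.08 * claims + rng.normal(0, 25, n),
})

# Step 1: pairwise Pearson correlations between the numeric variables.
corr = df.corr()
r = corr.loc["total_claim_amount", "monthly_payment"]

# Step 2: regress monthly payment on total claim amount.
X = df[["total_claim_amount"]]   # 2-D feature matrix, as scikit-learn expects
y = df["monthly_payment"]
model = LinearRegression().fit(X, y)

# R^2 is the share of variation in y explained by the model.
r_squared = model.score(X, y)

print(f"correlation r = {r:.3f}")
print(f"slope = {model.coef_[0]:.4f}, intercept = {model.intercept_:.2f}")
print(f"R^2 = {r_squared:.3f}")
```

With a single predictor, R² equals the square of the Pearson correlation, so a moderate correlation leaves a large part of the variation unexplained. That is the kind of result that motivated looking for further structure with clustering.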

Analytics

  • Conducted an exploratory visual analysis to identify relationships between variables
  • Explored geographic variables using choropleth maps to draw early insights
  • Learned supervised and unsupervised machine learning
  • Conducted regression and cluster analysis and interpreted the results
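
As a rough illustration of the cluster analysis step, the sketch below groups customers into three k-means clusters and names them low, medium and high by their mean claim amount. It rests on several assumptions that do not come from the project: the data are synthetic, the column names are hypothetical, and standardising the features before clustering is my choice rather than a documented step.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Kaggle dataset); column names are hypothetical.
rng = np.random.default_rng(1)
centres = [(200, 60), (500, 100), (900, 160)]  # (claim mean, payment mean)
frames = [
    pd.DataFrame({
        "total_claim_amount": rng.normal(claim_mean, 40, 100),
        "monthly_payment": rng.normal(pay_mean, 8, 100),
    })
    for claim_mean, pay_mean in centres
]
df = pd.concat(frames, ignore_index=True)

# Put both features on the same scale so neither dominates the distance metric
# (an assumption of this sketch, not a documented step of the project).
scaled = StandardScaler().fit_transform(df)

# Fit k-means with three clusters, matching the three groups in the write-up.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(scaled)

# Cluster ids are arbitrary, so rank clusters by mean claim amount to name them.
order = df.groupby("cluster")["total_claim_amount"].mean().sort_values().index
names = dict(zip(order, ["low", "medium", "high"]))
df["group"] = df["cluster"].map(names)

summary = df.groupby("group")[["total_claim_amount", "monthly_payment"]].mean()
print(summary)
```

Naming the clusters after fitting matters because k-means assigns its integer labels arbitrarily; sorting by a cluster-level statistic gives labels that stay stable across runs.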

Technical skills utilised

  • The above Python scripts were created for this study project. I used the following tools and libraries: Jupyter, pandas, NumPy, os, matplotlib, seaborn, folium, json, pylab, scikit-learn (e.g. LinearRegression, cluster.KMeans)
  • Tableau was used for visualisation (the dashboard also includes the categorical data)

Data

  • Almost 10K rows, 24 columns
  • Car insurance and customer data

Data source

  • The dataset used in this analysis is an open-source dataset, published on Kaggle in 2018
  • The Kaggle page links to the original data source, IBM Watson marketing and customer value data, but the original dataset is no longer available. Hence, the data collection methods are unknown
  • The dataset was used for learning purposes only (in the certified Data Immersion course at CareerFoundry in 2021)