I found a great insurance sample dataset on Kaggle, from the "Prudential Life Insurance Assessment - Can you make buying life insurance easier?" competition. It is well suited to practicing data analysis. As usual, I use a Jupyter Notebook and an Azure Databricks notebook to perform the analysis.
Data Fields Description
- Id, A unique identifier associated with an application.
- Product_Info_1-7, A set of normalized variables relating to the product applied for.
- Ins_Age, Normalized age of applicant.
- Ht, Normalized height of applicant.
- Wt, Normalized weight of applicant.
- BMI, Normalized BMI of applicant.
- Employment_Info_1-6, A set of normalized variables relating to the employment history of the applicant.
- InsuredInfo_1-6, A set of normalized variables providing information about the applicant.
- Insurance_History_1-9, A set of normalized variables relating to the insurance history of the applicant.
- Family_Hist_1-5, A set of normalized variables relating to the family history of the applicant.
- Medical_History_1-41, A set of normalized variables relating to the medical history of the applicant.
- Medical_Keyword_1-48, A set of dummy variables indicating the presence or absence of a medical keyword associated with the application.
- Response, The target variable, an ordinal variable relating to the final decision associated with an application.
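
The numbered families above (Product_Info_1-7, Medical_History_1-41, and so on) can be grouped by prefix before analysis. The following is a minimal sketch using only the Python standard library. The helper name `group_columns` and the short sample list are illustrative assumptions, not code taken from the notebooks.

```python
import re
from collections import defaultdict

# Matches names such as "Medical_History_12": a prefix, an underscore, a number.
_NUMBERED = re.compile(r"^(?P<prefix>.+)_(?P<index>\d+)$")


def group_columns(columns):
    """Group column names by prefix (hypothetical helper).

    Columns without a numeric suffix, such as "Id" or "BMI",
    end up in a group of their own.
    """
    groups = defaultdict(list)
    for name in columns:
        match = _NUMBERED.match(name)
        key = match.group("prefix") if match else name
        groups[key].append(name)
    return dict(groups)


# A small hand-written sample of names following the field list above.
sample = ["Id", "Product_Info_1", "Product_Info_2", "Ins_Age", "Medical_Keyword_1"]
for prefix, names in group_columns(sample).items():
    print(f"{prefix}: {names}")
```

Passing the real CSV header to `group_columns` instead of the sample list should give one group per family, which makes it easy to check the column counts against the ranges listed above.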
File Content Description
- data/prudential_life_insurance_sample_data.csv <-- Sample data from Kaggle
- eda_for_prudential_life_insurance_sample_data.ipynb <-- Notebook sample of EDA
- eda_for_prudential_life_insurance_sample_data_databricks.ipynb <-- Notebook sample for Databricks
- eda_for_prudential_life_insurance_sample_data_databricks.html <-- Notebook HTML export from Databricks
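
A common first EDA step on this data is to look at how the Response target is distributed. The sketch below uses only the standard library and assumes the CSV sits at the path listed above with a header row containing a `Response` column. The function name `response_counts` is hypothetical, and the notebooks may do this differently.

```python
import csv
from collections import Counter
from pathlib import Path


def response_counts(path, target="Response"):
    """Count how many rows fall into each value of the target column."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[target] for row in reader)


# Path taken from the file list above; adjust if the layout differs.
DATA_PATH = Path("data/prudential_life_insurance_sample_data.csv")

if DATA_PATH.exists():
    counts = response_counts(DATA_PATH)
    # Values are read as strings, so this sorts them lexicographically.
    for value, count in sorted(counts.items()):
        print(f"Response {value}: {count}")
```

The file-existence check keeps the snippet from failing when it is run outside the repository root.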