This project analyzes employee attrition and predicts whether an employee will leave the company using several machine learning models. The dataset contains features such as age, job role, salary, and years at the company. Preprocessing involved handling missing values, converting data types, and encoding categorical variables. Exploratory Data Analysis (EDA) with pandas, seaborn, and matplotlib was used to visualize trends such as the distributions of age, income, and job roles, and to compare attrition rates across categories. The models compared were Linear Regression, Gradient Boosting Regressor, Decision Tree Regressor, and Random Forest Regressor, evaluated by their accuracy scores; the Random Forest Regressor performed best, with an accuracy of 93%. The analysis highlighted key factors contributing to attrition, which can help organizations develop strategies to improve employee retention. Overall, the project demonstrates how data analysis and machine learning can be applied to understand and predict employee attrition.
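The preprocessing and EDA steps described above can be sketched with pandas as follows. This is a minimal illustration on a small hypothetical DataFrame: the column names (`Age`, `JobRole`, `MonthlyIncome`, `Attrition`), the median/placeholder imputation choices, and the one-hot encoding are assumptions, not the project's actual code or schema.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative, not the project's schema.
raw = pd.DataFrame({
    "Age": ["34", "29", None, "45", "38"],           # arrives as strings
    "JobRole": ["Sales", "Research", "Sales", None, "Research"],
    "MonthlyIncome": [5200.0, 3100.0, 4700.0, None, 6100.0],
    "Attrition": ["Yes", "No", "No", "No", "No"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Convert data types: cast the string ages to numbers (None becomes NaN).
    df["Age"] = pd.to_numeric(df["Age"])
    # Handle missing values: median for numeric columns, a placeholder for categories.
    for col in ["Age", "MonthlyIncome"]:
        df[col] = df[col].fillna(df[col].median())
    df["JobRole"] = df["JobRole"].fillna("Unknown")
    # Encode the binary target as 0/1 and one-hot encode the categorical feature.
    df["Attrition"] = df["Attrition"].map({"No": 0, "Yes": 1})
    df = pd.get_dummies(df, columns=["JobRole"], dtype=int)
    return df


clean = preprocess(raw)
print(clean)

# EDA-style summary: share of employees who left, per job role.
# Rows with a missing role are skipped by groupby.
rates = raw.assign(Left=raw["Attrition"].eq("Yes")).groupby("JobRole")["Left"].mean()
print(rates)
```

Median imputation is used here because it is robust to skewed columns such as income; the per-role attrition rate is the kind of summary that would then be plotted with seaborn.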
This project analyzes epilepsy data to predict patient status using several machine learning models. Preprocessing included handling missing values, converting data types, and encoding categorical variables. EDA with pandas, seaborn, and matplotlib was used to visualize trends and gain insight into the data, and feature engineering added new features to improve model performance. The models used were Logistic Regression, Decision Tree, Random Forest Regressor, Gradient Boosting Classifier, and AdaBoost, with GridSearchCV used to find the best hyperparameters for each. Pipelines chained the preprocessing and modeling steps into a single workflow, reducing the number of separate steps. The Decision Tree achieved the best accuracy, at 90%. The project demonstrates how preprocessing, EDA, feature engineering, and machine learning can be combined to predict epilepsy status in the medical domain.
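A minimal sketch of how a scikit-learn `Pipeline` and `GridSearchCV` fit together for the Decision Tree, assuming synthetic data from `make_classification` in place of the epilepsy dataset. The step names (`impute`, `model`), the imputation strategy, and the parameter grid are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the epilepsy features (hypothetical, not the real data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
# Blank out roughly 5% of the values so the imputation step has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain preprocessing and the model into one estimator.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("model", DecisionTreeClassifier(random_state=0)),
])

# Hypothetical grid; pipeline parameters are addressed as <step>__<parameter>.
param_grid = {
    "model__max_depth": [3, 5, None],
    "model__min_samples_leaf": [1, 5],
}

# 5-fold cross-validated search over all 6 combinations, scored by accuracy.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# The refit best pipeline is evaluated once on the held-out test split.
test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"test accuracy: {test_accuracy:.2f}")
```

Because the imputer lives inside the pipeline, it is re-fit on each training fold during cross-validation, so statistics from the validation fold never leak into preprocessing.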
This project predicts shopping mall sales revenue from a dataset of mall-related features. Preprocessing handled missing values, converted data types, and encoded categorical variables. EDA with pandas, seaborn, and matplotlib was used to visualize trends and derive insights, and feature engineering added further features intended to improve model performance. The regression models compared were Linear Regression, Decision Tree, Random Forest Regressor, and Gradient Boosting Regressor. The Decision Tree scored best, but its score was only -0.35. Because accuracy cannot be negative, this value is best read as a regression score such as R², where a negative result means the model predicts revenue worse than simply predicting the mean every time. The project demonstrates the use of preprocessing, EDA, feature engineering, and machine learning for revenue prediction, and the weak result shows clear room for improvement before the models can produce accurate revenue forecasts.
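A negative score is easier to interpret with the definition of R² in hand: R² = 1 - SS_res / SS_tot, where SS_res is the model's squared error and SS_tot is the squared error of always predicting the mean. The sketch below computes it in plain Python on made-up revenue figures (assumed, not from the project's data). It is the same quantity scikit-learn regressors return from their `score` method, although whether the project's -0.35 came from that method is an assumption.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of a baseline that always predicts the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical revenue figures, not taken from the project's dataset.
revenue = [100.0, 120.0, 140.0, 160.0]
mean_baseline = [130.0] * 4                  # always predict the mean
poor_model = [150.0, 100.0, 170.0, 130.0]    # larger total error than the baseline

print(r2_score(revenue, revenue))               # 1.0  (perfect fit)
print(r2_score(revenue, mean_baseline))         # 0.0  (no better than the mean)
print(round(r2_score(revenue, poor_model), 2))  # -1.35 (worse than the mean)
```

A score of 0 means the model is no better than always predicting the average revenue; anything below 0, like the project's -0.35, means it does worse than that baseline.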