[email protected] | LinkedIn | GitHub
Hello! I'm a data science professional with two years of experience as a Business Analyst, where I specialized in delivering data insights to senior leadership and guiding data-driven decisions across industries. With expertise in SQL, Python, and Tableau, and an MS in Data Science, I work across data preprocessing, visualization, and machine learning to turn complex datasets into actionable insights. I'm also skilled at managing cross-functional projects and communicating data insights to non-technical stakeholders.
Actively looking for a full-time role as a Data Scientist / Data Analyst
Python, Data ETL, Transformer, Streamlit | GitHub
- Led a team of 3 graduate students for a World Bank project, contributing to the digital transformation efforts for unbanked women in Nigeria.
- Developed a web application to automate the extraction and transformation of financial data into CSV format, improving data accessibility.
- Boosted data extraction efficiency by 600% using advanced machine learning models to process over 200 unstructured documents.
- Designed and implemented an interactive visualization dashboard with Streamlit, showcasing financial insights through exploratory data analysis (EDA) and statistical testing.
- Presented findings to over 100 World Bank employees, receiving positive feedback from a non-technical audience.
Python, LLM, Transformer, Streamlit | GitHub
- Developed a Streamlit web app for artist-specific lyric generation and lyric analysis.
- Fine-tuned a GPT-2 model on lyrics from top artists for lyric generation.
- Built a lyric analyzer (lyric summarization, keyword extraction, ...) using transformer models including BERT and BART.
Python, Binary Classification, Statistical Modeling | GitHub
- Applied machine learning algorithms to the CDC's Behavioral Risk Factor Surveillance System (BRFSS) dataset to predict heart disease likelihood.
- Focused on data cleaning, feature selection, and identifying key health indicators impacting heart disease risk.
- Handled imbalanced binary classification data with the SMOTE oversampling method.
- Developed prediction models including MLP, Logistic Regression, XGBoost, and Random Forest, achieving a best macro F1 score of 0.8.
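For readers unfamiliar with the metric, here is a minimal sketch of how a macro F1 score is computed. It is not the project's code: the function names and the toy labels are assumptions chosen for illustration. Macro averaging weights each class equally, which is why it is often preferred over accuracy on imbalanced data such as heart disease labels.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Toy imbalanced labels (hypothetical): 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.4f}, macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

On this toy data the accuracy is 0.8 while the macro F1 is 0.6875, because the minority class is predicted poorly and counts as much as the majority class.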
Python, Time Series, SARIMA, LSTM | GitHub
- Developed time-series models to predict traffic volume on I-94 in Minneapolis-St. Paul using multivariate data, including weather and holidays.
- Handled missing values in the hourly traffic volume data using interpolation and resampling.
- Built ARIMA, SARIMA, OLS, and LSTM models to predict traffic volume.
- Evaluated models based on RMSE, analyzed seasonality, and provided insights to enhance urban planning and reduce congestion.
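As a rough illustration of two of the steps above, the sketch below fills gaps in an hourly series by linear interpolation and scores a forecast with RMSE. It is a simplified stand-in, not the project's pipeline: the helper names, the toy traffic values, and the naive "previous hour" forecast are all assumptions.

```python
import math


def interpolate_gaps(values):
    """Fill interior None gaps linearly between the nearest known neighbors.

    Leading or trailing None values are left untouched.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical hourly traffic volumes with missing readings.
hourly_volume = [100.0, None, None, 400.0, 380.0, None, 300.0]
filled = interpolate_gaps(hourly_volume)

# Naive baseline: predict each hour with the previous hour's value.
baseline_rmse = rmse(filled[1:], filled[:-1])
print(filled)
print(f"naive baseline RMSE = {baseline_rmse:.2f}")
```

A baseline like this gives a reference point: a seasonal or neural model is only useful if its RMSE on held-out data is lower.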
Tableau link
- Developed an interactive Tableau dashboard to analyze U.S. road accidents, identifying critical road-safety factors from an NHTSA dataset of 30,000 observations and 400 variables.
R, Binary Classification, Statistical Modeling | GitHub
- Conducted exploratory data analysis in R to identify key numerical and categorical variables influencing song popularity.
- Developed binary classification models, including KNN, tree-based models, and logistic regression, to classify popular songs on Spotify based on audio features, with a best accuracy of 0.923 (KNN).
- Programming Languages: Python, R, SQL, C++
- Tools: PyTorch, Transformers, TensorFlow, Streamlit, scikit-learn, pandas, MySQL, Oracle SQL, MongoDB, Linux shell scripting, NLTK, spaCy
- Cloud Platforms: AWS (Cloud Web Builder, SageMaker), Google Cloud Platform (Vertex AI, Google BigQuery)
- Data Visualization: Tableau, Power BI