Jiwoo Suh 💎

📧 [email protected] | 💼 LinkedIn | 🚀 GitHub

Hello! I’m a data science professional with two years of experience as a Business Analyst, where I delivered data insights to senior leadership and guided data-driven decisions across industries. With an MS in Data Science and expertise in SQL, Python, and Tableau, I focus on data preprocessing, visualization, and applying machine learning algorithms to turn complex datasets into actionable insights. I’m also skilled at managing cross-functional projects and communicating data insights to non-technical stakeholders.

Actively looking for a full-time role as a Data Scientist / Data Analyst 💡

Projects 🤓

Digital Transformation for the World Bank

Python, data ETL, Transformer, Streamlit github

  • Led a team of 3 graduate students for a World Bank project, contributing to the digital transformation efforts for unbanked women in Nigeria.
  • Developed a web application that automates the extraction and transformation of financial data into CSV format, improving data accessibility.
  • Boosted data extraction efficiency by 600% using advanced machine learning models to process over 200 unstructured documents.
  • Designed and implemented an interactive visualization dashboard with Streamlit, showcasing financial insights through exploratory data analysis (EDA) and statistical testing.
  • Presented findings to over 100 World Bank employees, receiving positive feedback from a non-technical audience.

Lyric Analyzer and Generation

Python, LLM, Transformer, Streamlit github

  • Developed a Streamlit web app for artist-specific lyric generation and lyric analysis.
  • Fine-tuned a GPT-2 model on lyrics from top artists for lyric generation.
  • Built a lyric analyzer (lyric summarization, keyword extraction, ...) using transformer models including BERT and BART.

Heart Disease Analysis and Prediction

Python, Binary Classification, Statistical Modeling github

  • Applied machine learning algorithms to the CDC's Behavioral Risk Factor Surveillance System (BRFSS) dataset to predict heart disease likelihood.
  • Focused on data cleaning, feature selection, and identifying key health indicators impacting heart disease risk.
  • Handled imbalanced binary classification data with the SMOTE oversampling method.
  • Developed prediction models including MLP, Logistic Regression, XGBoost, and Random Forest, achieving a best macro F1 score of 0.8.
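SMOTE (Synthetic Minority Over-sampling Technique) creates new minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors, and macro F1 averages the per-class F1 scores so the minority class counts as much as the majority class. The sketch below illustrates both ideas in plain Python under simplifying assumptions (numeric feature tuples, a binary label, toy data invented for illustration). It is not the project's code, and the helper names `smote`, `oversample`, and `macro_f1` are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between a random minority
    sample and one of its k nearest minority-class neighbors (SMOTE idea)."""
    if n_synthetic == 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, nearest first (Euclidean distance)
        others = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def oversample(X, y, minority_label, k=5, seed=0):
    """Balance a binary dataset by appending synthetic minority samples."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    new_points = smote(minority, max(0, n_majority - len(minority)), k, seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: 3 minority (label 1) vs 5 majority (label 0) samples
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
     (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0), (12.0, 12.0)]
y = [1, 1, 1, 0, 0, 0, 0, 0]
X_res, y_res = oversample(X, y, minority_label=1, k=2)
print(y_res.count(1), y_res.count(0))  # 5 5
print(macro_f1([0, 0, 0, 1], [0, 0, 1, 1]))  # about 0.733
```

Oversampling should be applied only to the training split, so that synthetic points derived from test samples do not leak into evaluation. Library implementations such as imbalanced-learn's `SMOTE` expose this through `fit_resample`.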

Metro Traffic Volume Analysis and Prediction

Python, Time Series, SARIMA, LSTM github

  • Developed a time-series model to predict traffic volume on I-94 in Minneapolis-St. Paul using multivariate data, including weather and holidays.
  • Handled missing values in the hourly traffic volume data with methods such as interpolation and data resampling.
  • Built ARIMA, SARIMA, OLS, and LSTM models to predict traffic volume.
  • Evaluated models based on RMSE, analyzed seasonality, and provided insights to enhance urban planning and reduce congestion.
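The two preprocessing and evaluation steps named above can be illustrated with a small, dependency-free sketch: linear interpolation fills gaps in an evenly spaced hourly series, and RMSE scores forecasts against actual values. The function names and toy numbers are hypothetical, and the project's exact interpolation settings are not specified; in pandas, `Series.interpolate()` and `DataFrame.resample()` provide library versions of these operations.

```python
import math


def interpolate_linear(values):
    """Fill None gaps in an evenly spaced (e.g. hourly) series.

    Interior gaps get a straight line between the nearest observed values;
    leading/trailing gaps copy the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    first, last = known[0], known[-1]
    for i in range(first):
        filled[i] = values[first]
    for i in range(last + 1, len(values)):
        filled[i] = values[last]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            filled[i] = values[left] + (values[right] - values[left]) * (i - left) / span
    return filled


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


hourly = [100, None, None, 400, None]  # toy hourly counts with gaps
print(interpolate_linear(hourly))  # [100, 200.0, 300.0, 400, 400]
print(rmse([3, 5], [1, 5]))        # sqrt(2), about 1.414
```

This sketch interpolates by position, so it assumes timestamps are evenly spaced; resampling the raw data to a fixed hourly grid first is what makes that assumption hold.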

Vehicle Crash Data Visualization

Tableau link

  • Developed an interactive Tableau dashboard to analyze U.S. road accidents, providing key insights on road safety by identifying critical factors in an NHTSA dataset of 30,000 observations and 400 variables.

Spotify Tracks Popularity Classification

R, Binary Classification, Statistical Modeling github

  • Conducted exploratory data analysis in R to identify the key numerical and categorical variables influencing song popularity.
  • Developed binary classification models, including KNN, tree-based models, and Logistic Regression, to classify popular songs on Spotify from audio features, achieving a best accuracy of 0.923 (KNN).
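K-nearest neighbors classifies a point by a majority vote among the k closest training points. The project itself was done in R; the sketch below uses Python only to keep one language across examples, and it is a minimal illustration rather than the project's code. The helper names, the toy feature values, and the meaning of label 1 are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    # most_common orders equal counts by first occurrence, so a tie goes to
    # the label whose member appears earliest in the distance-sorted list
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, already-scaled features; label 1 = "popular" (hypothetical)
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.2, 0.3), (5.5, 5.5)]
preds = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
print(preds, accuracy([0, 1], preds))  # [0, 1] 1.0
```

Because KNN relies entirely on distances, features should be on comparable scales (for example, standardized with training-set statistics), and an odd k avoids vote ties in a binary problem.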

Skills 😎

  • Programming Languages: Python, R, SQL, C++
  • Tools: PyTorch, Transformers, TensorFlow, Streamlit, scikit-learn, pandas, MySQL, Oracle SQL, MongoDB, Linux shell scripting, NLTK, spaCy
  • Cloud Platforms: AWS (Cloud Web Builder, SageMaker), Google Cloud Platform (Vertex AI, Google BigQuery)
  • Data Visualization: Tableau, Power BI

Pinned

  1. upmanyu1993/Final_Project_Group5

    Python

  2. MetroTrafficVolume

    Python

  3. orangekim28/FinalProject-Group8

    Python

  4. CapstoneProject_WorldBank

    HTML

  5. tanmayk26/T1-phoenix-22FA

    A repository for Team 1 in the DATS 6101 class, Introduction to Data Science.

    HTML

  6. Nayaeun/Tamagotchi-Game-Project

    Python