ML Models for Project Rainfall


Predicts whether it will rain today, tomorrow, and the day after tomorrow at the location requested by the client.


Wooyong Jeong
[email protected]
Wooyong Jeong: Woo
AI Developer

Tech Stack

  1. Development
    • Model Implementation
      • Google Colab
      • Google Drive
      • scikit-learn
      • pandas
      • NumPy
    • Model Server Implementation
      • FastAPI
      • Joblib
      • Pydantic
  2. Deployment
    • EC2
    • Nginx: Proxy and HTTPS (Let's Encrypt and Certbot)
    • Docker

Service Architecture: ML Model

Service Architecture

ML Models API Doc

ML Models API Docs

Project Structure: ML Model

├── .venv
├── app/
│   ├── core/
│   │   └──
│   ├── crud/
│   │   ├──
│   │   └──
│   ├── project_enum/
│   │   └──
│   ├── router/
│   │   └──
│   ├── schema/
│   │   ├──
│   │   └──
│   ├── trained_models/
│   │   └── ..._esm_model.pkl
│   └──
├── static
├── .env
├── .gitignore
├── Dockerfile
└── requirements.txt

Classification Model: Woo

Model Screenshots

Raining Not Raining

Model Training Process & Hypothesis

  • Hypothesis
    1. The weather of the three days preceding a given day may carry characteristics of that day's weather.
    2. Therefore, features from the previous three days that correlate strongly with rainfall can be used to classify whether it will rain on upcoming days.
  • Training Process
    1. Data acquisition and preprocessing: data_preprocessing_assign.ipynb
      1. Send API requests to JejuDataHub to retrieve weather data
      2. Remove invalid datasets that meet either of these conditions
        • The dataset has not been updated for a month
        • The dataset covers fewer than 8 years of records
    2. ML model training: ensemble_model_jeju.ipynb
      1. Use corr() to find the features in the dataset that correlate with rainfall
      2. Train a voting ensemble model that combines
        • SVM
        • KNN
        • Logistic Regression
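
The two validity rules in step 1 of the training process can be expressed as a small date check. The sketch below is only an illustration, not the notebook's actual code: the function name `is_valid_dataset`, the 30-day reading of "a month", and the 8 × 365-day reading of "8 years" are all assumptions.

```python
from datetime import date, timedelta

# Assumed thresholds: the README only says "a month" and "8 years",
# so the exact day counts below are illustrative.
MAX_STALENESS = timedelta(days=30)
MIN_HISTORY = timedelta(days=8 * 365)


def is_valid_dataset(first_record: date, last_record: date, today: date) -> bool:
    """Return True if a station's dataset passes both validity checks."""
    # Rule 1: the dataset must have been updated within the last month.
    recently_updated = (today - last_record) <= MAX_STALENESS
    # Rule 2: the dataset must span at least eight years of records.
    long_enough = (last_record - first_record) >= MIN_HISTORY
    return recently_updated and long_enough


today = date(2024, 7, 25)
print(is_valid_dataset(date(2010, 1, 1), date(2024, 7, 20), today))  # long and fresh
print(is_valid_dataset(date(2010, 1, 1), date(2024, 5, 1), today))   # stale
print(is_valid_dataset(date(2020, 1, 1), date(2024, 7, 20), today))  # too short
```

Keeping the thresholds as named constants makes it easy to adjust them if the real notebook uses different cut-offs.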

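The hypothesis and step 2 of the training process (lag features from the previous three days, `corr()`-based selection, then a voting ensemble of SVM, KNN, and Logistic Regression) might look roughly like the sketch below. It runs on synthetic data, and the column names, the 0.2 correlation threshold, the feature scaling, and the choice of hard voting are assumptions made for illustration; the notebook may differ on all of them.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the weather data (hypothetical column names).
rng = np.random.default_rng(0)
n_days = 400
df = pd.DataFrame({
    "humidity": rng.uniform(30, 100, n_days),
    "pressure": rng.normal(1010, 8, n_days),
    "noise": rng.normal(0, 1, n_days),  # deliberately unrelated to rain
})
# In this toy data, it rains when the previous day's humidity exceeded 70.
df["rain"] = (df["humidity"].shift(1).fillna(0) > 70).astype(int)


def add_lag_features(frame: pd.DataFrame, columns: list, n_lags: int = 3) -> pd.DataFrame:
    """Add each column's values from the previous n_lags days as new features."""
    out = frame.copy()
    for col in columns:
        for lag in range(1, n_lags + 1):
            out[f"{col}_lag{lag}"] = out[col].shift(lag)
    # The first n_lags rows have no complete history, so drop them.
    return out.dropna().reset_index(drop=True)


lagged = add_lag_features(df, ["humidity", "pressure", "noise"])
lag_cols = [c for c in lagged.columns if "_lag" in c]

# Keep only lag features whose absolute correlation with rain is high enough.
corr_with_rain = lagged[lag_cols + ["rain"]].corr()["rain"].drop("rain").abs()
selected = corr_with_rain[corr_with_rain > 0.2].index.tolist()

# Voting ensemble of the three model types listed in the README.
ensemble = VotingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression())),
    ],
    voting="hard",
)

# Chronological split: train on the earlier days, evaluate on the later ones.
split = int(len(lagged) * 0.8)
X, y = lagged[selected], lagged["rain"]
ensemble.fit(X.iloc[:split], y.iloc[:split])
accuracy = ensemble.score(X.iloc[split:], y.iloc[split:])
print("selected features:", selected)
print("hold-out accuracy:", round(accuracy, 3))
```

A chronological split is used instead of a random one so that the model is never evaluated on days that precede its training data.
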
Classification Process

  1. Return four days of classifications for the requested location
    • Request Structure (Request DTO)

      # schema/
      from pydantic import BaseModel

      class PredictionRequestDto(BaseModel):
          obs_code: int  # code of the observation station (obs)
    • Response Structure (Response DTO)

      # schema/
      from typing import List
      from pydantic import BaseModel

      class PredictionResponseDto(BaseModel):
          obs_name: str
          predicted_date: str  # formatted as YYYYMMDD
          raining_status: bool

      class PredictionsResponseDto(BaseModel):
          data: List[PredictionResponseDto]
    • Method

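      # router/
      from fastapi import APIRouter

      router = APIRouter()

      # Assumption: POST, since the request carries a JSON body
      @router.post("/predict/classify", response_model=PredictionsResponseDto)
      def read_classifications(request: PredictionRequestDto):
          return get_classification(request=request)

      # crud/
      def get_classification(request):
          # find the observation station (obs) for request.obs_code
          # update it with the latest weather data
          # load the trained model
          # generate the predictions
          predictions_list = []  # filled in by the steps above
          return {"data": predictions_list}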
    • Response

      {
          "data": [
              {
                  "obs_name": "마라도",
                  "predicted_date": "20240725",
                  "raining_status": true,
                  "raining_amount": 0
              },
              {
                  "obs_name": "마라도",
                  "predicted_date": "20240726",
                  "raining_status": true,
                  "raining_amount": 0
              },
              {
                  "obs_name": "마라도",
                  "predicted_date": "20240727",
                  "raining_status": true,
                  "raining_amount": 0
              },
              {
                  "obs_name": "마라도",
                  "predicted_date": "20240728",
                  "raining_status": true,
                  "raining_amount": 0
              }
          ]
      }
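
The "generate prediction" step has to produce one entry per day, with `predicted_date` formatted as `YYYYMMDD`. The sketch below shows one possible way to assemble that payload. The helper name `build_predictions` and the `classify` callback (a stand-in for the trained model) are hypothetical, and plain dictionaries are used instead of the Pydantic DTOs so the example has no dependencies.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List


def build_predictions(
    obs_name: str,
    start: date,
    classify: Callable[[date], bool],
    days: int = 4,
) -> Dict[str, List[dict]]:
    """Build a response payload with one classification per consecutive day."""
    predictions_list = []
    for offset in range(days):
        target = start + timedelta(days=offset)
        predictions_list.append({
            "obs_name": obs_name,
            "predicted_date": target.strftime("%Y%m%d"),
            # classify() stands in for the trained ensemble model.
            "raining_status": classify(target),
        })
    return {"data": predictions_list}


# Stub classifier that always predicts rain, mirroring the sample response.
response = build_predictions("마라도", date(2024, 7, 25), classify=lambda _day: True)
print([item["predicted_date"] for item in response["data"]])
# → ['20240725', '20240726', '20240727', '20240728']
```

Because the returned dictionary has the same keys as `PredictionResponseDto`, it could be returned directly from the route and validated by `response_model`.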