diff --git a/.env.example b/.env.example new file mode 100644 index 0000000..d28a0ee --- /dev/null +++ b/.env.example @@ -0,0 +1 @@ +PYTHONPATH=src \ No newline at end of file diff --git a/.gitignore b/.gitignore index 297a10f..ecf8921 100644 --- a/.gitignore +++ b/.gitignore @@ -1,16 +1,51 @@ +# Data files *.pkl *.csv !ceas_08.csv !phishing_email.csv *.eml *.npz + +# Environment files *.env +env/ +venv/ +ENV/ + +# Log files *.log + +# Compiled files +*.pyc +*.pyo +*.pyd + +# Python cache __pycache__/ + +# Project directories backup/ trash/ others/ +downloads/ + +# Configuration files config.json +# Python build files +build/ +develop-eggs/ +dist/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +var/ +*.egg-info/ +.installed.cfg +*.egg + +# Specific scripts email_generator.py -shap_analysis.py \ No newline at end of file +shap_analysis.py diff --git a/README.md b/README.md index b970213..dd47ec2 100644 --- a/README.md +++ b/README.md @@ -12,96 +12,14 @@ This project leverages advanced machine learning algorithms to detect and classi Our solution applies a combination of processes such as data preprocessing, feature engineering, and model training techniques to identify spam and phishing emails. The project addresses real-world challenges like imbalanced datasets by utilizing SpamAssassin and CEAS datasets for training and evaluation, ultimately enhancing the model's ability to filter phishing and spam emails effectively. -## Key Technologies +### Key Technologies -- **BERT for Feature Extraction**: Enhances contextual understanding of email content. -- **Stacked Ensemble Learning**: Combines XGBoost, Bagged SVM, and Logistic Regression for robust detection. -- **Optuna for Hyperparameter Tuning**: Optimizes model performance by fine-tuning key parameters. -- **Flask**: Provides a web interface for real-time email classification. - -## Installation - -To set up the project, clone the repository and install the necessary dependencies: - -```bash -git clone https://github.com/Koon-Kiat/Detecting-Spam-and-Phishing-Emails-Using-Machine-Learning -cd Detecting-Spam-and-Phishing-Emails-Using-Machine-Learning -conda create --name python=3.8.20 -conda activate -conda env update --file environment.yaml --prune -``` - -Once the dependencies are installed, you can run the phishing email detection program using the following command: - -```bash -python main.py -``` - -## Data - -The project utilizes merged datasets from SpamAssassin (Hugging Face) and CEAS (Kaggle) to enhance email threat detection: - -- **SpamAssassin**: Contains real-world spam and legitimate emails. -- **CEAS 2008**: Specially curated for anti-spam research, with a focus on phishing examples. - -## Merging Datasets - -TThe project integrates the **[Spam Assassin](https://huggingface.co/datasets/talby/spamassassin)** and **[CEAS 2008](https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset?select=CEAS_08.csv)** datasets, aligning them by columns and ensuring label consistency. This creates a robust, well-labeled dataset that improves phishing and spam detection accuracy. 
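For illustration, the column-alignment step described above can be sketched in a few lines of pandas; the file names and the `body`/`label` schema here are assumptions for the example, not the project's exact code:

```python
import pandas as pd

# Hypothetical sketch of the dataset merge: select a shared column set from
# both corpora, concatenate, and de-duplicate. Column and file names are
# illustrative assumptions, not the project's exact schema.
spam_assassin = pd.read_csv("data/spam_assassin.csv")[["body", "label"]]
ceas_08 = pd.read_csv("data/ceas_08.csv")[["body", "label"]]

combined_df = (
    pd.concat([spam_assassin, ceas_08], ignore_index=True)
    .dropna(subset=["body"])
    .drop_duplicates(subset="body")
)
combined_df["label"] = combined_df["label"].astype(int)  # keep labels consistent
```

Selecting a common column set before `pd.concat` is what aligns the two corpora by columns, and the final integer cast enforces label consistency across sources.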
- -## File Structure for Storing Results - -```plaintext -project_root/ -├── additional_model_training/ -│ ├── base_model_optuna.py -│ ├── base_model.py -├── config.json -├── data_pipeline/ -│ ├── data_integration/ -│ ├── data_preprocessing/ -│ ├── noise_injection/ -│ ├── data_splitting/ -│ ├── feature_engineering/ -│ ├── feature_extraction/ -│ └── models_and_parameters/ -├── datasets/ -├── evaluation_on_third_dataset.py -├── evaluationonthirddataset/ -├── flask_app.py -├── logs/ -├── main.py -├── manifest_python.xml -├── multi_model_evaluation/ -├── README.md -├── requirements.txt -├── single_model_evaluation/ -├── spamandphishingdetection/ -├── static/ -├── templates/ -├── third_dataset_evaluation/ -``` - -## Technology Stack - -### Programming Languages - -- **Python** - -### Libraries and Frameworks - -- **Machine Learning**: scikit-learn, TensorFlow, transformers, imbalanced-learn +- **Programming Language**: Python +- **ML/DL Libraries**: scikit-learn, TensorFlow, transformers, imbalanced-learn - **NLP**: NLTK -- **Data Handling**: pandas, numpy -- **Web Framework**: Flask +- **Data Processing**: pandas, numpy +- **Development Tools**: Git, Anaconda - **Optimization**: Optuna - -### Tools - -- **Version Control**: Git -- **Environment Management**: Anaconda - -### Additional Technologies - - **Feature Extraction**: BERT - **Ensemble Learning**: XGBoost, Bagged SVM, Logistic Regression, Stacked Ensemble Learning - **Data Preprocessing**: One-Hot Encoding, Standard Scaling, Imputation, Rare Category Removal, Noise Injection @@ -110,8 +28,21 @@ project_root/ - **Noise Injection**: Adding controlled random variations to features to improve model generalization and reduce overfitting - **Stacked Ensemble Learning**: Combining multiple models for robust detection +## Features + +- **Advanced Spam and Phishing Detection**: Utilizes sophisticated algorithms to accurately identify malicious emails. +- **Support for Handling Imbalanced Datasets**: Implements techniques to manage and balance skewed data distributions. +- **Automated Model Training and Evaluation**: Streamlines the process of training and assessing machine learning models. + ## Methodologies +### Data Sources + +The project utilizes merged datasets from SpamAssassin (Hugging Face) and CEAS (Kaggle) to enhance email threat detection: + +- **SpamAssassin**: Contains real-world spam and legitimate emails. +- **CEAS 2008**: Specially curated for anti-spam research, with a focus on phishing examples. + ### Data Preprocessing - **Cleaning**: Removing duplicates, handling missing values, and correcting errors. @@ -146,7 +77,7 @@ project_root/ - **Confusion Matrix**: Displaying the performance of each model. - **Learning Curves**: Visualizing model performance as a function of training data size. -These results are stored in the `data_pipeline` folder. +These results are stored in the `output` folder. ## Evaluation @@ -155,6 +86,25 @@ These results are stored in the `data_pipeline` folder. - **Confusion Matrix**: Displays the performance of each model in predicting "Safe" vs. "Not Safe" emails. - **Learning Curve**: A plot showing model performance (accuracy/loss) as a function of training data size, helping to visualize overfitting, underfitting, and the effectiveness of adding more training data. 
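For reference, a learning curve of this kind can be produced with scikit-learn's `learning_curve` utility; this is a minimal sketch on synthetic data, not the project's own `plot_learning_curve` helper, which may differ in details:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for the extracted email features and labels.
X, y = make_classification(n_samples=600, n_features=20, random_state=42)

train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    cv=3, train_sizes=np.linspace(0.1, 1.0, 5), scoring="accuracy",
)
plt.plot(train_sizes, train_scores.mean(axis=1), marker="o", label="training")
plt.plot(train_sizes, val_scores.mean(axis=1), marker="o", label="validation")
plt.xlabel("Training set size")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```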
+ +## Installation + +To set up the project, clone the repository and install the necessary dependencies: + +```bash +git clone https://github.com/Koon-Kiat/Detecting-Spam-and-Phishing-Emails-Using-Machine-Learning +cd Detecting-Spam-and-Phishing-Emails-Using-Machine-Learning +conda create --name <env-name> python=3.8.20 +conda activate <env-name> +conda env update --file config/enviornment.yaml --prune +``` + +Once the dependencies are installed, you can run the phishing email detection program using the following command: + +```bash +python main.py +``` + ### Example Output ``` @@ -184,26 +134,12 @@ Classification Report for Test Data: weighted avg 0.XX 0.XX 0.XX XX ``` -## Flask Application - -The accompanying Flask application provides a user-friendly interface where users can input email content for real-time spam and phishing detection. The system returns an analysis of whether an email is "Safe" or "Not Safe." - -### Key Features: - -- **User Interface**: - - - The main interface is provided by `index.html` and `taskpane.html` located in the `templates` folder. - - Users can upload or paste email content for evaluation. - -- **Instant Feedback**: - - - The `/evaluateEmail` endpoint processes the email content and returns immediate results, flagging malicious content. - - This endpoint utilizes the `single_model_evaluation` module for classification. +## License -- **Integration**: - - The Flask app communicates with the machine learning model backend for classification. - - Static assets such as icons are served from the `static/assets` folder. +This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. -### Example Usage: +## Acknowledgments -To evaluate an email, users can navigate to the main interface, input the email content, and submit it for evaluation. The system will process the input and provide instant feedback on whether the email is "Safe" or "Not Safe."
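The accuracy, confusion matrix, and classification report shown in the example output come from scikit-learn's standard metrics; here is a minimal sketch with placeholder predictions, assuming label 0 = "Safe" and 1 = "Not Safe":

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Placeholder fold results; in the project these come from the ensemble's
# predictions on a held-out fold.
y_test = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

print(f"Test Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["Safe", "Not Safe"]))
```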
+ +- SpamAssassin Public Corpus +- CEAS 2008 Dataset Contributors +- Open Source ML Community diff --git a/enviornment.yaml b/config/enviornment.yaml similarity index 81% rename from enviornment.yaml rename to config/enviornment.yaml index 8ae7651..ceef7a0 100644 --- a/enviornment.yaml +++ b/config/enviornment.yaml @@ -28,6 +28,11 @@ dependencies: - tabulate=0.9.0 - pip - pip: - - -r requirements.txt + - matplotlib==3.7.5 + - seaborn==0.13.2 + - wordcloud==1.9.3 + - contractions==0.1.73 + - optuna==4.0.0 + - pyspellchecker==0.8.1 - datasets==3.0.2 --upgrade - https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz \ No newline at end of file diff --git a/datasets/ceas_08.csv b/data/ceas_08.csv similarity index 100% rename from datasets/ceas_08.csv rename to data/ceas_08.csv diff --git a/datasets/phishing_email.csv b/data/phishing_email.csv similarity index 100% rename from datasets/phishing_email.csv rename to data/phishing_email.csv diff --git a/manifest_python.xml b/extensions/manifest_python.xml similarity index 100% rename from manifest_python.xml rename to extensions/manifest_python.xml diff --git a/main.py b/main.py index 635b9b5..cc401c6 100644 --- a/main.py +++ b/main.py @@ -16,8 +16,8 @@ from imblearn.over_sampling import SMOTE # Handling imbalanced data import tensorflow as tf # TensorFlow library from bs4 import MarkupResemblesLocatorWarning # HTML and XML parsing -from datasets import load_dataset # Load datasets -from spamandphishingdetection import ( +from datasets import load_dataset # Load datasets +from src.spamandphishingdetection import ( initialize_environment, load_config, get_file_paths, @@ -53,7 +53,7 @@ def main(): nlp, loss_fn = initialize_environment(__file__) - config = load_config("config.json") + config = load_config("config/config.json") file_paths = get_file_paths(config) # Load the datasets @@ -200,7 +200,7 @@ def main(): # ************************* # logging.info(f"Beginning Data Cleaning ['body']...") df_clean_body = load_or_clean_data( - 'Merged Dataframe', combined_df, 'body', "data_pipeline/data_cleaning/cleaned_data_frame.csv", data_cleaning) + 'Merged Dataframe', combined_df, 'body', "output/main_model_evaluation/data_cleaning/cleaned_data_frame.csv", data_cleaning) # Verifying the Cleaned Combine DataFrame # Concatenate the Cleaned DataFrame with the Merged DataFrame @@ -218,7 +218,7 @@ def main(): # ***************************** # logging.info(f"Beginning Noise Injection...") noisy_df = generate_noisy_dataframe( - df_cleaned_combined, 'data_pipeline/noise_injection/noisy_data_frame.csv') + df_cleaned_combined, 'output/main_model_evaluation/noise_injection/noisy_data_frame.csv') logging.info(f"Noise Injection completed.\n") # ************************* # @@ -285,7 +285,7 @@ def main(): y_train=y_train, y_test=y_test, pipeline=pipeline, - dir='data_pipeline/feature_extraction', + dir='output/main_model_evaluation/feature_extraction', ) logging.info( f"Data for Fold {fold_idx} has been processed or loaded successfully.\n") diff --git a/additional_model_training/stacked_models/params/XGB_ADA_LG_Best_Params_Fold_1.json b/output/additional_models/stacked_models/params/XGB_ADA_LG_Best_Params_Fold_1.json similarity index 100% rename from additional_model_training/stacked_models/params/XGB_ADA_LG_Best_Params_Fold_1.json rename to output/additional_models/stacked_models/params/XGB_ADA_LG_Best_Params_Fold_1.json diff --git a/additional_model_training/stacked_models/params/XGB_KNN_LG_Best_Params_Fold_1.json 
b/output/additional_models/stacked_models/params/XGB_KNN_LG_Best_Params_Fold_1.json similarity index 100% rename from additional_model_training/stacked_models/params/XGB_KNN_LG_Best_Params_Fold_1.json rename to output/additional_models/stacked_models/params/XGB_KNN_LG_Best_Params_Fold_1.json diff --git a/additional_model_training/stacked_models/params/XGB_LightGB_LG_Best_Params_Fold_1.json b/output/additional_models/stacked_models/params/XGB_LightGB_LG_Best_Params_Fold_1.json similarity index 100% rename from additional_model_training/stacked_models/params/XGB_LightGB_LG_Best_Params_Fold_1.json rename to output/additional_models/stacked_models/params/XGB_LightGB_LG_Best_Params_Fold_1.json diff --git a/additional_model_training/stacked_models/params/XGB_RF_LG_Best_Params_Fold_1.json b/output/additional_models/stacked_models/params/XGB_RF_LG_Best_Params_Fold_1.json similarity index 100% rename from additional_model_training/stacked_models/params/XGB_RF_LG_Best_Params_Fold_1.json rename to output/additional_models/stacked_models/params/XGB_RF_LG_Best_Params_Fold_1.json diff --git a/data_pipeline/models_and_parameters/Best_Parameter_Fold_1.json b/output/main_model_evaluation/models_and_parameters/Best_Parameter_Fold_1.json similarity index 100% rename from data_pipeline/models_and_parameters/Best_Parameter_Fold_1.json rename to output/main_model_evaluation/models_and_parameters/Best_Parameter_Fold_1.json diff --git a/requirements.txt b/requirements.txt index ad956eb..ef3d959 100644 Binary files a/requirements.txt and b/requirements.txt differ diff --git a/additional_model_training/base_model.py b/scripts/base_model.py similarity index 97% rename from additional_model_training/base_model.py rename to scripts/base_model.py index cfcda28..05bb0d8 100644 --- a/additional_model_training/base_model.py +++ b/scripts/base_model.py @@ -60,7 +60,9 @@ def main(): nlp, loss_fn = initialize_environment(__file__) - config = load_config("config.json") + config_path = os.path.normpath(os.path.join( + os.path.dirname(__file__), '..', 'config', 'config.json')) + config = load_config(config_path) file_paths = get_file_paths(config) # Load the datasets @@ -207,7 +209,7 @@ def main(): # ************************* # logging.info(f"Beginning Data Cleaning ['body']...") df_clean_body = load_or_clean_data( - 'Merged Dataframe', combined_df, 'body', "data_pipeline/data_cleaning/cleaned_data_frame.csv", data_cleaning) + 'Merged Dataframe', combined_df, 'body', "output/main_model_evaluation/data_cleaning/cleaned_data_frame.csv", data_cleaning) # Verifying the Cleaned Combine DataFrame # Concatenate the Cleaned DataFrame with the Merged DataFrame @@ -225,7 +227,7 @@ def main(): # ***************************** # logging.info(f"Beginning Noise Injection...") noisy_df = generate_noisy_dataframe( - df_cleaned_combined, 'data_pipeline/noise_injection/noisy_data_frame.csv') + df_cleaned_combined, 'output/main_model_evaluation/noise_injection/noisy_data_frame.csv') logging.info(f"Noise Injection completed.\n") # ************************* # @@ -292,7 +294,7 @@ def main(): y_train=y_train, y_test=y_test, pipeline=pipeline, - dir='data_pipeline/feature_extraction', + dir='output/main_model_evaluation/feature_extraction', ) logging.info( f"Data for Fold {fold_idx} has been processed or loaded successfully.\n") diff --git a/additional_model_training/base_model_optuna.py b/scripts/base_model_optuna.py similarity index 97% rename from additional_model_training/base_model_optuna.py rename to scripts/base_model_optuna.py index 
b55b6ed..9d4b3df 100644 --- a/additional_model_training/base_model_optuna.py +++ b/scripts/base_model_optuna.py @@ -59,7 +59,9 @@ # Main processing function def main(): nlp, loss_fn = initialize_environment(__file__) - config = load_config("config.json") + config_path = os.path.normpath(os.path.join( + os.path.dirname(__file__), '..', 'config', 'config.json')) + config = load_config(config_path) file_paths = get_file_paths(config) # Load the datasets @@ -206,7 +208,7 @@ def main(): # ************************* # logging.info(f"Beginning Data Cleaning ['body']...") df_clean_body = load_or_clean_data( - 'Merged Dataframe', combined_df, 'body', "data_pipeline/data_cleaning/cleaned_data_frame.csv", data_cleaning) + 'Merged Dataframe', combined_df, 'body', "output/main_model_evaluation/data_cleaning/cleaned_data_frame.csv", data_cleaning) # Verifying the Cleaned Combine DataFrame # Concatenate the Cleaned DataFrame with the Merged DataFrame @@ -224,7 +226,7 @@ def main(): # ***************************** # logging.info(f"Beginning Noise Injection...") noisy_df = generate_noisy_dataframe( - df_cleaned_combined, 'data_pipeline/noise_injection/noisy_data_frame.csv') + df_cleaned_combined, 'output/main_model_evaluation/noise_injection/noisy_data_frame.csv') logging.info(f"Noise Injection completed.\n") # ************************* # @@ -291,7 +293,7 @@ def main(): y_train=y_train, y_test=y_test, pipeline=pipeline, - dir='data_pipeline/feature_extraction', + dir='output/main_model_evaluation/feature_extraction', ) logging.info( f"Data for Fold {fold_idx} has been processed or loaded successfully.\n") diff --git a/evaluation_on_third_dataset.py b/scripts/evaluation_on_third_dataset.py similarity index 97% rename from evaluation_on_third_dataset.py rename to scripts/evaluation_on_third_dataset.py index 18d96f3..b72dc84 100644 --- a/evaluation_on_third_dataset.py +++ b/scripts/evaluation_on_third_dataset.py @@ -10,7 +10,8 @@ import re # Regular expressions from tqdm import tqdm # Progress bar import joblib # Joblib library -from sklearn.metrics import accuracy_score, confusion_matrix, classification_report # Evaluation metrics +# Evaluation metrics +from sklearn.metrics import accuracy_score, confusion_matrix, classification_report from tabulate import tabulate # Pretty-print tabular data from sklearn.compose import ColumnTransformer from sklearn.impute import SimpleImputer @@ -30,7 +31,7 @@ from typing import Dict, List, Union # Type hints import pickle # Serialization library from sklearn.decomposition import PCA # Dimensionality reduction -from spamandphishingdetection import ( +from src.spamandphishingdetection import ( initialize_environment, DatasetProcessor, count_urls, @@ -40,7 +41,7 @@ BERTFeatureExtractor, BERTFeatureTransformer, ) -from evaluationonthirddataset import ( +from src.evaluationonthirddataset import ( load_config, get_file_paths, load_or_extract_headers, @@ -49,7 +50,6 @@ ) - def main(): nlp, loss_fn = initialize_environment(__file__) config = load_config() @@ -134,8 +134,10 @@ def main(): else: logging.info( "The number of rows in the Merge Evaluation Dataframe matches the Processed Evaluation Dataframe.") - merged_evaluation.to_csv(file_paths['merged_evaluation_file'], index=False) - logging.info(f"Data successfully saved to: {file_paths['merged_evaluation_file']}") + merged_evaluation.to_csv( + file_paths['merged_evaluation_file'], index=False) + logging.info( + f"Data successfully saved to: {file_paths['merged_evaluation_file']}") logging.info("Data Integration completed.\n") # 
************************* # diff --git a/additional_model_training/xgb_ada_lg.py b/scripts/xgb_ada_lg.py similarity index 95% rename from additional_model_training/xgb_ada_lg.py rename to scripts/xgb_ada_lg.py index 8de42c9..7b7a7b0 100644 --- a/additional_model_training/xgb_ada_lg.py +++ b/scripts/xgb_ada_lg.py @@ -53,8 +53,9 @@ def main(): nlp, loss_fn = initialize_environment(__file__) - - config = load_config("config.json") + config_path = os.path.normpath(os.path.join( + os.path.dirname(__file__), '..', 'config', 'config.json')) + config = load_config(config_path) file_paths = get_file_paths(config) # Load the datasets @@ -201,7 +202,7 @@ def main(): # ************************* # logging.info(f"Beginning Data Cleaning ['body']...") df_clean_body = load_or_clean_data( - 'Merged Dataframe', combined_df, 'body', "data_pipeline/data_cleaning/cleaned_data_frame.csv", data_cleaning) + 'Merged Dataframe', combined_df, 'body', "output/main_model_evaluation/data_cleaning/cleaned_data_frame.csv", data_cleaning) # Verifying the Cleaned Combine DataFrame # Concatenate the Cleaned DataFrame with the Merged DataFrame @@ -219,7 +220,7 @@ def main(): # ***************************** # logging.info(f"Beginning Noise Injection...") noisy_df = generate_noisy_dataframe( - df_cleaned_combined, 'data_pipeline/noise_injection/noisy_data_frame.csv') + df_cleaned_combined, 'output/main_model_evaluation/noise_injection/noisy_data_frame.csv') logging.info(f"Noise Injection completed.\n") # ************************* # @@ -286,7 +287,7 @@ def main(): y_train=y_train, y_test=y_test, pipeline=pipeline, - dir='data_pipeline/feature_extraction', + dir='output/main_model_evaluation/feature_extraction', ) logging.info( f"Data for Fold {fold_idx} has been processed or loaded successfully.\n") @@ -295,13 +296,12 @@ def main(): # ***************************************** # logging.info( f"Beginning Model Training and Evaluation for Fold {fold_idx}...") - with open(os.path.join(os.path.dirname(__file__), '..', 'config.json')) as config_file: - config = json.load(config_file) - base_dir = config['base_dir'] + + base_dir = config['base_dir'] # Train the model and evaluate the performance for each fold model_path = os.path.join( - base_dir, 'additional_model_training', 'stacked_models', f'XGB_ADA_LG_Fold_{fold_idx}.pkl') - params_path = os.path.join(base_dir, 'additional_model_training', 'stacked_models', + base_dir, 'output', 'additional_models', 'stacked_models', f'XGB_ADA_LG_Fold_{fold_idx}.pkl') + params_path = os.path.join(base_dir, 'output', 'additional_models', 'stacked_models', 'params', f'XGB_ADA_LG_Best_Params_Fold_{fold_idx}.json') ensemble_model, test_accuracy = xgb_ada_lg_model_training( X_train_balanced, diff --git a/additional_model_training/xgb_knn_lg.py b/scripts/xgb_knn_lg.py similarity index 95% rename from additional_model_training/xgb_knn_lg.py rename to scripts/xgb_knn_lg.py index 2fbf14a..cfd58c3 100644 --- a/additional_model_training/xgb_knn_lg.py +++ b/scripts/xgb_knn_lg.py @@ -53,8 +53,9 @@ def main(): nlp, loss_fn = initialize_environment(__file__) - - config = load_config("config.json") + config_path = os.path.normpath(os.path.join( + os.path.dirname(__file__), '..', 'config', 'config.json')) + config = load_config(config_path) file_paths = get_file_paths(config) # Load the datasets @@ -201,7 +202,7 @@ def main(): # ************************* # logging.info(f"Beginning Data Cleaning ['body']...") df_clean_body = load_or_clean_data( - 'Merged Dataframe', combined_df, 'body', 
"data_pipeline/data_cleaning/cleaned_data_frame.csv", data_cleaning) + 'Merged Dataframe', combined_df, 'body', "output/main_model_evaluation/data_cleaning/cleaned_data_frame.csv", data_cleaning) # Verifying the Cleaned Combine DataFrame # Concatenate the Cleaned DataFrame with the Merged DataFrame @@ -219,7 +220,7 @@ def main(): # ***************************** # logging.info(f"Beginning Noise Injection...") noisy_df = generate_noisy_dataframe( - df_cleaned_combined, 'data_pipeline/noise_injection/noisy_data_frame.csv') + df_cleaned_combined, 'output/main_model_evaluation/noise_injection/noisy_data_frame.csv') logging.info(f"Noise Injection completed.\n") # ************************* # @@ -286,7 +287,7 @@ def main(): y_train=y_train, y_test=y_test, pipeline=pipeline, - dir='data_pipeline/feature_extraction', + dir='output/main_model_evaluation/feature_extraction', ) logging.info( f"Data for Fold {fold_idx} has been processed or loaded successfully.\n") @@ -296,13 +297,12 @@ def main(): # ***************************************** # logging.info( f"Beginning Model Training and Evaluation for Fold {fold_idx}...") - with open(os.path.join(os.path.dirname(__file__), '..', 'config.json')) as config_file: - config = json.load(config_file) - base_dir = config['base_dir'] + + base_dir = config['base_dir'] # Train the model and evaluate the performance for each fold model_path = os.path.join( - base_dir, 'additional_model_training', 'stacked_models', f'XGB_KNN_LG_Fold_{fold_idx}.pkl') - params_path = os.path.join(base_dir, 'additional_model_training', 'stacked_models', + base_dir, 'output', 'additional_models', 'stacked_models', f'XGB_KNN_LG_Fold_{fold_idx}.pkl') + params_path = os.path.join(base_dir, 'output', 'additional_models', 'stacked_models', 'params', f'XGB_KNN_LG_Best_Params_Fold_{fold_idx}.json') ensemble_model, test_accuracy = xgb_knn_lg_model_training( X_train_balanced, diff --git a/additional_model_training/xgb_lightgb_lg.py b/scripts/xgb_lightgb_lg.py similarity index 95% rename from additional_model_training/xgb_lightgb_lg.py rename to scripts/xgb_lightgb_lg.py index 70d87e1..58ecdbe 100644 --- a/additional_model_training/xgb_lightgb_lg.py +++ b/scripts/xgb_lightgb_lg.py @@ -53,8 +53,9 @@ def main(): nlp, loss_fn = initialize_environment(__file__) - - config = load_config("config.json") + config_path = os.path.normpath(os.path.join( + os.path.dirname(__file__), '..', 'config', 'config.json')) + config = load_config(config_path) file_paths = get_file_paths(config) # Load the datasets @@ -201,7 +202,7 @@ def main(): # ************************* # logging.info(f"Beginning Data Cleaning ['body']...") df_clean_body = load_or_clean_data( - 'Merged Dataframe', combined_df, 'body', "data_pipeline/data_cleaning/cleaned_data_frame.csv", data_cleaning) + 'Merged Dataframe', combined_df, 'body', "output/main_model_evaluation/data_cleaning/cleaned_data_frame.csv", data_cleaning) # Verifying the Cleaned Combine DataFrame # Concatenate the Cleaned DataFrame with the Merged DataFrame @@ -219,7 +220,7 @@ def main(): # ***************************** # logging.info(f"Beginning Noise Injection...") noisy_df = generate_noisy_dataframe( - df_cleaned_combined, 'data_pipeline/noise_injection/noisy_data_frame.csv') + df_cleaned_combined, 'output/main_model_evaluation/noise_injection/noisy_data_frame.csv') logging.info(f"Noise Injection completed.\n") # ************************* # @@ -286,7 +287,7 @@ def main(): y_train=y_train, y_test=y_test, pipeline=pipeline, - dir='data_pipeline/feature_extraction', + 
dir='output/main_model_evaluation/feature_extraction', ) logging.info( f"Data for Fold {fold_idx} has been processed or loaded successfully.\n") @@ -297,12 +298,11 @@ def main(): logging.info( f"Beginning Model Training and Evaluation for Fold {fold_idx}...") # Train the model and evaluate the performance for each fold - with open(os.path.join(os.path.dirname(__file__), '..', 'config.json')) as config_file: - config = json.load(config_file) - base_dir = config['base_dir'] + + base_dir = config['base_dir'] model_path = os.path.join( - base_dir, 'additional_model_training', 'stacked_models', f'XGB_LightGB_LG_Fold_{fold_idx}.pkl') - params_path = os.path.join(base_dir, 'additional_model_training', 'stacked_models', + base_dir, 'output', 'additional_models', 'stacked_models', f'XGB_LightGB_LG_Fold_{fold_idx}.pkl') + params_path = os.path.join(base_dir, 'output', 'additional_models', 'stacked_models', 'params', f'XGB_LightGB_LG_Best_Params_Fold_{fold_idx}.json') ensemble_model, test_accuracy = xgb_lightgb_lg_model_training( X_train_balanced, diff --git a/additional_model_training/xgb_rf_lg.py b/scripts/xgb_rf_lg.py similarity index 95% rename from additional_model_training/xgb_rf_lg.py rename to scripts/xgb_rf_lg.py index 496e88a..2bf208b 100644 --- a/additional_model_training/xgb_rf_lg.py +++ b/scripts/xgb_rf_lg.py @@ -53,8 +53,9 @@ def main(): nlp, loss_fn = initialize_environment(__file__) - - config = load_config("config.json") + config_path = os.path.normpath(os.path.join( + os.path.dirname(__file__), '..', 'config', 'config.json')) + config = load_config(config_path) file_paths = get_file_paths(config) # Load the datasets @@ -201,7 +202,7 @@ def main(): # ************************* # logging.info(f"Beginning Data Cleaning ['body']...") df_clean_body = load_or_clean_data( - 'Merged Dataframe', combined_df, 'body', "data_pipeline/data_cleaning/cleaned_data_frame.csv", data_cleaning) + 'Merged Dataframe', combined_df, 'body', "output/main_model_evaluation/data_cleaning/cleaned_data_frame.csv", data_cleaning) # Verifying the Cleaned Combine DataFrame # Concatenate the Cleaned DataFrame with the Merged DataFrame @@ -219,7 +220,7 @@ def main(): # ***************************** # logging.info(f"Beginning Noise Injection...") noisy_df = generate_noisy_dataframe( - df_cleaned_combined, 'data_pipeline/noise_injection/noisy_data_frame.csv') + df_cleaned_combined, 'output/main_model_evaluation/noise_injection/noisy_data_frame.csv') logging.info(f"Noise Injection completed.\n") # ************************* # @@ -286,7 +287,7 @@ def main(): y_train=y_train, y_test=y_test, pipeline=pipeline, - dir='data_pipeline/feature_extraction', + dir='output/main_model_evaluation/feature_extraction', ) logging.info( f"Data for Fold {fold_idx} has been processed or loaded successfully.\n") @@ -298,12 +299,11 @@ def main(): f"Beginning Model Training and Evaluation for Fold {fold_idx}...") # Train the model and evaluate the performance for each fold # Train the model and evaluate the performance for each fold - with open(os.path.join(os.path.dirname(__file__), '..', 'config.json')) as config_file: - config = json.load(config_file) - base_dir = config['base_dir'] + + base_dir = config['base_dir'] model_path = os.path.join( - base_dir, base_dir, 'additional_model_training', 'stacked_models', f'XGB_RF_LG_Fold_{fold_idx}.pkl') - params_path = os.path.join(base_dir, base_dir, 'additional_model_training', 'stacked_models', + base_dir, 'output', 'additional_models', 'stacked_models', 
f'XGB_RF_LG_Fold_{fold_idx}.pkl') + params_path = os.path.join(base_dir, 'output', 'additional_models', 'stacked_models', 'params', f'XGB_RF_LG_Best_Params_Fold_{fold_idx}.json') ensemble_model, test_accuracy = xgb_rf_lg_model_training( X_train_balanced, diff --git a/spamandphishingdetection/file_operations.py b/spamandphishingdetection/file_operations.py deleted file mode 100644 index 7ab2a1d..0000000 --- a/spamandphishingdetection/file_operations.py +++ /dev/null @@ -1,49 +0,0 @@ -import json -import os - - -def load_config(config_path='config.json'): - with open(config_path, 'r') as config_file: - config = json.load(config_file) - return config - - -def ensure_directory_exists(path): - if not os.path.exists(path): - os.makedirs(path) - - -def get_file_paths(config): - base_dir = config['base_dir'] - file_paths = { - 'ceas_08_dataset': os.path.join(base_dir, 'datasets', 'ceas_08.csv'), - 'preprocessed_spam_assassin_file': os.path.join(base_dir, 'data_pipeline', 'data_preprocessing', 'preprocessed_spam_assassin.csv'), - 'preprocessed_ceas_file': os.path.join(base_dir, 'data_pipeline', 'data_preprocessing', 'preprocessed_ceas_08.csv'), - 'extracted_spam_assassin_email_header_file': os.path.join(base_dir, 'data_pipeline', 'feature_engineering', 'spam_assassin_extracted_email_header.csv'), - 'extracted_ceas_email_header_file': os.path.join(base_dir, 'data_pipeline', 'feature_engineering', 'ceas_extracted_email_header.csv'), - 'merged_spam_assassin_file': os.path.join(base_dir, 'data_pipeline', 'data_integration', 'merged_spam_assassin.csv'), - 'merged_ceas_file': os.path.join(base_dir, 'data_pipeline', 'data_integration', 'merged_ceas_08.csv'), - 'merged_data_frame': os.path.join(base_dir, 'data_pipeline', 'data_integration', 'merged_data_frame.csv'), - 'cleaned_data_frame': os.path.join(base_dir, 'data_pipeline', 'data_cleaning', 'cleaned_data_frame.csv'), - 'cleaned_ceas_headers': os.path.join(base_dir, 'data_pipeline', 'data_cleaning', 'cleaned_ceas_headers.csv'), - 'merged_cleaned_ceas_headers': os.path.join(base_dir, 'data_pipeline', 'data_cleaning', 'merged_cleaned_ceas_headers.csv'), - 'merged_cleaned_data_frame': os.path.join(base_dir, 'data_pipeline', 'data_cleaning', 'merged_cleaned_data_frame.csv'), - 'noisy_data_frame': os.path.join(base_dir, 'data_pipeline', 'noise_injection', 'noisy_data_frame.csv'), - 'pipeline_path': os.path.join(base_dir, 'data_pipeline', 'feature_extraction') - } - - # Ensure directories exist - for path in file_paths.values(): - ensure_directory_exists(os.path.dirname(path)) - - return file_paths - - -def get_model_path(config, fold_idx): - base_dir = config['base_dir'] - return os.path.join(base_dir, 'data_pipeline', 'models_and_parameters', f'Ensemble_Model_Fold_{fold_idx}.pkl') - - -def get_params_path(config, fold_idx): - base_dir = config['base_dir'] - return os.path.join(base_dir, 'data_pipeline', 'models_and_parameters', f'Best_Parameter_Fold_{fold_idx}.json') diff --git a/evaluationonthirddataset/__init__.py b/src/evaluationonthirddataset/__init__.py similarity index 100% rename from evaluationonthirddataset/__init__.py rename to src/evaluationonthirddataset/__init__.py diff --git a/evaluationonthirddataset/feature_engineering.py b/src/evaluationonthirddataset/feature_engineering.py similarity index 100% rename from evaluationonthirddataset/feature_engineering.py rename to src/evaluationonthirddataset/feature_engineering.py diff --git a/evaluationonthirddataset/file_operations.py b/src/evaluationonthirddataset/file_operations.py similarity 
index 91% rename from evaluationonthirddataset/file_operations.py rename to src/evaluationonthirddataset/file_operations.py index b14f496..0b09d9f 100644 --- a/evaluationonthirddataset/file_operations.py +++ b/src/evaluationonthirddataset/file_operations.py @@ -16,7 +16,7 @@ def ensure_directory_exists(path): def get_file_paths(config): base_dir = config['base_dir'] file_paths = { - 'dataset': os.path.join(base_dir, 'datasets', 'phishing_email.csv'), + 'dataset': os.path.join(base_dir, 'data', 'phishing_email.csv'), 'preprocessed_evaluation_dataset': os.path.join( base_dir, 'third_dataset_evaluation', 'data_preprocessing', 'preprocessed_evaluation_dataset.csv'), 'extracted_evaluation_header_file': os.path.join( @@ -27,7 +27,7 @@ def get_file_paths(config): base_dir, 'third_dataset_evaluation', 'data_integration', 'merged_evaluation.csv'), 'merged_cleaned_data_frame': os.path.join( base_dir, 'third_dataset_evaluation', 'data_cleaning', 'merged_cleaned_data_frame.csv'), - 'main_model': os.path.join(base_dir, 'data_pipeline', 'models_and_parameters'), + 'main_model': os.path.join(base_dir, 'output', 'main_model_evaluation', 'models_and_parameters'), 'base_model': os.path.join( base_dir, 'additional_model_training', 'base_models'), 'base_model_optuna': os.path.join( diff --git a/evaluationonthirddataset/pipeline.py b/src/evaluationonthirddataset/pipeline.py similarity index 92% rename from evaluationonthirddataset/pipeline.py rename to src/evaluationonthirddataset/pipeline.py index 3ef4ca4..e2b2533 100644 --- a/evaluationonthirddataset/pipeline.py +++ b/src/evaluationonthirddataset/pipeline.py @@ -5,13 +5,13 @@ import joblib -def save_data_pipeline(data, labels, data_path, labels_path): +def save_output(data, labels, data_path, labels_path): np.savez(data_path, data=data) with open(labels_path, 'wb') as f: pickle.dump(labels, f) -def load_data_pipeline(data_path, labels_path): +def load_output(data_path, labels_path): data = np.load(data_path)['data'] with open(labels_path, 'rb') as f: labels = pickle.load(f) @@ -76,7 +76,7 @@ def run_pipeline_or_load(data, labels, pipeline, dir): # Save the preprocessed data logging.info("Saving processed data...") - save_data_pipeline(data_combined, labels, data_path, labels_path) + save_output(data_combined, labels, data_path, labels_path) else: # Load the preprocessor logging.info(f"Loading preprocessor from {preprocessor_path}...") @@ -84,7 +84,7 @@ def run_pipeline_or_load(data, labels, pipeline, dir): # Load the preprocessed data logging.info("Loading preprocessed data...") - data_combined, labels = load_data_pipeline(data_path, labels_path) + data_combined, labels = load_output(data_path, labels_path) return data_combined, labels diff --git a/spamandphishingdetection/__init__.py b/src/spamandphishingdetection/__init__.py similarity index 99% rename from spamandphishingdetection/__init__.py rename to src/spamandphishingdetection/__init__.py index 509c8d4..8fd1606 100644 --- a/spamandphishingdetection/__init__.py +++ b/src/spamandphishingdetection/__init__.py @@ -35,7 +35,6 @@ from .pipeline import run_pipeline_or_load from .learning_curve import plot_learning_curve - from .modeltraining.base_model import model_training as base_model_training from .modeltraining.main_model import model_training as main_model_training from .modeltraining.base_model_optuna import model_training as base_model_training_optuna diff --git a/spamandphishingdetection/bert.py b/src/spamandphishingdetection/bert.py similarity index 100% rename from spamandphishingdetection/bert.py rename to 
src/spamandphishingdetection/bert.py diff --git a/spamandphishingdetection/data_cleaning.py b/src/spamandphishingdetection/data_cleaning.py similarity index 100% rename from spamandphishingdetection/data_cleaning.py rename to src/spamandphishingdetection/data_cleaning.py diff --git a/spamandphishingdetection/data_cleaning_headers.py b/src/spamandphishingdetection/data_cleaning_headers.py similarity index 100% rename from spamandphishingdetection/data_cleaning_headers.py rename to src/spamandphishingdetection/data_cleaning_headers.py diff --git a/spamandphishingdetection/data_integration.py b/src/spamandphishingdetection/data_integration.py similarity index 100% rename from spamandphishingdetection/data_integration.py rename to src/spamandphishingdetection/data_integration.py diff --git a/spamandphishingdetection/data_splitting.py b/src/spamandphishingdetection/data_splitting.py similarity index 98% rename from spamandphishingdetection/data_splitting.py rename to src/spamandphishingdetection/data_splitting.py index 22ade18..ff9d064 100644 --- a/spamandphishingdetection/data_splitting.py +++ b/src/spamandphishingdetection/data_splitting.py @@ -4,7 +4,7 @@ from sklearn.model_selection import StratifiedKFold -def stratified_k_fold_split(df, n_splits=3, random_state=42, output_dir='data_pipeline/data_splitting'): +def stratified_k_fold_split(df, n_splits=3, random_state=42, output_dir='output/main_model_evaluation/data_splitting'): """ Performs Stratified K-Fold splitting on the DataFrame. diff --git a/spamandphishingdetection/dataset_processor.py b/src/spamandphishingdetection/dataset_processor.py similarity index 100% rename from spamandphishingdetection/dataset_processor.py rename to src/spamandphishingdetection/dataset_processor.py diff --git a/spamandphishingdetection/feature_engineering.py b/src/spamandphishingdetection/feature_engineering.py similarity index 100% rename from spamandphishingdetection/feature_engineering.py rename to src/spamandphishingdetection/feature_engineering.py diff --git a/src/spamandphishingdetection/file_operations.py b/src/spamandphishingdetection/file_operations.py new file mode 100644 index 0000000..51038d1 --- /dev/null +++ b/src/spamandphishingdetection/file_operations.py @@ -0,0 +1,49 @@ +import json +import os + + +def load_config(config_path='config.json'): + with open(config_path, 'r') as config_file: + config = json.load(config_file) + return config + + +def ensure_directory_exists(path): + if not os.path.exists(path): + os.makedirs(path) + + +def get_file_paths(config): + base_dir = config['base_dir'] + file_paths = { + 'ceas_08_dataset': os.path.join(base_dir, 'data', 'ceas_08.csv'), + 'preprocessed_spam_assassin_file': os.path.join(base_dir, 'output', 'main_model_evaluation', 'data_preprocessing', 'preprocessed_spam_assassin.csv'), + 'preprocessed_ceas_file': os.path.join(base_dir, 'output', 'main_model_evaluation', 'data_preprocessing', 'preprocessed_ceas_08.csv'), + 'extracted_spam_assassin_email_header_file': os.path.join(base_dir, 'output', 'main_model_evaluation', 'feature_engineering', 'spam_assassin_extracted_email_header.csv'), + 'extracted_ceas_email_header_file': os.path.join(base_dir, 'output', 'main_model_evaluation', 'feature_engineering', 'ceas_extracted_email_header.csv'), + 'merged_spam_assassin_file': os.path.join(base_dir, 'output', 'main_model_evaluation', 'data_integration', 'merged_spam_assassin.csv'), + 'merged_ceas_file': os.path.join(base_dir, 'output', 'main_model_evaluation', 'data_integration', 'merged_ceas_08.csv'), + 
'merged_data_frame': os.path.join(base_dir, 'output', 'main_model_evaluation', 'data_integration', 'merged_data_frame.csv'), + 'cleaned_data_frame': os.path.join(base_dir, 'output', 'main_model_evaluation', 'data_cleaning', 'cleaned_data_frame.csv'), + 'cleaned_ceas_headers': os.path.join(base_dir, 'output', 'main_model_evaluation', 'data_cleaning', 'cleaned_ceas_headers.csv'), + 'merged_cleaned_ceas_headers': os.path.join(base_dir, 'output', 'main_model_evaluation', 'data_cleaning', 'merged_cleaned_ceas_headers.csv'), + 'merged_cleaned_data_frame': os.path.join(base_dir, 'output', 'main_model_evaluation', 'data_cleaning', 'merged_cleaned_data_frame.csv'), + 'noisy_data_frame': os.path.join(base_dir, 'output', 'main_model_evaluation', 'noise_injection', 'noisy_data_frame.csv'), + 'pipeline_path': os.path.join(base_dir, 'output', 'main_model_evaluation', 'feature_extraction') + } + + # Ensure directories exist + for path in file_paths.values(): + ensure_directory_exists(os.path.dirname(path)) + + return file_paths + + +def get_model_path(config, fold_idx): + base_dir = config['base_dir'] + return os.path.join(base_dir, 'output', 'main_model_evaluation', 'models_and_parameters', f'Ensemble_Model_Fold_{fold_idx}.pkl') + + +def get_params_path(config, fold_idx): + base_dir = config['base_dir'] + return os.path.join(base_dir, 'output', 'main_model_evaluation', 'models_and_parameters', f'Best_Parameter_Fold_{fold_idx}.json') diff --git a/spamandphishingdetection/label_processing.py b/src/spamandphishingdetection/label_processing.py similarity index 100% rename from spamandphishingdetection/label_processing.py rename to src/spamandphishingdetection/label_processing.py diff --git a/spamandphishingdetection/learning_curve.py b/src/spamandphishingdetection/learning_curve.py similarity index 100% rename from spamandphishingdetection/learning_curve.py rename to src/spamandphishingdetection/learning_curve.py diff --git a/spamandphishingdetection/missing_values.py b/src/spamandphishingdetection/missing_values.py similarity index 100% rename from spamandphishingdetection/missing_values.py rename to src/spamandphishingdetection/missing_values.py diff --git a/spamandphishingdetection/modeltraining/__init__.py b/src/spamandphishingdetection/modeltraining/__init__.py similarity index 100% rename from spamandphishingdetection/modeltraining/__init__.py rename to src/spamandphishingdetection/modeltraining/__init__.py diff --git a/spamandphishingdetection/modeltraining/base_model.py b/src/spamandphishingdetection/modeltraining/base_model.py similarity index 87% rename from spamandphishingdetection/modeltraining/base_model.py rename to src/spamandphishingdetection/modeltraining/base_model.py index c43df41..a5f835b 100644 --- a/spamandphishingdetection/modeltraining/base_model.py +++ b/src/spamandphishingdetection/modeltraining/base_model.py @@ -5,12 +5,14 @@ import json from sklearn.metrics import accuracy_score, confusion_matrix, classification_report -with open(os.path.join(os.path.dirname(__file__), '..', '..', 'config.json')) as config_file: +with open(os.path.join(os.path.dirname(__file__), '..', '..', '..', 'config', 'config.json')) as config_file: config = json.load(config_file) base_dir = config['base_dir'] + # Define model_path -model_path = os.path.join(base_dir, 'additional_model_training', 'base_models') +model_path = os.path.join(base_dir, 'output', 'additional_models', 'base_models') +os.makedirs(model_path, exist_ok=True) def model_training(X_train, y_train, X_test, y_test, model, model_name): diff 
--git a/spamandphishingdetection/modeltraining/base_model_optuna.py b/src/spamandphishingdetection/modeltraining/base_model_optuna.py similarity index 96% rename from spamandphishingdetection/modeltraining/base_model_optuna.py rename to src/spamandphishingdetection/modeltraining/base_model_optuna.py index 667b4ec..01d1a59 100644 --- a/spamandphishingdetection/modeltraining/base_model_optuna.py +++ b/src/spamandphishingdetection/modeltraining/base_model_optuna.py @@ -11,15 +11,17 @@ from sklearn.ensemble import RandomForestClassifier from sklearn.neighbors import KNeighborsClassifier from lightgbm import LGBMClassifier +from sklearn.metrics import accuracy_score, confusion_matrix, classification_report - -with open(os.path.join(os.path.dirname(__file__), '..', '..', 'config.json')) as config_file: +with open(os.path.join(os.path.dirname(__file__), '..', '..', '..', 'config', 'config.json')) as config_file: config = json.load(config_file) base_dir = config['base_dir'] # Define model_path and param_path -model_path = os.path.join(base_dir, 'additional_model_training', 'base_models_optuna') -param_path = os.path.join(base_dir, 'additional_model_training', 'base_models_optuna') +model_path = os.path.join( + base_dir, 'output', 'additional_models', 'base_models_optuna') +param_path = os.path.join( + base_dir, 'output', 'additional_models', 'base_models_optuna') def conduct_optuna_study(X_train, y_train, model_name): diff --git a/spamandphishingdetection/modeltraining/main_model.py b/src/spamandphishingdetection/modeltraining/main_model.py similarity index 100% rename from spamandphishingdetection/modeltraining/main_model.py rename to src/spamandphishingdetection/modeltraining/main_model.py diff --git a/spamandphishingdetection/modeltraining/xgb_ada_lg_model.py b/src/spamandphishingdetection/modeltraining/xgb_ada_lg_model.py similarity index 100% rename from spamandphishingdetection/modeltraining/xgb_ada_lg_model.py rename to src/spamandphishingdetection/modeltraining/xgb_ada_lg_model.py diff --git a/spamandphishingdetection/modeltraining/xgb_knn_lg_model.py b/src/spamandphishingdetection/modeltraining/xgb_knn_lg_model.py similarity index 100% rename from spamandphishingdetection/modeltraining/xgb_knn_lg_model.py rename to src/spamandphishingdetection/modeltraining/xgb_knn_lg_model.py diff --git a/spamandphishingdetection/modeltraining/xgb_lightgb_lg_model.py b/src/spamandphishingdetection/modeltraining/xgb_lightgb_lg_model.py similarity index 100% rename from spamandphishingdetection/modeltraining/xgb_lightgb_lg_model.py rename to src/spamandphishingdetection/modeltraining/xgb_lightgb_lg_model.py diff --git a/spamandphishingdetection/modeltraining/xgb_rf_lg_model.py b/src/spamandphishingdetection/modeltraining/xgb_rf_lg_model.py similarity index 100% rename from spamandphishingdetection/modeltraining/xgb_rf_lg_model.py rename to src/spamandphishingdetection/modeltraining/xgb_rf_lg_model.py diff --git a/spamandphishingdetection/noise_injection.py b/src/spamandphishingdetection/noise_injection.py similarity index 100% rename from spamandphishingdetection/noise_injection.py rename to src/spamandphishingdetection/noise_injection.py diff --git a/spamandphishingdetection/pipeline.py b/src/spamandphishingdetection/pipeline.py similarity index 82% rename from spamandphishingdetection/pipeline.py rename to src/spamandphishingdetection/pipeline.py index 65d5bee..158ed63 100644 --- a/spamandphishingdetection/pipeline.py +++ b/src/spamandphishingdetection/pipeline.py @@ -122,10 +122,10 @@ def 
run_pipeline_or_load(fold_idx, X_train, X_test, y_train, y_test, pipeline, d # Save the preprocessed data logging.info(f"Saving processed data for fold {fold_idx}...") - save_data_pipeline(X_train_balanced, y_train_balanced, - train_data_path, train_labels_path) - save_data_pipeline(X_test_combined, y_test, - test_data_path, test_labels_path) + save_output(X_train_balanced, y_train_balanced, + train_data_path, train_labels_path) + save_output(X_test_combined, y_test, + test_data_path, test_labels_path) else: # Load the preprocessor logging.info(f"Loading preprocessor from {preprocessor_path}...") @@ -133,15 +133,15 @@ def run_pipeline_or_load(fold_idx, X_train, X_test, y_train, y_test, pipeline, d # Load the preprocessed data logging.info(f"Loading preprocessed data for fold {fold_idx}...") - X_train_balanced, y_train_balanced = load_data_pipeline( + X_train_balanced, y_train_balanced = load_output( train_data_path, train_labels_path) - X_test_combined, y_test = load_data_pipeline( + X_test_combined, y_test = load_output( test_data_path, test_labels_path) return X_train_balanced, X_test_combined, y_train_balanced, y_test -def save_data_pipeline(data, labels, data_path, labels_path): +def save_output(data, labels, data_path, labels_path): """ Save the data and labels to specified file paths. @@ -163,7 +163,7 @@ def save_data_pipeline(data, labels, data_path, labels_path): dump(labels, labels_path) -def load_data_pipeline(data_path, labels_path): +def load_output(data_path, labels_path): """ Load the data and labels from specified file paths. @@ -188,7 +188,7 @@ def load_data_pipeline(data_path, labels_path): return data, labels -def get_fold_paths(fold_idx, base_dir='feature_extraction'): +def get_fold_paths(fold_idx, base_dir): """ Generates file paths for the train and test data and labels for the specified fold. @@ -204,13 +204,27 @@ def get_fold_paths(fold_idx, base_dir='feature_extraction'): tuple The file paths for the train data, test data, train labels, test labels, and preprocessor. 
""" - train_data_path = os.path.join(base_dir, f"Fold_{fold_idx}_Train_Data.npz") - test_data_path = os.path.join(base_dir, f"Fold_{fold_idx}_Test_Data.npz") - train_labels_path = os.path.join( - base_dir, f"Fold_{fold_idx}_Train_Labels.pkl") - test_labels_path = os.path.join( - base_dir, f"Fold_{fold_idx}_Test_Labels.pkl") - preprocessor_path = os.path.join( - base_dir, f"Fold_{fold_idx}_Preprocessor.pkl") + train_data_path = os.path.normpath(os.path.join( + base_dir, f"Fold_{fold_idx}_Train_Data.npz")) + test_data_path = os.path.normpath(os.path.join( + base_dir, f"Fold_{fold_idx}_Test_Data.npz")) + train_labels_path = os.path.normpath(os.path.join( + base_dir, f"Fold_{fold_idx}_Train_Labels.pkl")) + test_labels_path = os.path.normpath(os.path.join( + base_dir, f"Fold_{fold_idx}_Test_Labels.pkl")) + preprocessor_path = os.path.normpath(os.path.join( + base_dir, f"Fold_{fold_idx}_Preprocessor.pkl")) + + # Check if the files exist + if not os.path.exists(train_data_path): + logging.error(f"Train data file not found: {train_data_path}") + if not os.path.exists(test_data_path): + logging.error(f"Test data file not found: {test_data_path}") + if not os.path.exists(train_labels_path): + logging.error(f"Train labels file not found: {train_labels_path}") + if not os.path.exists(test_labels_path): + logging.error(f"Test labels file not found: {test_labels_path}") + if not os.path.exists(preprocessor_path): + logging.error(f"Preprocessor file not found: {preprocessor_path}") return train_data_path, test_data_path, train_labels_path, test_labels_path, preprocessor_path diff --git a/spamandphishingdetection/rare_category_remover.py b/src/spamandphishingdetection/rare_category_remover.py similarity index 100% rename from spamandphishingdetection/rare_category_remover.py rename to src/spamandphishingdetection/rare_category_remover.py diff --git a/spamandphishingdetection/setup.py b/src/spamandphishingdetection/setup.py similarity index 100% rename from spamandphishingdetection/setup.py rename to src/spamandphishingdetection/setup.py