Welcome to the YouTube Adview Count Analysis repository! This project aims to predict YouTube ad view counts using machine learning models. Below you will find detailed descriptions of the files, models, and datasets used in this project, along with instructions for setting up and running the code.
This repository contains a collection of machine learning models and datasets used to analyze and predict ad view counts on YouTube. The project demonstrates various machine learning techniques, including linear regression, decision trees, and ensemble methods, to build effective prediction models.
Here is a brief description of the files and directories in this repository:
| File/Directory | Description |
|---|---|
| `artificial_neural_network.keras` | A Keras model file containing the trained Artificial Neural Network model used for predicting YouTube ad view counts. |
| `decision_tree.pkl` | A pickle file containing the trained Decision Tree model for predicting YouTube ad view counts. |
| `linear_regression.pkl` | A pickle file containing the trained Linear Regression model for predicting YouTube ad view counts. |
| `random_forest.pkl` | A pickle file containing the trained Random Forest model for predicting YouTube ad view counts. |
| `svr.pkl` | A pickle file containing the trained Support Vector Regression (SVR) model for predicting YouTube ad view counts. |
| `train.csv` | CSV file containing the training dataset with features and target values for model training. |
| `test.csv` | CSV file containing the test dataset used for evaluating the performance of the models. |
| `predictedAdview.csv` | CSV file containing the predicted ad view counts from the models. |
| `internship.ipynb` | Jupyter Notebook containing the code and analysis for the YouTube ad view count prediction project. |
| `README.md` | This file, providing an overview of the project, file descriptions, and instructions. |
### `train.csv`
- Description: The training dataset used for training the machine learning models.
- Format: CSV
- Columns: Features and target values for training.

### `test.csv`
- Description: The test dataset used to evaluate the performance of the machine learning models.
- Format: CSV
- Columns: Features for testing the models.

### `predictedAdview.csv`
- Description: A CSV file containing the predicted ad view counts generated by the models.
- Format: CSV
- Columns: Predicted ad view counts.

### `artificial_neural_network.keras`
- Description: A Keras model file for an Artificial Neural Network.
- Purpose: Used for predicting YouTube ad view counts.
- Requirements: Keras and TensorFlow libraries.

### `decision_tree.pkl`
- Description: A pickle file for the Decision Tree model.
- Purpose: Used for predicting YouTube ad view counts.
- Requirements: scikit-learn library.

### `linear_regression.pkl`
- Description: A pickle file for the Linear Regression model.
- Purpose: Used for predicting YouTube ad view counts.
- Requirements: scikit-learn library.

### `random_forest.pkl`
- Description: A pickle file for the Random Forest model.
- Purpose: Used for predicting YouTube ad view counts.
- Requirements: scikit-learn library.

### `svr.pkl`
- Description: A pickle file for the Support Vector Regression (SVR) model.
- Purpose: Used for predicting YouTube ad view counts.
- Requirements: scikit-learn library.
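For reference, a pickle file like `random_forest.pkl` can be produced with scikit-learn and joblib along these lines (a minimal sketch; the actual features, preprocessing, and hyperparameters used in `internship.ipynb` may differ, and `target` is an assumed column name):

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Load the training data; 'target' is assumed to be the ad view column
# and the remaining columns are assumed to be numeric features.
train_data = pd.read_csv('train.csv')
X_train = train_data.drop(columns=['target'])
y_train = train_data['target']

# Fit a Random Forest regressor and save it in the same way as random_forest.pkl.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
joblib.dump(model, 'random_forest.pkl')
```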
To get started with this project, follow these steps:
```bash
git clone https://github.com/hitesh-bhatnagar/YouTube-Adview-count-.git
cd YouTube-Adview-count-
```
Make sure you have the required libraries installed:
```bash
pip install -r requirements.txt
```
You can load the datasets using pandas:
```python
import pandas as pd

train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
```
To load the saved models:
```python
import joblib

decision_tree_model = joblib.load('decision_tree.pkl')
linear_regression_model = joblib.load('linear_regression.pkl')
random_forest_model = joblib.load('random_forest.pkl')
svr_model = joblib.load('svr.pkl')
```
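The Artificial Neural Network is stored in the native Keras format and can be loaded with Keras (a minimal sketch, assuming a TensorFlow/Keras version recent enough to support the `.keras` format):

```python
from tensorflow import keras

# Load the saved ANN model from the native .keras file.
ann_model = keras.models.load_model('artificial_neural_network.keras')
```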
To make predictions using the models:
```python
# If test.csv includes the target column, drop it so only features are passed to the models.
X_test = test_data.drop(columns=['target'], errors='ignore')

predictions = {
    'decision_tree': decision_tree_model.predict(X_test),
    'linear_regression': linear_regression_model.predict(X_test),
    'random_forest': random_forest_model.predict(X_test),
    'svr': svr_model.predict(X_test)
}
```
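To write a set of predictions to a CSV in the spirit of `predictedAdview.csv` (a minimal sketch; the exact column layout of the file in this repository may differ):

```python
# Save one model's predictions, e.g. the Random Forest output.
pd.DataFrame({'Adview': predictions['random_forest']}).to_csv('predictedAdview.csv', index=False)
```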
Use the test dataset to evaluate the models’ performance:
```python
from sklearn.metrics import mean_squared_error

# 'target' is assumed to be the ground-truth ad view column in test.csv.
for model_name, preds in predictions.items():
    mse = mean_squared_error(test_data['target'], preds)
    print(f'{model_name} Mean Squared Error: {mse}')
```
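The ANN model can be evaluated in the same way (a sketch that reuses `X_test` and `ann_model` from above and assumes no additional feature scaling was applied during training; if the notebook rescaled the features, apply the same preprocessing first):

```python
# Predict with the ANN and compute its MSE on the same features.
ann_preds = ann_model.predict(X_test).ravel()
ann_mse = mean_squared_error(test_data['target'], ann_preds)
print(f'artificial_neural_network Mean Squared Error: {ann_mse}')
```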
To use the models and datasets:

- Load the `train.csv` and `test.csv` datasets.
- Load the pre-trained models from their respective `.pkl` files or the `artificial_neural_network.keras` file.
- Use the models to make predictions and evaluate their performance.
Contributions are welcome! If you would like to contribute to this project, please fork the repository and submit a pull request with your changes. Ensure that your changes follow the guidelines outlined in the `CONTRIBUTING.md` file.
This project is licensed under the MIT License.
Make sure to include a `requirements.txt` file in your repository with the necessary dependencies:

```
numpy==1.21.2
pandas==1.3.3
scikit-learn==1.0.2
tensorflow==2.9.1
joblib==1.1.0
```
# Contributing to YouTube Adview Count Analysis
Thank you for your interest in contributing to the YouTube Adview Count Analysis project! We welcome contributions to improve the project.
## How to Contribute
1. **Fork the repository** to your own GitHub account.
2. **Clone the forked repository** to your local machine.
   ```bash
   git clone https://github.com/your-username/YouTube-Adview-count-.git
   ```

3. **Create a new branch** for your changes.

   ```bash
   git checkout -b my-new-feature
   ```

4. **Make your changes and commit them.**

   ```bash
   git add .
   git commit -m "Add some feature"
   ```

5. **Push your changes** to your forked repository.

   ```bash
   git push origin my-new-feature
   ```

6. **Submit a pull request** to the `main` branch of the original repository.
- Ensure that your code follows the project’s style guidelines.
- Provide a clear and concise description of your changes.
- Include tests for new features or bug fixes.
Thank you for your contributions!