This project performs exploratory data analysis (EDA) and predictive modeling on retracted papers. The aim is to understand the characteristics of retracted papers and to develop models to predict retractions.
your-github-repo/
│
├── README.md
├── requirements.txt
│
├── data/
│ ├── raw/
│ └── processed/
│
├── src/
│ ├── eda/
│ │ └── EDA_retraction.py
│ ├── modeling/
│ │ ├── Preparation_modelling.py
│ │ ├── predictive_modelling_approach_2.py
│ │ ├── predictive_modeling_approach_3.py
│ │ └── confusion_matrix_random_forest.py
│
├── results/
│ ├── figures/
│ └── reports/
│
└── scripts/
    ├── run_modeling.py
    └── run_all.py
- Python 3.6 or higher
- Git (optional, for cloning the repository)
Installation:
- Clone the repository:
  git clone https://github.com/your-username/your-repo.git
  cd your-repo
- Create a virtual environment (recommended):
  python -m venv venv
  source venv/bin/activate   (on Windows: venv\Scripts\activate)
- Install the dependencies:
  pip install -r requirements.txt
- Run a Python script, for example:
  python scripts/run_all.py
src/eda/EDA_retraction.py: This script performs exploratory data analysis on the retracted papers dataset. It generates visualizations and descriptive statistics to characterize the data. The data was cleaned before this analysis.
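As a rough illustration of the descriptive statistics such a script might compute, the sketch below summarizes a cleaned retractions table with pandas. The column names Journal, Reason and RetractionDate and the function summarize_retractions are hypothetical assumptions, not the dataset's actual schema.

```python
import pandas as pd


def summarize_retractions(df: pd.DataFrame) -> dict:
    """Return simple descriptive statistics for a retractions table.

    Assumes hypothetical columns 'Journal', 'Reason' and 'RetractionDate';
    adjust the names to match the real dataset.
    """
    # Unparseable dates become NaT instead of raising an error
    dates = pd.to_datetime(df["RetractionDate"], errors="coerce")
    return {
        "n_papers": len(df),
        # Number of retractions per calendar year
        "per_year": dates.dt.year.value_counts().sort_index().to_dict(),
        # Most frequent journals and retraction reasons
        "top_journals": df["Journal"].value_counts().head(5).to_dict(),
        "top_reasons": df["Reason"].value_counts().head(5).to_dict(),
        # Share of missing values per column
        "missing_share": df.isna().mean().round(3).to_dict(),
    }
```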
src/modeling/Preparation_modelling.py: This script prepares the data for modeling by preprocessing and transforming the dataset into the format expected by the predictive models.
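A minimal sketch of this preparation step, assuming a pandas DataFrame with a hypothetical binary target column named label (the README does not name the target). The function prepare_features and its choices (one-hot encoding, median imputation, a stratified 75/25 split) are illustrative assumptions, not the script's confirmed logic.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_features(df, target="label", test_size=0.25, seed=42):
    """Sketch: clean, encode and split a table for modeling.

    'label' is a hypothetical binary target column; every other column is
    treated as a feature.
    """
    # Drop exact duplicates and rows without a target value
    df = df.drop_duplicates().dropna(subset=[target])
    y = df[target].astype(int)
    X = df.drop(columns=[target])
    # One-hot encode text/categorical columns (plus a NaN indicator);
    # numeric columns pass through unchanged
    X = pd.get_dummies(X, dummy_na=True)
    # Fill remaining gaps in numeric columns with the column median
    X = X.fillna(X.select_dtypes("number").median())
    # Stratify so both splits keep the same class balance
    return train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
```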
See also https://github.com/bibekdhakal/research-retraction.
src/modeling/predictive_modelling_approach_2.py: This script implements the second predictive modeling approach. It trains a machine learning model to predict retractions and evaluates its performance.
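The README does not say which model the second approach uses. The sketch below assumes a scikit-learn RandomForestClassifier (the project's confusion-matrix script suggests a Random Forest is used somewhere) and reports accuracy and F1 on a held-out split. train_and_evaluate is a hypothetical name, and synthetic data from make_classification stands in for the real features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X_train, X_test, y_train, y_test, seed=42):
    """Fit a classifier and report held-out metrics.

    RandomForestClassifier is an assumed choice; the README does not say
    which model approach 2 uses.
    """
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    return model, {
        "accuracy": accuracy_score(y_test, preds),
        "f1": f1_score(y_test, preds),  # positive class is label 1
    }


if __name__ == "__main__":
    # Synthetic stand-in data; the real script would load the prepared dataset
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    splits = train_test_split(X, y, test_size=0.25, random_state=0)
    _, metrics = train_and_evaluate(*splits)
    print(metrics)
```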
src/modeling/predictive_modeling_approach_3.py: This script implements the third predictive modeling approach. It trains and evaluates another machine learning model to predict retractions, applying clustering techniques (e.g., K-Means) to group similar data points and thereby capture underlying patterns in the data.
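One common way to combine clustering with a predictive model is to append each sample's K-Means cluster id and its distances to the centroids as extra features before training. The sketch assumes numeric feature matrices and scikit-learn; add_cluster_features and the default n_clusters=3 are hypothetical choices, not necessarily what predictive_modeling_approach_3.py does.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def add_cluster_features(X_train, X_test, n_clusters=3, seed=42):
    """Append a K-Means cluster id and centroid distances as extra features.

    The scaler and K-Means are fit on the training split only, so no
    information from the test split leaks into the clustering.
    """
    # K-Means uses Euclidean distance, so standardize features first
    scaler = StandardScaler().fit(X_train)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(scaler.transform(X_train))

    def augment(X):
        Z = scaler.transform(X)
        labels = km.predict(Z).reshape(-1, 1)  # cluster id per sample
        dists = km.transform(Z)  # distance to each centroid
        return np.hstack([np.asarray(X, dtype=float), labels, dists])

    return augment(X_train), augment(X_test), km
```

The augmented matrices can then be passed to any classifier, for example the Random Forest used elsewhere in the project.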
src/modeling/confusion_matrix_random_forest.py: This script generates a confusion matrix for the Random Forest model to evaluate its performance, and visualizes the result.
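A minimal sketch of computing and saving a confusion matrix with scikit-learn and matplotlib. The function name confusion_matrix_report and the output path are assumptions; y_pred would come from the trained Random Forest's predict method.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script can run headless
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix


def confusion_matrix_report(y_true, y_pred, out_path=None):
    """Compute the confusion matrix and optionally save it as a figure.

    Rows are true classes and columns are predicted classes, following
    scikit-learn's convention.
    """
    cm = confusion_matrix(y_true, y_pred)
    if out_path is not None:
        # Create the target directory (e.g. results/figures) if needed
        os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
        disp = ConfusionMatrixDisplay(confusion_matrix=cm)
        disp.plot(cmap="Blues")
        disp.ax_.set_title("Random Forest confusion matrix")
        disp.figure_.savefig(out_path, dpi=150, bbox_inches="tight")
        plt.close(disp.figure_)  # free the figure's memory
    return cm
```

For example, confusion_matrix_report(y_test, model.predict(X_test), "results/figures/confusion_matrix_rf.png") would write the figure into the results/figures directory from the project tree (the file name is hypothetical).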
scripts/run_all.py: This script runs all the key scripts in the correct order, so that the entire workflow from data preparation to model evaluation is completed in a single run.
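A minimal sketch of how such an orchestrator could work, assuming it runs each script as a separate process from the repository root. The order and the helper run_pipeline are assumptions based on the file names in the project tree, not the actual contents of run_all.py.

```python
import subprocess
import sys

# Assumed execution order, taken from the file names in the project tree;
# the real run_all.py may differ.
PIPELINE = (
    "src/eda/EDA_retraction.py",
    "src/modeling/Preparation_modelling.py",
    "src/modeling/predictive_modelling_approach_2.py",
    "src/modeling/predictive_modeling_approach_3.py",
    "src/modeling/confusion_matrix_random_forest.py",
)


def run_pipeline(scripts=PIPELINE, dry_run=False):
    """Run each script in order with the current Python interpreter.

    check=True stops the pipeline at the first script that exits with a
    non-zero status, so later steps never run on missing or stale outputs.
    With dry_run=True the commands are only returned, not executed.
    """
    commands = [[sys.executable, script] for script in scripts]
    if not dry_run:
        for cmd in commands:
            print("Running", cmd[-1], flush=True)
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_pipeline() from the repository root runs every step; run_pipeline(dry_run=True) only returns the commands, which is a quick way to check the order.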
The results of the analysis and modeling, including the figures, are saved in the results/ directory. A report based on these results was written in LaTeX using Texmaker.
For any questions or issues, please contact the repository author.