Hi, ✨I'm 21B-133-CS!✨ 👋
⚠️ The dataset is a dummy since I couldn't find an appropriate one. FAKE_DATASET_GENERATOR.py is how the dataset was made. ⚠️
-
It took
serial.py
14.138378620147705 seconds to run -
It took
parallel.py
10.72012186050415 seconds to run
This repository aims to compare the execution time between linear and parallel processing.
This project involves processing two CSV files: one containing student information and the other containing fee payment details. The goal is to ensure data integrity, generate fee payment dates, determine the most frequent payment date for each student, and implement both linear and parallel execution versions of the code to perform the tasks efficiently.
- Overview
- CSV File Structure
- Code Functionality 3.1 Linear Execution 3.2 Parallel Execution (Using multiprocessing)
- Performance Comparison
- Results
The project has three main tasks:
Ensure that each student in the students CSV has a corresponding fee payment in the student_fees CSV.
Generate a list of payment dates for each student based on monthly payments. Identify the most frequent payment date for each student.
Implement both linear execution and parallel execution (using the multiprocessing module) to handle the task, and compare their performance. The solution was implemented using Python and the pandas library for data manipulation.
In the linear execution approach, the program reads both the students.csv and student_fees.csv files, ensures each student appears in both files, generates a list of payment dates for each student, and determines the most frequent payment date for each student sequentially.
In the parallel execution approach, the program uses Python's multiprocessing library to parallelize the task of determining the most frequent payment date for each student. This significantly reduces processing time by utilizing multiple CPU cores.
The linear execution approach processes the data sequentially, whereas the parallel execution approach distributes the workload to multiple processes, which should ideally reduce execution time. The performance comparison is measured by timing the execution of both approaches using Python's time module.