taxi-trips-predictions

Description

The task is to create a machine learning model that predicts the number of taxi trips in the next hour in Chicago's community areas. Such a model could help balance taxi driver workload across locations. This is a time series forecasting task.

The datasets and their descriptions are available at the following links:
https://data.cityofchicago.org/Transportation/Taxi-Trips-2022/npd7-ywjz
https://data.cityofchicago.org/Transportation/Taxi-Trips-2023/e55j-2ewb
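As a rough illustration of the task setup, the sketch below aggregates raw trips into hourly counts per community area and derives a next-hour target. It is not the notebook's code. The function name hourly_trip_counts and the column names trip_start and community_area are hypothetical placeholders, so map them to the actual dataset columns before use.

```python
import pandas as pd


def hourly_trip_counts(trips, ts_col="trip_start", area_col="community_area"):
    """Return a wide table: one row per hour, one column per community area.

    Column names are hypothetical placeholders, not the dataset's real headers.
    """
    # Truncate each trip start time to the beginning of its hour.
    hours = pd.to_datetime(trips[ts_col]).dt.floor("h")
    counts = (
        trips.groupby([trips[area_col], hours])
        .size()
        .unstack(area_col, fill_value=0)
    )
    # Hours without any trips are absent after grouping; add them back as zeros
    # so that consecutive rows are always exactly one hour apart.
    full_index = pd.date_range(counts.index.min(), counts.index.max(), freq="h")
    return counts.reindex(full_index, fill_value=0)


# Tiny made-up sample, only to show the shapes involved.
trips = pd.DataFrame(
    {
        "trip_start": [
            "2023-01-01 00:10",
            "2023-01-01 00:50",
            "2023-01-01 02:05",
            "2023-01-01 03:30",
        ],
        "community_area": [1, 1, 1, 2],
    }
)
counts = hourly_trip_counts(trips)

# The prediction target for each hour is the count observed in the following hour.
target = counts.shift(-1)
```

The last row of target is empty because the following hour is not observed yet, so that row would be dropped before training.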

This repository contains:

A notebook with a machine learning model that predicts the count of taxi trips.
The corresponding code for generating time series features from a pandas DataFrame.
A shell script that creates a cluster of Docker containers to run the PySpark code.
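For readers unfamiliar with time series feature generation, here is a minimal sketch of the general idea using lag, rolling-mean, and calendar features in pandas. The repository's own feature code may differ. The function add_lag_features and the column names community_area, hour, and trips are hypothetical, and the sketch assumes each area has one row per consecutive hour with no gaps.

```python
import pandas as pd


def add_lag_features(
    df,
    group_col="community_area",
    time_col="hour",
    value_col="trips",
    lags=(1, 2, 24),
    window=3,
):
    """Add lag, rolling-mean, and calendar features per group.

    Assumes one row per group per consecutive hour (no missing hours).
    """
    out = df.sort_values([group_col, time_col]).copy()
    grouped = out.groupby(group_col)[value_col]

    # Value observed `lag` hours earlier within the same area.
    for lag in lags:
        out[f"{value_col}_lag_{lag}"] = grouped.shift(lag)

    # Mean of the previous `window` hours; shift(1) keeps the current
    # hour out of its own feature.
    out[f"{value_col}_rollmean_{window}"] = grouped.transform(
        lambda s: s.shift(1).rolling(window).mean()
    )

    # Calendar features derived from the timestamp.
    out["hour_of_day"] = out[time_col].dt.hour
    out["day_of_week"] = out[time_col].dt.dayofweek
    return out


# Made-up example: a single area observed for five consecutive hours.
example = pd.DataFrame(
    {
        "community_area": [8] * 5,
        "hour": pd.date_range("2024-01-01 00:00", periods=5, freq="h"),
        "trips": [1, 2, 3, 4, 5],
    }
)
features = add_lag_features(example, lags=(1, 2), window=2)
```

Shifting before computing the rolling mean means every feature uses only information available before the hour being described.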

Setup (from command line)

To run the notebook, you need Docker installed. Once Docker is installed, run:

$ sh start_local_cluster.sh

Open the local cluster at the address printed in the terminal window. Then set the SPARK_MASTER_IP variable in cell 8 of the .ipynb file to the value printed in the terminal.
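The sketch below shows one way the SPARK_MASTER_IP value might be used to connect a notebook to the cluster. The actual contents of cell 8 are not shown here, so treat this as an assumption. The IP address is a placeholder, the helper names are hypothetical, and port 7077 is only the default for a Spark standalone master, which the cluster script may override.

```python
# Placeholder value; replace it with the IP printed by start_local_cluster.sh.
SPARK_MASTER_IP = "172.18.0.2"


def spark_master_url(ip, port=7077):
    """Build a standalone-mode master URL.

    7077 is Spark's default standalone master port; this is an assumption
    about the cluster, not something taken from the repository.
    """
    return f"spark://{ip}:{port}"


def build_session(ip, app_name="taxi-trips-predictions"):
    """Create (or reuse) a SparkSession connected to the given master."""
    # Imported here so the URL helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.master(spark_master_url(ip))
        .appName(app_name)
        .getOrCreate()
    )


master_url = spark_master_url(SPARK_MASTER_IP)
```

Calling build_session(SPARK_MASTER_IP) from the notebook would then return a session attached to the local cluster, provided the master is reachable at that address.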

Tools used

Docker
PySpark
Pandas
LightGBM
CatBoost
