This project focuses on addressing the last-mile connectivity problem in the Washington, D.C., Maryland, and Virginia (DMV) region by leveraging various datasets, including WMATA (Washington Metropolitan Area Transit Authority) Ridership Data, Capital Bikeshare Trip History Data, U.S. Census Bureau Data, and the Transit App API. The goal is to identify gaps in multimodal transportation options and propose solutions to improve public transport accessibility.
Link to the presentation: https://docs.google.com/presentation/d/1g2jAeqQMizMV6_oXJwlm74qR2guNTQHnsF2ofKPLGfE/edit?usp=sharing
- Introduction
- Objectives
- Problem Statement
- Locations Considered
- Data Sources
- Data Acquisition and Transformation
- Data Integration and Model
- Technology Stack and Architecture
- Data Visualization
- Athena Queries
- Insights and Recommendations
- Challenges and Lessons Learned
- Conclusion and Future Enhancements
- References
- Important Links
The "last-mile" connectivity problem refers to the challenges commuters face in reaching their final destinations from public transit stops. This project aims to address these issues in the DMV region by analyzing ridership patterns, bike-share usage, and demographic factors to propose actionable solutions.
- Data Integration: Combine multiple data sources to understand last-mile connectivity challenges comprehensively.
- Gap Identification: Pinpoint specific areas with insufficient transit and bikeshare infrastructure.
- Recommendations: Develop feasible solutions to bridge connectivity gaps.
Public transportation in the DMV region faces significant last-mile challenges, resulting in:
- Reduced transit system utilization.
- Increased dependency on private vehicles.
- Inequities in accessibility for underserved communities.
Key locations analyzed in the DMV region include:
- Washington, D.C.
- Bethesda, MD
- Silver Spring, MD
- Arlington, VA
- Alexandria, VA
- Rockville, MD
- Gaithersburg, MD
- Tysons, VA
- Fairfax, VA
- McLean, VA
- Falls Church, VA
- Vienna, VA
- College Park, MD
- Greenbelt, MD
- Hyattsville, MD
- Laurel, MD
- Bowie, MD
- Chantilly, VA
- Reston, VA
- Manassas, VA
- WMATA Ridership Data: Provides insights into public transit usage.
- Capital Bikeshare Data: Highlights bike-share trends and station demands.
- U.S. Census Bureau Data: Offers demographic and socioeconomic context.
- Transit App API: Supplies real-time transit data and stop information.
- Acquisition: Monthly CSV files downloaded from the Capital Bikeshare System Data portal.
- Purpose: Analyze trip patterns and identify high-demand areas for bike-sharing.
- Data Content: Includes start/end stations, trip durations, and user types.
- Acquisition: Accessed via the Transit API, providing real-time data on transit routes and stops.
- Purpose: Evaluate the proximity of transit options to bikeshare stations and identify underserved regions.
- Acquisition: Public data from the U.S. Census Data Portal.
- Purpose: Analyze socio-economic and demographic factors to assess public transportation needs.
- Acquisition: Open data from the WMATA Ridership Portal.
- Purpose: Analyze ridership data to identify high-traffic areas and assess connectivity gaps.
The datasets were integrated to create a comprehensive view of last-mile connectivity. Key steps involved geospatial analysis, data mapping, and combining datasets to identify proximity and demand patterns.
Target Data Model:
- Bus WMATA: Includes average ridership for different routes.
- Metro WMATA: Contains data on metro stations and ridership.
- Nearby Stops: Information on nearby transit stops.
- Shared Mobility: Data on bikeshare stations and their bike counts.
- US Census: Demographic breakdowns including commuting preferences and vehicle ownership.
- Capital Bikeshare: Information on trips, stations, and user types.
- AWS S3: Storage for raw and transformed data.
- AWS Glue: ETL processes for data cleaning and transformation.
- AWS Athena: Serverless querying of integrated datasets.
- AWS QuickSight: Data visualization and dashboard creation.
SELECT start_station_name, end_station_name, COUNT(ride_id) AS ride_count
FROM capital_bikeshare
WHERE start_station_name != ''
AND end_station_name != ''
GROUP BY start_station_name, end_station_name
ORDER BY ride_count DESC;
SELECT route, "average of weekday", "average of saturday", "average of sunday"
FROM bus_wmata
ORDER BY "average of weekday" DESC;
SELECT "station name",
AVG("avg daily tapped entries") AS avg_tapped_entries,
AVG("avg daily nontapped entries") AS avg_nontapped_entries
FROM metro_wmata
GROUP BY "station name";
SELECT location, sum(bike_count) as bike_count, sum(electric_bike_count) as electric_bike_count
FROM shared_mobility
GROUP BY location
ORDER BY bike_count DESC, electric_bike_count DESC;
SELECT nr.route_short_name, nr.location, sm.location, sum(sm.bike_count) as bike_count
FROM nearby_routes nr
JOIN shared_mobility sm
ON ST_Distance(ST_Point(nr.longitudes, nr.latitudes), ST_Point(sm.longitude, sm.latitude)) < 1000
GROUP BY 1,2,3
ORDER BY 4 DESC;
SELECT ns.stop_name, ns.location_name, sm.location, sum(sm.bike_count) as bike_count
FROM nearby_stops ns
JOIN shared_mobility sm
ON ST_Distance(ST_Point(ns.stop_lat, ns.stop_lon), ST_Point(sm.longitude, sm.latitude)) < 1000
GROUP BY 1,2,3
ORDER BY 4 DESC;
SELECT distinct REPLACE(label, '"','' ) as county_name,
max("drove alone") as "drove alone",
max("carpooled") as "carpooled",
max("employed population") as "employed population",
max("18 to 34 years") as "18 to 34 years",
max("35 to 64 years") as "35 to 64 years",
max("65 years and over") as "65 years and over"
FROM us_census
WHERE level = 'County'
GROUP BY 1
ORDER BY 2 DESC;
SELECT ns.*, sm.*
FROM "AwsDataCatalog"."lastmile-db"."nearby_stops" ns
JOIN "AwsDataCatalog"."lastmile-db"."shared_mobility" sm
ON ST_Distance(ST_Point(ns.stop_lat, ns.stop_lon), ST_Point(sm.longitude, sm.latitude)) < 1000;
- Ridership Gaps: Areas like Rockville lack sufficient bikeshare stations.
- High-Demand Areas: Metro stations in D.C. and Maryland could benefit from more bike-share allocations.
- Expand bikeshare stations in underserved areas.
- Improve coordination between bikeshare stations and metro stops.
- Optimize bike allocations at high-demand metro stations.
- Challenges: Addressing inconsistencies and missing data, managing AWS costs.
- Lessons Learned: Gained proficiency with AWS tools, improved collaboration.
- Developed a scalable data pipeline and delivered actionable insights.
- Future Enhancements:
- Integrating machine learning models for predictive analysis.
- Transitioning to AWS Redshift for persistent data storage.
- WMATA Open Data Hub: WMATA Ridership Data
- Capital Bikeshare System Data
- U.S. Census Bureau
- Transit App API
- All code and data are available in the GitHub repository.
- Presentation Slides