
Intermediate data storage

Nate Wessel edited this page May 13, 2019 · 2 revisions

Between scraping data from the API and processing trips into a retrospective GTFS format, all of the original schedule and vehicle location data from the API is stored in the PostGIS database.

Vehicle Location Data

The *_trips table contains all reported vehicle locations and timestamps collected from the API, segmented into trips using the method described here. The vehicle locations in each trip are stored in a locally projected linestring field called orig_geom (original geometry). The corresponding report times are stored in a field called times, an array of double precision numbers with one entry for each point in the line. Times are in seconds since the epoch in UTC and are not adjusted to the local time zone. It is generally possible to reconstruct the original API data by exploding the line into points and associating each point with its corresponding time and trip attributes. One exception is that extremely short trips with just a few GPS records may not be stored.
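As a rough illustration of the reconstruction described above, here is a minimal Python sketch. It assumes the orig_geom coordinates and the times array have already been read out of the database into plain Python lists. The function name explode_trip, the output keys, and the trip_id attribute are hypothetical and are not taken from the project's code.

```python
from datetime import datetime, timezone


def explode_trip(coords, times, trip_attrs=None):
    """Pair each linestring vertex with its report time.

    coords     -- list of (x, y) tuples in the local projection (from orig_geom)
    times      -- list of epoch seconds in UTC, one per vertex (from times)
    trip_attrs -- optional dict of trip-level attributes copied onto every record
    """
    if len(coords) != len(times):
        raise ValueError("expected exactly one time per point in the line")
    attrs = dict(trip_attrs or {})
    return [
        {
            **attrs,
            "x": x,
            "y": y,
            "epoch": t,
            # Stored times are UTC, so build a timezone-aware UTC datetime.
            "reported_utc": datetime.fromtimestamp(t, tz=timezone.utc),
        }
        for (x, y), t in zip(coords, times)
    ]


# Example with made-up coordinates and a hypothetical trip attribute.
records = explode_trip(
    [(0.0, 0.0), (10.0, 5.0)],
    [1557705600.0, 1557705610.0],
    {"trip_id": 1},
)
```

Each resulting record carries one location, one report time, and the shared trip attributes, which is roughly the shape of a single vehicle report. Converting to local time, if needed, would be a separate step.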

Schedule Data

Schedule data is stored in the *_directions and *_stops tables. The report time column indicates the first time that exactly this record was reported. Any subsequent change in the schedule creates a new entry with the appropriate timestamp. The last-reported schedule is assumed to stay in effect until replaced by new data.
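The "stays in effect until replaced" rule can be sketched as a lookup for the most recent record reported at or before a given moment. The Python below is only an illustration under assumptions: it takes the records for a single stop or direction as (report_time, payload) tuples sorted by report time, with report times expressed as epoch seconds. The function name schedule_in_effect and the tuple layout are hypothetical.

```python
from bisect import bisect_right


def schedule_in_effect(records, when):
    """Return the record in effect at time `when`, or None if none exists yet.

    records -- list of (report_time, payload) tuples for one stop or direction,
               sorted ascending by report_time
    when    -- the moment of interest, in the same units as report_time
    """
    report_times = [report_time for report_time, _ in records]
    # Index of the first record reported strictly after `when`.
    i = bisect_right(report_times, when)
    # The record just before that index is the latest one reported at or before `when`.
    return records[i - 1] if i else None


# Example with made-up report times and payloads.
history = [(100.0, "first version"), (200.0, "revised version")]
current = schedule_in_effect(history, 150.0)
```

A time earlier than the first report returns None, since no schedule had been observed at that point.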
