Intermediate data storage
Between scraping data from the API and processing trips into a retrospective GTFS format, all of the original schedule and vehicle location data from the API is stored in the PostGIS database.
The *_trips table contains all reported vehicle locations and timestamps collected from the API. These are, naturally, segmented into trips using the method described here. Locations of vehicles in each trip are stored in a locally projected linestring field called orig_geom (original geometry). The corresponding report times are stored in a field called times, an array of double precision numbers, one for each point in the line. Times are in seconds since the epoch, UTC, not corrected for local time zones. It's generally possible to reconstruct the original data from the API by just exploding the line into points and associating them with the corresponding times and trip attributes.
One exception, however, is that extremely short trips with just a few GPS records may not be stored.
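As a rough illustration of that reconstruction, the sketch below pairs each vertex of a trip line with its report time. It assumes the orig_geom coordinates and the times array have already been read out of the database into plain Python lists; the function name explode_trip, the trip_id attribute, and the sample values are hypothetical and not part of the project.

```python
from datetime import datetime, timezone


def explode_trip(coords, times, trip_id=None):
    """Pair each vertex of a trip line with its report time.

    coords: list of (x, y) tuples taken from orig_geom, in the local projection
    times:  list of epoch seconds (UTC), one per vertex
    """
    if len(coords) != len(times):
        raise ValueError("expected exactly one time per point in the line")
    return [
        {
            "trip_id": trip_id,  # hypothetical trip attribute carried along
            "x": x,
            "y": y,
            "epoch": t,
            # Times are stored as UTC epoch seconds, so convert explicitly in UTC
            "utc": datetime.fromtimestamp(t, tz=timezone.utc),
        }
        for (x, y), t in zip(coords, times)
    ]


# Illustrative values only, not real data
points = explode_trip(
    coords=[(630000.0, 4833000.0), (630050.0, 4833020.0)],
    times=[1500000000.0, 1500000020.0],
    trip_id=1,
)
```

Each resulting record is one vehicle location report. Converting to a local time zone, if needed, would be a separate step on the utc value.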
Schedule data is stored straightforwardly in the *_directions and *_stops tables. The report time column indicates the first time that exactly this record was reported. Any subsequent change in the schedule creates a new entry with the appropriate timestamp. The last-reported schedule is assumed to stay in effect until it is replaced by new data.
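The "stays in effect until replaced" rule can be sketched as a lookup for the most recent record reported at or before a given moment. This is only an illustration over rows already loaded into Python: the key name report_time, its representation as a number, the version field, and the function name schedule_in_effect are all assumptions, not the project's actual column names or code.

```python
from bisect import bisect_right


def schedule_in_effect(records, at_time):
    """Return the latest record reported at or before at_time, or None.

    records: list of dicts, each with a numeric "report_time" key (assumed name)
    """
    ordered = sorted(records, key=lambda r: r["report_time"])
    report_times = [r["report_time"] for r in ordered]
    # Index of the first record reported strictly after at_time
    i = bisect_right(report_times, at_time)
    return ordered[i - 1] if i > 0 else None


# Illustrative values only: two versions of the same stop record
stop_versions = [
    {"report_time": 100.0, "version": "first"},
    {"report_time": 200.0, "version": "second"},
]
current = schedule_in_effect(stop_versions, 150.0)
```

A time earlier than the first report returns None, since no schedule had been reported yet.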