Update docs
estebanzimanyi committed Jul 1, 2024
1 parent 1bdfe65 commit 444ff2c
Showing 4 changed files with 23 additions and 32 deletions.
30 changes: 10 additions & 20 deletions develop/html/ch04.html
@@ -1,5 +1,5 @@
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Chapter 4. Managing GTFS Data</title><link rel="stylesheet" type="text/css" href="docbook.css"><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot"><link rel="home" href="index.html" title="MobilityDB Workshop"><link rel="up" href="index.html" title="MobilityDB Workshop"><link rel="prev" href="ch03s06.html" title="Complete Flight Data Business Intelligence Dashboard"><link rel="next" href="ch04s02.html" title="Transforming GTFS Data for MobilityDB"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 4. Managing GTFS Data</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="ch03s06.html">Prev</a> </td><th width="60%" align="center"> </th><td width="20%" align="right"> <a accesskey="n" href="ch04s02.html">Next</a></td></tr></table><hr></div><div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="GTFS"></a>Chapter 4. Managing GTFS Data</h1></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl class="toc"><dt><span class="sect1"><a href="ch04.html#idm1353">Loading GTFS Data in PostgreSQL</a></span></dt><dt><span class="sect1"><a href="ch04s02.html">Transforming GTFS Data for MobilityDB</a></span></dt></dl></div><p>The General Transit Feed Specification (GTFS) defines a common format for public transportation schedules and associated geographic information. GTFS-realtime is used to specify real-time transit data. Many transportation agencies around the world publish their data in GTFS and GTFS-realtime format and make them publicly available. A well-known repository containing such data is <a class="ulink" href="https://transitfeeds.com" target="_top">OpenMobilityData</a>.</p><p>In this chapter, we illustrate how to load GTFS data in MobilityDB. 
For this, we first need to import the GTFS data into PostgreSQL and then transform this data so that it can be loaded into MobilityDB. The data used in this tutorial is obtained from <a class="ulink" href="https://www.stib-mivb.be" target="_top">STIB-MIVB</a>, the Brussels public transportation company, and is available as a <a class="ulink" href="https://github.com/MobilityDB/MobilityDB-workshop/data/gtfs_data.zip" target="_top">ZIP</a> file. Be aware that GTFS datasets are typically large. In order to reduce the size of the dataset, this file only contains schedules for one week and five transportation lines, whereas typical GTFS data published by STIB-MIVB contains schedules for one month and 99 transportation lines. In the reduced dataset used in this tutorial, the final table containing the GTFS data in MobilityDB format has almost 10,000 trips and its size is 241 MB. Furthermore, we need several temporary tables to transform GTFS format into MobilityDB and these tables are also large: the largest one has almost 6 million rows and its size is 621 MB.</p><p>Several tools can be used to import GTFS data into PostgreSQL. For example, one that is publicly available on GitHub can be found <a class="ulink" href="https://github.com/fitnr/gtfs-sql-importer" target="_top">here</a>. These tools load GTFS data into PostgreSQL tables, allowing one to perform multiple imports of data provided by the same agency covering different time frames, perform various complex tasks including data validation, and take into account variations of the format provided by different agencies, updates of route information among multiple imports, etc. For the purpose of this tutorial we do a simple import and transformation using only SQL. 
This is enough for loading the data set we are using but a much more robust solution should be used in an operational environment, if only for coping with the considerable size of typical GTFS data, which would require parallelization of this task.</p><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="idm1353"></a>Loading GTFS Data in PostgreSQL</h2></div></div></div><p>The <a class="ulink" href="https://docs.mobilitydb.com/data/gtfs_data.zip" target="_top">ZIP</a> file with the data for this tutorial contains a set of CSV files (with extension <code class="varname">.txt</code>) as follows:
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="varname">agency.txt</code> contains the description of the transportation agencies providing the services (a single one in our case).</p></li><li class="listitem"><p><code class="varname">calendar.txt</code> contains service patterns that operate recurrently such as, for example, every weekday.</p></li><li class="listitem"><p><code class="varname">calendar_dates.txt</code> defines exceptions to the default service patterns defined in <code class="varname">calendar.txt</code>. There are two types of exceptions: 1 means that the service has been added for the specified date, and 2 means that the service has been removed for the specified date.</p></li><li class="listitem"><p><code class="varname">route_types.txt</code> contains transportation types used on routes, such as bus, metro, tramway, etc.</p></li><li class="listitem"><p><code class="varname">routes.txt</code> contains transit routes. A route is a group of trips that are displayed to riders as a single service.</p></li><li class="listitem"><p><code class="varname">shapes.txt</code> contains the vehicle travel paths, which are used to generate the corresponding geometry.</p></li><li class="listitem"><p><code class="varname">stop_times.txt</code> contains times at which a vehicle arrives at and departs from stops for each trip.</p></li><li class="listitem"><p><code class="varname">translations.txt</code> contains the translation of the route information in French and Dutch. This file is not used in this tutorial.</p></li><li class="listitem"><p><code class="varname">trips.txt</code> contains trips for each route. A trip is a sequence of two or more stops that occur during a specific time period.</p></li></ul></div><p>
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="varname">agency.txt</code> contains the description of the transportation agencies providing the services (a single one in our case).</p></li><li class="listitem"><p><code class="varname">calendar.txt</code> contains service patterns that operate recurrently such as, for example, every weekday.</p></li><li class="listitem"><p><code class="varname">calendar_dates.txt</code> defines exceptions to the default service patterns defined in <code class="varname">calendar.txt</code>. There are two types of exceptions: 1 means that the service has been added for the specified date, and 2 means that the service has been removed for the specified date.</p></li><li class="listitem"><p><code class="varname">routes.txt</code> contains transit routes. A route is a group of trips that are displayed to riders as a single service.</p></li><li class="listitem"><p><code class="varname">shapes.txt</code> contains the vehicle travel paths, which are used to generate the corresponding geometry.</p></li><li class="listitem"><p><code class="varname">stop_times.txt</code> contains times at which a vehicle arrives at and departs from stops for each trip.</p></li><li class="listitem"><p><code class="varname">translations.txt</code> contains the translation of the route information in French and Dutch. This file is not used in this tutorial.</p></li><li class="listitem"><p><code class="varname">trips.txt</code> contains trips for each route. A trip is a sequence of two or more stops that occur during a specific time period.</p></li></ul></div><p>
</p><p>
We decompress the file with the data into a directory. This can be done using the following command.
</p><pre class="programlisting">
@@ -31,7 +31,6 @@
end_date date NOT NULL,
CONSTRAINT calendar_pkey PRIMARY KEY (service_id)
);
CREATE INDEX calendar_service_id ON calendar (service_id);

CREATE TABLE exception_types (
exception_type int PRIMARY KEY,
@@ -43,19 +42,14 @@
date date NOT NULL,
exception_type int REFERENCES exception_types(exception_type)
);
CREATE INDEX calendar_dates_dateidx ON calendar_dates (date);

CREATE TABLE route_types (
route_type int PRIMARY KEY,
description text
);
CREATE INDEX calendar_dates_date_idx ON calendar_dates (date);

CREATE TABLE routes (
route_id text,
route_short_name text DEFAULT '',
route_long_name text DEFAULT '',
route_desc text DEFAULT '',
route_type int REFERENCES route_types(route_type),
route_type int,
route_url text,
route_color text,
route_text_color text,
@@ -68,15 +62,14 @@
shape_pt_lon double precision NOT NULL,
shape_pt_sequence int NOT NULL
);
CREATE INDEX shapes_shape_key ON shapes (shape_id);
CREATE INDEX shapes_shape_id_idx ON shapes (shape_id);

-- Create a table to store the shape geometries
CREATE TABLE shape_geoms (
shape_id text NOT NULL,
shape_geom geometry('LINESTRING', 4326),
shape_geom geometry('LINESTRING', 3857),
CONSTRAINT shape_geom_pkey PRIMARY KEY (shape_id)
);
CREATE INDEX shape_geoms_key ON shapes (shape_id);

CREATE TABLE location_types (
location_type int PRIMARY KEY,
@@ -92,9 +85,9 @@
stop_lon double precision,
zone_id text,
stop_url text,
location_type integer REFERENCES location_types(location_type),
location_type integer REFERENCES location_types(location_type),
parent_station integer,
stop_geom geometry('POINT', 4326),
stop_geom geometry('POINT', 3857),
platform_code text DEFAULT NULL,
CONSTRAINT stops_pkey PRIMARY KEY (stop_id)
);
@@ -106,7 +99,7 @@

CREATE TABLE stop_times (
trip_id text NOT NULL,
-- Check that casting to time interval works.
-- Check that casting to time interval works
arrival_time interval CHECK (arrival_time::interval = arrival_time::interval),
departure_time interval CHECK (departure_time::interval = departure_time::interval),
stop_id text,
@@ -129,7 +122,6 @@
shape_id text,
CONSTRAINT trips_pkey PRIMARY KEY (trip_id)
);
CREATE INDEX trips_trip_id ON trips (trip_id);

INSERT INTO exception_types (exception_type, description) VALUES
(1, 'service has been added'),
@@ -162,8 +154,6 @@
FROM '/home/gtfs_tutorial/trips.txt' DELIMITER ',' CSV HEADER;
COPY agency(agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone)
FROM '/home/gtfs_tutorial/agency.txt' DELIMITER ',' CSV HEADER;
COPY route_types(route_type,description)
FROM '/home/gtfs_tutorial/route_types.txt' DELIMITER ',' CSV HEADER;
COPY routes(route_id,route_short_name,route_long_name,route_desc,route_type,route_url,
route_color,route_text_color) FROM '/home/gtfs_tutorial/routes.txt' DELIMITER ','
CSV HEADER;
@@ -177,12 +167,12 @@
</p><pre class="programlisting">
INSERT INTO shape_geoms
SELECT shape_id, ST_MakeLine(array_agg(
ST_SetSRID(ST_MakePoint(shape_pt_lon, shape_pt_lat),4326) ORDER BY shape_pt_sequence))
ST_Transform(ST_Point(shape_pt_lon, shape_pt_lat, 4326), 3857) ORDER BY shape_pt_sequence))
FROM shapes
GROUP BY shape_id;

UPDATE stops
SET stop_geom = ST_SetSRID(ST_MakePoint(stop_lon, stop_lat),4326);
SET stop_geom = ST_Transform(ST_Point(stop_lon, stop_lat, 4326), 3857);
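
-- Optional sanity check (an illustrative sketch, not part of the original commit):
-- each query is expected to report SRID 3857, plus possibly a NULL row for stops
-- that have no coordinates. ST_SRID is a standard PostGIS function.
SELECT ST_SRID(shape_geom) AS srid, count(*) AS no_shapes
FROM shape_geoms
GROUP BY ST_SRID(shape_geom);

SELECT ST_SRID(stop_geom) AS srid, count(*) AS no_stops
FROM stops
GROUP BY ST_SRID(stop_geom);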
</pre><p>
The visualization of the routes and stops in QGIS is given in <a class="xref" href="ch04.html#stib" title="Figure 4.1. Visualization of the routes and stops for the GTFS data from Brussels.">Figure 4.1, “Visualization of the routes and stops for the GTFS data from Brussels.”</a>. In the figure, red lines correspond to the trajectories of vehicles, while orange points correspond to the location of stops.
</p><div class="figure-float"><div class="figure"><a name="stib"></a><p class="title"><b>Figure 4.1. Visualization of the routes and stops for the GTFS data from Brussels.</b></p><div class="figure-contents"><div class="mediaobject"><img src="images/stib.png" width="189" alt="Visualization of the routes and stops for the GTFS data from Brussels."></div></div></div><br class="figure-break"></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="ch03s06.html">Prev</a> </td><td width="20%" align="center"> </td><td width="40%" align="right"> <a accesskey="n" href="ch04s02.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Complete Flight Data Business Intelligence Dashboard </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Transforming GTFS Data for MobilityDB</td></tr></table></div></body></html>
25 changes: 13 additions & 12 deletions develop/html/ch04s02.html
@@ -1,18 +1,18 @@
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Transforming GTFS Data for MobilityDB</title><link rel="stylesheet" type="text/css" href="docbook.css"><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot"><link rel="home" href="index.html" title="MobilityDB Workshop"><link rel="up" href="ch04.html" title="Chapter 4. Managing GTFS Data"><link rel="prev" href="ch04.html" title="Chapter 4. Managing GTFS Data"><link rel="next" href="ch05.html" title="Chapter 5. Managing Google Location History"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Transforming GTFS Data for MobilityDB</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="ch04.html">Prev</a> </td><th width="60%" align="center">Chapter 4. Managing GTFS Data</th><td width="20%" align="right"> <a accesskey="n" href="ch05.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="idm1409"></a>Transforming GTFS Data for MobilityDB</h2></div></div></div><p>
We start by creating a table that contains pairs of <code class="varname">service_id</code> and <code class="varname">date</code> defining the dates on which a service is provided.
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Transforming GTFS Data for MobilityDB</title><link rel="stylesheet" type="text/css" href="docbook.css"><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot"><link rel="home" href="index.html" title="MobilityDB Workshop"><link rel="up" href="ch04.html" title="Chapter 4. Managing GTFS Data"><link rel="prev" href="ch04.html" title="Chapter 4. Managing GTFS Data"><link rel="next" href="ch05.html" title="Chapter 5. Managing Google Location History"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Transforming GTFS Data for MobilityDB</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="ch04.html">Prev</a> </td><th width="60%" align="center">Chapter 4. Managing GTFS Data</th><td width="20%" align="right"> <a accesskey="n" href="ch05.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="idm1406"></a>Transforming GTFS Data for MobilityDB</h2></div></div></div><p>
We start by creating a table that contains pairs of <code class="varname">service_id</code> and <code class="varname">date</code> defining the dates on which a service is provided.
</p><pre class="programlisting">
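-- Illustration only (a sketch, not part of the original tutorial): generate_series expands
-- a date range into one row per day, and extract(isodow ...) numbers the days from
-- 1 (Monday) to 7 (Sunday). The dates below are arbitrary and chosen only for the example.
SELECT d::date AS date, extract(isodow FROM d) AS isodow
FROM generate_series('2024-07-01'::date, '2024-07-07'::date, '1 day'::interval) AS d;
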
DROP TABLE IF EXISTS service_dates;
CREATE TABLE service_dates AS (
SELECT service_id, date_trunc('day', d)::date AS date
FROM calendar c, generate_series(start_date, end_date, '1 day'::interval) AS d
WHERE (
(monday = 1 AND extract(isodow FROM d) = 1) OR
(tuesday = 1 AND extract(isodow FROM d) = 2) OR
(wednesday = 1 AND extract(isodow FROM d) = 3) OR
(thursday = 1 AND extract(isodow FROM d) = 4) OR
(friday = 1 AND extract(isodow FROM d) = 5) OR
(saturday = 1 AND extract(isodow FROM d) = 6) OR
(sunday = 1 AND extract(isodow FROM d) = 7)
(monday = 1 AND extract(isodow FROM d) = 1) OR
(tuesday = 1 AND extract(isodow FROM d) = 2) OR
(wednesday = 1 AND extract(isodow FROM d) = 3) OR
(thursday = 1 AND extract(isodow FROM d) = 4) OR
(friday = 1 AND extract(isodow FROM d) = 5) OR
(saturday = 1 AND extract(isodow FROM d) = 6) OR
(sunday = 1 AND extract(isodow FROM d) = 7)
)
EXCEPT
SELECT service_id, date
@@ -48,8 +48,8 @@

UPDATE trip_stops t
SET perc = CASE
WHEN stop_sequence = 1 then 0.0
WHEN stop_sequence = no_stops then 1.0
WHEN stop_sequence = 1 THEN 0.0
WHEN stop_sequence = no_stops THEN 1.0
ELSE ST_LineLocatePoint(g.shape_geom, s.stop_geom)
END
FROM shape_geoms g, stops s
@@ -177,7 +177,8 @@
);

INSERT INTO trips_mdb(trip_id, service_id, route_id, date, trip)
SELECT trip_id, service_id, route_id, date, tgeompoint_seq(array_agg(tgeompoint_inst(point_geom, t) ORDER BY T))
SELECT trip_id, service_id, route_id, date, tgeompointSeq(array_agg(
tgeompoint(point_geom, t) ORDER BY T))
FROM trips_input
GROUP BY trip_id, service_id, route_id, date;
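
-- Optional check (an illustrative sketch, not part of the original tutorial): inspect a
-- few of the generated trips. numInstants, startTimestamp, endTimestamp and length are
-- MobilityDB functions. The column aliases are hypothetical. Assuming the trip points
-- inherit SRID 3857 from shape_geom, length is in Web Mercator units, which only
-- approximate metres.
SELECT trip_id, date, numInstants(trip) AS no_instants,
  startTimestamp(trip) AS start_time, endTimestamp(trip) AS end_time,
  length(trip) AS length_3857
FROM trips_mdb
ORDER BY trip_id, date
LIMIT 5;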

Binary file modified develop/mobilitydb-workshop.epub
Binary file not shown.
Binary file modified develop/mobilitydb-workshop.pdf
Binary file not shown.
