Push-based model for consuming (realtime) GBFS data #630

testower · 2024-04-22T11:54:35Z

What is the issue and why is it an issue?

Using poll-based consumption (the current situation) for real-time data has several challenges.

The specification states that real-time feeds should be updated as often as possible.
- This means they should have a ttl (time-to-live) value of 0. Ideally, this means consumers should poll infinitely often, to stay up-to-date.
- This is, of course, not possible. Consumers will necessarily poll on a finite interval. Choosing the appropriate interval will depend on various factors, mostly related to available computing and bandwidth resources, as well as the total number of feeds that the consumer needs to poll.

Consumer side

There is an inherent conflict within this decision process: Consumers don’t want to poll too infrequently, because that increases the likelihood that data will be stale and that incorrect information is shown to users.

At the same time, polling too frequently is a potential waste of resources, depending on how often data is refreshed. They may also face rate-limiting policies from producers (I have first-hand experience with this).

In the end, we have to decide between over-fetching and stale data, and it will never be better than a mere compromise.
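To make the trade-off concrete, here is a minimal sketch that replays a list of feed update times against a fixed polling interval and reports how many polls fetched nothing new and how stale the data got in the worst case. The function name `polling_cost` and the sample timestamps are made up for illustration; nothing here is part of GBFS.

```python
from bisect import bisect_right


def polling_cost(update_times, poll_interval, horizon):
    """Return (wasted_polls, total_polls, worst_staleness).

    update_times: times (seconds) at which the producer published new data.
    poll_interval: fixed consumer polling interval (seconds).
    horizon: stop polling after this time (seconds).
    """
    updates = sorted(update_times)
    wasted = 0   # polls that returned nothing new
    total = 0
    worst = 0    # longest time an update waited before being fetched
    seen = 0     # number of updates the consumer has already fetched
    t = poll_interval
    while t <= horizon:
        total += 1
        available = bisect_right(updates, t)  # updates published up to time t
        if available == seen:
            wasted += 1
        else:
            # The oldest not-yet-fetched update has waited this long.
            worst = max(worst, t - updates[seen])
            seen = available
        t += poll_interval
    return wasted, total, worst


# Hypothetical update times: two changes early on, one later.
print(polling_cost([3, 4, 25], poll_interval=10, horizon=30))
print(polling_cost([3, 4, 25], poll_interval=1, horizon=30))
```

With the 10-second interval only one poll is wasted but data is up to 7 seconds stale; with the 1-second interval nothing is stale but 27 of 30 polls fetch nothing new.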

Producer side

Frequent polling of large payloads hogs resources and pushes producers to introduce complexity like caching and CDNs. Having consumers poll at an interval close to 0 seconds is resource-intensive and costly for the data producer, and they face the risk of lost revenue if consumers poll too infrequently.

We must further consider that large payloads often contain only minor changes relative to the totality of the information, causing an additional waste of resources as non-changes have to be computed.

Cloud computing contributes to greenhouse gas emissions on a massive scale. Allocated resources are generally underutilised and unnecessary computing is extremely wasteful on the financial side, as well as damaging on the environmental side.

Potential solutions

I would like to open up a community discussion on how to solve this challenge by generic and scalable means. Individual arrangements between consumers and producers are not sustainable, and finding a common solution will benefit the community as a whole and help the standard grow.

I don’t want to constrain the solutions from the outset, but I think potential solutions fall into the following 3 broad categories:

1. Continue to use a polling-based model but encourage better use of cache headers and not-modified responses.
2. Use a push-based model without an intermediary, with technologies like WebSocket or Server-Sent Events.
3. Use a push-based model with an intermediary message broker, with technologies like AMQP, pub/sub, Kafka, MQTT etc.
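As an illustration of the first category, a producer could derive an `ETag` from the feed body and answer `304 Not Modified` when the consumer sends the same value back in `If-None-Match`. This is only a sketch under stated assumptions: `make_etag` and `respond` are hypothetical helpers, the comparison is a plain string match (no weak validators, no lists of tags), and a feed whose `last_updated` changes on every generation would get a new tag each time.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def make_etag(payload: dict) -> str:
    """Derive a quoted ETag from a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(payload: dict, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a conditional GET on a feed."""
    etag = make_etag(payload)
    # "no-cache" asks caches to revalidate with the producer before reuse.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match == etag:
        return 304, headers, b""  # nothing changed: no body is sent
    return 200, headers, json.dumps(payload).encode()


# Hypothetical feed content, for illustration only.
feed = {"last_updated": 1713786875, "ttl": 0, "data": {"vehicles": []}}
status, headers, body = respond(feed, None)              # first request
status_again, _, _ = respond(feed, headers["ETag"])      # revalidation
print(status, status_again)
```

The consumer still has to poll, but an unchanged feed costs one small round trip instead of a full payload.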

Personally, I think the second category strikes the right trade-off between added complexity and added value. In particular, Server-Sent Events seem promising, at least in theory, as an extension of existing endpoints. It should also be noted that options 1 and 2 can co-exist, i.e. producers can continue to support the polling-based method for real-time feeds, and improve upon it, while at the same time supporting a push-based model.
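For readers unfamiliar with Server-Sent Events, the wire format is plain text over a long-lived HTTP response: `field: value` lines, with a blank line ending each event. Below is a simplified parser sketch to show what a consumer would have to handle. It ignores the `retry` field, and the event name `vehicle_status` and the payload are assumptions for the example, not something GBFS defines.

```python
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event from the lines of a text/event-stream body."""
    event_type, data, last_id = "message", [], ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event, if it carries any data.
            if data:
                yield {"event": event_type, "data": "\n".join(data), "id": last_id}
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value  # persists across events until changed


# Hypothetical stream content, for illustration only.
stream = [
    "event: vehicle_status\n",
    "id: 42\n",
    'data: {"vehicle_id": "abc"}\n',
    "\n",
    ": keep-alive\n",
]
for event in parse_sse(stream):
    print(event["event"], event["id"], event["data"])
```

The `id` field is what would let a reconnecting consumer tell the producer where it left off (via the `Last-Event-ID` request header).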

Still, there is another axis to consider: for any given update, what is the size of the delta of that update? There is potentially a very large upside to precomputing and shipping only what has actually changed, rather than always transferring everything. On the other hand, it requires us to introduce new semantics to communicate the contents of the delta to consumers, e.g. what has been added, what has changed and what was removed.
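A rough sketch of what such delta semantics could look like, assuming each snapshot is a mapping from an ID (for example `vehicle_id`) to that item's fields. The `added` / `changed` / `removed` shape and both function names are hypothetical, chosen only to mirror the three cases above.

```python
def compute_delta(old: dict, new: dict) -> dict:
    """Describe how to get from snapshot `old` to snapshot `new`."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    changed = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    removed = sorted(old.keys() - new.keys())
    return {"added": added, "changed": changed, "removed": removed}


def apply_delta(state: dict, delta: dict) -> dict:
    """Return a new snapshot with the delta applied; `state` is not mutated."""
    result = dict(state)
    for key in delta["removed"]:
        result.pop(key, None)
    result.update(delta["added"])
    result.update(delta["changed"])
    return result


# Hypothetical vehicle snapshots, keyed by vehicle_id.
old = {"a": {"is_reserved": False}, "b": {"is_reserved": False}}
new = {"a": {"is_reserved": True}, "c": {"is_reserved": False}}
delta = compute_delta(old, new)
print(delta)
print(apply_delta(old, delta) == new)
```

A consumer applying deltas needs a known starting snapshot and some way to detect a missed delta, which is where the existing full feeds could still play a role.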

I’m looking forward to hearing what the community has to say about this. I will use your feedback to work on a proposal for a standard way to deal with the problems outlined here.

Is your potential solution a breaking change?

Yes
No
Unsure


leonardehrenfried · 2024-05-22T08:39:46Z

This is a great proposal and I think it would be very interesting for aggregators and their consumers.

I also think that the best cost/benefit ratio would come from some form of HTTP-based event system, like WebSockets.

skinkie · 2024-05-22T08:46:03Z

The problem with WebSocket and Server-Sent Events is that they still require a non-native implementation as a backend. Having a single (preferably well standardized) interface like MQTT (ISO/IEC 20922:2016) makes, in my opinion, for a much better standardisation effort. That said, it would require a topic structure that allows for partial updates. In addition, because retained information remains available, it also supports connecting to a server and getting back the clean state.

As a producer we are willing to provide an MQTT implementation for evaluation.
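To illustrate the two MQTT properties mentioned here (a topic structure for partial updates, and retained messages giving a new subscriber the current state), here is a small in-memory model. It is not an MQTT client and talks to no broker; `RetainedStore` and the topic layout `gbfs/<system_id>/vehicle_status/<vehicle_id>` are assumptions made up for this sketch. It does follow two real MQTT rules: `+` matches one topic level and `#` matches the remainder, and publishing an empty retained payload clears the retained message for that topic.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """MQTT-style matching: '+' is one level, '#' is all remaining levels.

    Simplified: does not special-case topics starting with '$'.
    """
    p_parts = pattern.split("/")
    t_parts = topic.split("/")
    for i, part in enumerate(p_parts):
        if part == "#":
            return True
        if i >= len(t_parts):
            return False
        if part != "+" and part != t_parts[i]:
            return False
    return len(p_parts) == len(t_parts)


class RetainedStore:
    """Models only the retained-message table of a broker."""

    def __init__(self):
        self._retained = {}

    def publish(self, topic: str, payload: bytes) -> None:
        if payload == b"":
            self._retained.pop(topic, None)  # empty payload clears the topic
        else:
            self._retained[topic] = payload  # newest message replaces the old

    def subscribe(self, pattern: str) -> dict:
        """Return what a new subscriber would receive immediately."""
        return {t: p for t, p in self._retained.items() if topic_matches(pattern, t)}


# Hypothetical topic layout: gbfs/<system_id>/vehicle_status/<vehicle_id>
store = RetainedStore()
store.publish("gbfs/sys1/vehicle_status/a", b'{"is_reserved": false}')
store.publish("gbfs/sys1/vehicle_status/b", b'{"is_reserved": true}')
store.publish("gbfs/sys1/vehicle_status/b", b"")  # vehicle b left the system
print(store.subscribe("gbfs/sys1/vehicle_status/+"))
```

With one topic per vehicle, each publish is a partial update, and a late subscriber gets the current state of every vehicle still present.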

testower · 2024-05-22T12:02:43Z

Thanks @leonardehrenfried and @skinkie, I think it's great that we have some opposing views here.

@skinkie could you perhaps elaborate what you mean by "non-native implementation", because I didn't quite understand the argument.

skinkie · 2024-05-22T12:13:08Z

> @skinkie could you perhaps elaborate what you mean by "non-native implementation", because I didn't quite understand the argument.

Imagine you need a scalable solution for distribution. Internally that will be a publish-subscribe pattern. WebSockets and SSE are web technologies and not per se the transport protocol used within an enterprise-grade publish-subscribe system. Surely you could run your own protocol over WebSockets, including MQTT, but why not go for the native route?

Our experience with "GTFS-RT Differential", where we implemented WebSockets because they were mentioned as a standardised web technology, is that in the past ten years this has not resulted in any operational commercial GTFS-RT client. My personal preference would be to go for MQTT, since other transport organisations such as VDV (Germany) have also embraced MQTT in favor of their own distribution protocols. Our own implementation uses ZeroMQ, so it is not as though we are pushing our own choices.

testower · 2024-05-27T19:35:35Z

Thanks for your insight @skinkie

matt-wirtz · 2024-07-04T08:57:19Z

Thx @testower for bringing this up.
As a consumer we do have the problem of stale data from time to time, as described already: users might be looking for vehicles which we still show as available. Or the opposite: a shared vehicle is not offered in trip search results because the newly available vehicle is not yet part of our data.
So we would also be interested in a good, push-based approach. Using MQTT as the transport mechanism also sounds reasonable.
@skinkie could you provide an example where MQTT is already applied in a VDV defined interface?

skinkie · 2024-07-04T09:29:23Z

@matt-wirtz VDV-435-IoM.

mobilitydataio · 2024-09-03T04:06:59Z

This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 30 days if no further activity occurs. Thank you for your contributions.

skinkie · 2024-09-03T06:11:14Z

Keep open.

richfab · 2024-09-26T08:40:12Z

This will be the topic of a workshop at the Mobility Data Summit in Montreal, Oct 30-31 2024.
We will discuss possible technical developments.
Workshop title: Achieving Optimal Efficiency in GBFS Data Exchanges.

Please contact me at [email protected] if you have any questions about the Summit.

cc @skinkie @leonardehrenfried @matt-wirtz for visibility

testower mentioned this issue May 27, 2024

Restful API for filtering and pagination #617

Closed

3 tasks

matt-wirtz mentioned this issue Jul 16, 2024

Future availability of all vehicles in the system #616

Open

3 tasks

mobilitydataio added the stale label Sep 3, 2024

mobilitydataio removed the stale label Sep 4, 2024
