Push-based model for consuming (realtime) GBFS data #630
Comments
This is a great proposal, and I think it would be very interesting for aggregators and their consumers. I also think that the best cost/benefit ratio would come from some form of HTTP-based event system, like WebSockets. |
The problem with WebSocket and Server-Sent Events is that they still require a non-native implementation as a backend. Having a single (preferably well standardised) interface like MQTT (ISO/IEC 20922:2016) makes, in my opinion, for a much better standardisation effort. That said, it would require a topic structure that allows for partial updates. In addition, because retained messages stay available on the broker, a client can connect to a server and get back a clean state. As a producer, we are willing to provide an MQTT implementation for evaluation. |
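To make the topic-structure idea a bit more concrete, here is a minimal sketch in Python. It does not talk to a real broker: `RetainedStore` is a hypothetical in-memory stand-in that only imitates MQTT's retained-message behaviour (the last retained payload per topic is kept, and an empty payload clears it). The topic layout `gbfs/<system_id>/vehicle_status/<vehicle_id>` is an assumption for illustration, not something agreed on in this thread.

```python
from typing import Dict, Optional


def vehicle_topic(system_id: str, vehicle_id: str) -> str:
    """Build a per-vehicle topic so each update only touches one entity (hypothetical layout)."""
    return f"gbfs/{system_id}/vehicle_status/{vehicle_id}"


class RetainedStore:
    """In-memory imitation of a broker's retained messages (not a real MQTT client)."""

    def __init__(self) -> None:
        self._retained: Dict[str, bytes] = {}

    def publish(self, topic: str, payload: Optional[bytes]) -> None:
        # In MQTT, a retained message with an empty payload clears the retained value.
        if not payload:
            self._retained.pop(topic, None)
        else:
            self._retained[topic] = payload

    def snapshot(self, prefix: str) -> Dict[str, bytes]:
        # What a newly connected subscriber would receive for topics under this prefix.
        return {t: p for t, p in self._retained.items() if t.startswith(prefix)}


store = RetainedStore()
store.publish(vehicle_topic("example_system", "a"), b'{"is_reserved": false}')
store.publish(vehicle_topic("example_system", "b"), b'{"is_reserved": false}')
store.publish(vehicle_topic("example_system", "a"), b'{"is_reserved": true}')  # partial update
store.publish(vehicle_topic("example_system", "b"), b"")  # vehicle removed
clean_state = store.snapshot("gbfs/example_system/vehicle_status/")
```

With one topic per vehicle, a partial update is simply a publish to that vehicle's topic, and a late-joining consumer rebuilds the full state from the retained messages alone.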
Thanks @leonardehrenfried and @skinkie, I think it's great that we have some opposing views here. @skinkie, could you perhaps elaborate on what you mean by "non-native implementation"? I didn't quite understand the argument. |
Imagine you need a scalable solution for distribution. Internally, that will be a publish-subscribe pattern. WebSockets and SSE are web technologies and not per se the transport protocol used within an enterprise-grade publish-subscribe system. Surely you could run your own protocol over WebSockets, including MQTT, but why not go the native route? In our experience, "GTFS-RT Differential", which was implemented over WebSockets because they were cited as a standardised web technology, has not resulted in any operational commercial GTFS-RT client in the past ten years. My personal preference would be MQTT, since other transport organisations such as VDV (Germany) have also embraced MQTT in favour of their own distribution protocols. Our own implementation uses ZeroMQ, so it is not as if we are pushing our own choices. |
Thanks for your insight @skinkie |
Thx @testower for bringing this up. |
@matt-wirtz VDV-435-IoM. |
This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 30 days if no further activity occurs. Thank you for your contributions. |
Keep open. |
This will be the topic of a workshop at the Mobility Data Summit in Montreal, Oct 30-31 2024. Please contact me at [email protected] if you have any questions about the Summit. cc @skinkie @leonardehrenfried @matt-wirtz for visibility |
What is the issue and why is it an issue?
Using poll-based consumption (the current situation) for real-time data has several challenges.
ttl (time-to-live) value of 0. Ideally, this means consumers should poll infinitely often to stay up to date.
Consumer side
There is an inherent conflict within this decision process: Consumers don’t want to poll too infrequently, because that increases the likelihood that data will be stale and that incorrect information is shown to users.
At the same time, polling too frequently is a potential waste of resources, depending on how often data is refreshed. They may also face rate-limiting policies from producers (I have first-hand experience with this).
In the end, we have to decide between over-fetching and stale data, and it will never be better than a mere compromise.
Producer side
Frequent polling of large payloads hogs resources and pushes producers to introduce complexity like caching and CDNs. Having consumers poll at an interval close to 0 seconds is resource-intensive and costly for the data producer, and producers face the risk of lost revenue if consumers poll too infrequently.
We must further consider that large payloads often contain only minor changes relative to the totality of the information, causing an additional waste of resources as non-changes have to be computed.
Cloud computing contributes to greenhouse gas emissions on a massive scale. Allocated resources are generally underutilised and unnecessary computing is extremely wasteful on the financial side, as well as damaging on the environmental side.
Potential solutions
I would like to open up a community discussion on how to solve this challenge by generic and scalable means. Individual arrangements between consumers and producers are not sustainable, and finding a common solution will benefit the community as a whole and help the standard grow.
I don’t want to constrain the solutions from the outset, but I think potential solutions fall into the following 3 broad categories:
Personally, I think the second category holds the right trade-off between added complexity and added value. In particular, Server-Sent Events seem promising as a theoretical extension of existing endpoints. It should also be noted that options 1 and 2 can co-exist, i.e. producers can continue to support the polling-based method for real-time feeds, and improve upon it, while at the same time supporting a push-based model.
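As a rough illustration of how small the Server-Sent Events wire format is, the following sketch frames a single update as an SSE message. The `id:`, `event:` and `data:` fields are part of the SSE format itself; the function name `format_sse`, the event name `vehicle_status` and the payload shape are assumptions made up for this example, not part of GBFS or of any proposal here.

```python
import json
from typing import Optional


def format_sse(event: str, payload: dict, event_id: Optional[str] = None) -> str:
    """Serialise one update as a text/event-stream message (hypothetical helper)."""
    lines = []
    if event_id is not None:
        # Lets a reconnecting client resume via the Last-Event-ID request header.
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # Compact JSON contains no raw newlines, so a single data line is sufficient.
    lines.append("data: " + json.dumps(payload, separators=(",", ":")))
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


message = format_sse(
    "vehicle_status",
    {"vehicle_id": "abc", "is_reserved": True},
    event_id="42",
)
```

An existing endpoint could, in principle, serve such messages when the client asks for `text/event-stream`, while continuing to serve the regular JSON document to polling clients.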
Still, there is another axis to consider: for any given update, what is the size of the delta of that update? There is potentially a very large upside to precomputing and shipping only what has actually changed, rather than always transferring everything. On the other hand, it requires us to introduce new semantics to communicate the contents of the delta to consumers, e.g. what has been added, what has changed and what was removed.
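To sketch what such delta semantics could look like, the snippet below compares two snapshots of a vehicle list and reports what was added, changed and removed. The function name `compute_delta`, the `added`/`changed`/`removed` keys and the use of `vehicle_id` as the identifier are assumptions for illustration only; no such structure is defined by GBFS today.

```python
from typing import Any, Dict, List


def compute_delta(
    previous: List[Dict[str, Any]],
    current: List[Dict[str, Any]],
    key: str = "vehicle_id",
) -> Dict[str, list]:
    """Diff two snapshots keyed by an identifier field (hypothetical delta format)."""
    prev = {item[key]: item for item in previous}
    curr = {item[key]: item for item in current}
    return {
        # Entities present now but not before: ship the full object.
        "added": [curr[k] for k in curr if k not in prev],
        # Entities present in both snapshots whose content differs: ship the new object.
        "changed": [curr[k] for k in curr if k in prev and curr[k] != prev[k]],
        # Entities that disappeared: the identifier alone is enough.
        "removed": [k for k in prev if k not in curr],
    }


old_snapshot = [
    {"vehicle_id": "a", "is_reserved": False},
    {"vehicle_id": "b", "is_reserved": False},
]
new_snapshot = [
    {"vehicle_id": "a", "is_reserved": True},
    {"vehicle_id": "c", "is_reserved": False},
]
delta = compute_delta(old_snapshot, new_snapshot)
```

For feeds where only a handful of entities change between updates, the delta stays small regardless of fleet size, which is where the potential saving comes from.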
I’m looking forward to hearing what the community has to say about this. I will use your feedback to work on a proposal for a standard way to deal with the problems outlined here.
Is your potential solution a breaking change?