Topic Hierarchy Structure: The extreme complexity of the Topic hierarchy could potentially lead to a limited adoption of the service or very large performance issues #50
Comments
From my point of view, the topic hierarchy is not meant to help users find their dataset. That is what the WIS2 metadata is for. If I am not mistaken, we had the discussion on the meaning of topics here: wmo-im/wis2-guide#38. So if you search for data, use the Global Discovery Catalogue, and the metadata will point you to the correct topic to subscribe to. Additionally, it is perfectly fine to make heavy use of "+" and "#". The topic is for filtering: if you don't want to filter on a certain level of the topic hierarchy, use a wildcard. As a Global Cache, we subscribe to everything below origin/a/wis2/data/core/#, which is what a cache is supposed to do. What I cannot judge at the moment is the performance implications. MQTT brokers are made to handle lots of small messages distributed among lots of clients (IoT use cases), so millions of messages per time frame should not be an issue. I cannot yet say anything about the impact of topics and filtering on performance; in my opinion, this is something that tests and real-life operation need to show. To me it is clear that there will be lots of tuning involved in the early days of WIS2. The decision on multi-purpose datasets, in short, is: pick one topic that fits. One last remark: I don't think the sheer number of possible topics matters. The broker just needs to know which client wants which messages, and from that perspective using wildcards is preferred over creating thousands of individual subscriptions.
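To illustrate how a broker applies "+" and "#" when filtering, here is a minimal Python sketch of the MQTT topic-filter matching rules: "+" matches exactly one level, and "#" (only valid as the last level) matches the parent level plus any number of child levels. The function name `topic_matches` and the example topic name under the Global Cache filter are hypothetical; real brokers implement this internally and far more efficiently.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a topic filter (sketch)."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Topics starting with '$' are not matched by a filter starting with a wildcard.
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, f in enumerate(f_levels):
        if f == "#":
            # '#' matches the parent level and everything below it,
            # but is only valid as the last level of the filter.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False  # topic is shorter than the filter
        if f != "+" and f != t_levels[i]:
            return False  # literal level differs ('+' accepts any one level)
    # Without '#', the filter must cover exactly as many levels as the topic.
    return len(f_levels) == len(t_levels)


# Hypothetical topic name, used only to exercise the Global Cache filter.
print(topic_matches("origin/a/wis2/data/core/#", "origin/a/wis2/data/core/weather/nwp"))  # True
print(topic_matches("origin/a/wis2/data/core/#", "cache/a/wis2/data/core/weather"))  # False
```

One wildcard filter like this replaces what would otherwise be many individual subscriptions, which is the point made above.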
I fully agree with Kai's comment above. The WIS2 Pilot Phase is meant exactly for these kinds of tests. I also have the feeling there might be some misunderstanding of how the MQTT protocol works. We will not "create" the 62M topics, and I don't think there will be even 1/100th of that number. Topics are "created" only when someone publishes or subscribes to them. MQTT brokers are also built, by design, to handle large numbers of subscribers and publishers. It is interesting to note that in one sizing tool for MQTT clusters (e.g. https://www.emqx.com/en/server-estimate), what matters is the number of connected clients and the number of messages, which is independent of the number of topics. The MQTT topic hierarchy is also not meant to be used as poor man's discovery metadata. I have also seen in other ETs/TTs some temptation to create many sublevels with many more topics. At the moment, we have 8 "global" levels, and I don't think that is, by design, too much. We have to agree on the "right" balance for the level of filtering provided by the topics, between a very coarse grain (only one topic) and a very fine grain (a lot of topic levels). "Right" will obviously mean something different to each expert :) I discussed this kind of thing (and our agreed topic hierarchy) with the main developer of one of the large MQTT brokers, and it was not considered an issue. The Sparkplug standard (https://sparkplug.eclipse.org/specification/version/3.0/documents/sparkplug-specification-3.0.0.pdf), used in a typical IoT world, defines one topic per IoT device, so they are really talking about millions there.
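A rough sketch of why the number of *possible* topics does not by itself drive broker state: a broker can keep an index of the filters clients have actually subscribed to and match each published topic against it. A trie keyed by topic level is one common way to do this and is assumed here; the class below is a hypothetical, simplified illustration (no "$" topics, QoS or shared subscriptions), not how any particular broker is implemented, and the "user-1" filter is made up.

```python
class _Node:
    """One topic level in the subscription trie."""

    def __init__(self) -> None:
        self.children: dict[str, "_Node"] = {}  # next level -> node
        self.clients: set[str] = set()          # clients whose filter ends here


class SubscriptionTrie:
    """Hypothetical index mapping topic filters to subscribed clients.

    Nodes are created only for levels that appear in a subscription, so
    memory grows with the subscriptions actually made, not with the number
    of topics that could exist.
    """

    def __init__(self) -> None:
        self.root = _Node()
        self.node_count = 1

    def subscribe(self, client: str, topic_filter: str) -> None:
        node = self.root
        for level in topic_filter.split("/"):
            if level not in node.children:
                node.children[level] = _Node()
                self.node_count += 1
            node = node.children[level]
        node.clients.add(client)

    def match(self, topic: str) -> set[str]:
        """Return the clients whose filters match a published topic name."""
        levels = topic.split("/")
        result: set[str] = set()

        def walk(node: _Node, i: int) -> None:
            # A '#' child matches the remaining levels, including none.
            hash_child = node.children.get("#")
            if hash_child is not None:
                result.update(hash_child.clients)
            if i == len(levels):
                result.update(node.clients)
                return
            # Follow both the exact level name and the single-level wildcard.
            for key in (levels[i], "+"):
                child = node.children.get(key)
                if child is not None:
                    walk(child, i + 1)

        walk(self.root, 0)
        return result


trie = SubscriptionTrie()
trie.subscribe("global-cache", "origin/a/wis2/data/core/#")
trie.subscribe("user-1", "origin/a/wis2/+/core/weather")
print(sorted(trie.match("origin/a/wis2/data/core/weather")))  # ['global-cache', 'user-1']
print(trie.node_count)  # 10
```

Two subscriptions produce ten trie nodes regardless of how many topics the hierarchy could theoretically generate; the cost of a wildcard filter is one entry, not one entry per matching topic.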
@golfvert @kaiwirt Thanks for the comments. There is no misunderstanding of how MQTT works, and it is clear that the topics are not pre-created. Still, if you look at the extracts from the AWS IoT Core, EMQX or HiveMQ documentation, they really try to limit the number of topics created and the size of a topic hierarchy. I have attached my presentation with links to and extracts from the different documentation. To make sure that the current topic setup is fine, it would be very valuable to run a test with a representative load, for instance taking the NWP discipline topic hierarchy, duplicating it 4 or 5 times (one copy per discipline), and having a representative number of publishers (1000 or more) and consumers (10000 or more) use it, to see how the brokers, publishers and consumers react and how each should be scaled or adapted. @golfvert, you indicate that there are 8 levels at the moment, which is perfectly fine, but many more are going to be created below them by the different disciplines (weather, hydrology, ...). Right now, each discipline is adding a lot of semantics to the topic hierarchy, not necessarily for filtering purposes. There seems to be a misunderstanding of the purpose of the topic hierarchy; this should be communicated differently, and the hierarchy should not be extended as much as is being done now. Another consequence is a complex hierarchy, which is the second potential issue with the topic hierarchy definition: usability, i.e. making it easy for users to understand and use. If it is too complex (and this is my feeling with 8 top levels plus the discipline levels), users will ignore it and use wildcards to subscribe to almost everything. All the work done to define the full topic hierarchy, which will additionally create maintenance and infrastructure issues, will then have been done for little value and will be difficult to change.
I hope this issue will help us converge toward a final topic hierarchy with one purpose: offering some filtering so that consumers do not receive too many messages, while keeping the infrastructure scalable and manageable.
Thanks for this message. When we delegated the creation of the topic hierarchy to the various disciplines (typically NWP), we may have lacked some guidance... I also agree with the idea of a stress test.
I hope that I have misunderstood something and that this can be resolved easily by updating my understanding of the WIS architecture, but I have a couple of points to raise on the topic hierarchy.
I have been looking at the WIS2 topic hierarchy structure, which is meant to help users find datasets and filter data topics by subject. Thinking about it and how it could be implemented, it looks to me as if its complexity will be a very large barrier to entry, or could lead to users ignoring it completely.
Another point is that the topic hierarchy could lead to the implementation of a very complex system for the main brokers reflecting the entire hierarchy, and in addition, maintaining good performance could be extremely challenging.
Below are the points that I have been trying to develop:
Large amounts of discovery/domain information in the topic hierarchy will be counter-productive in helping users understand what data is available and how to find relevant data
A quick calculation, taking the first 8 levels and assuming around 195 countries and an average of 20 centres per country (which is probably below the real number):
I end up with 2x1x1x195x20x4x2x8 = 499,200 branches for the first 8 levels and, for the total tree, assuming 3 further levels with 5 sub-disciplines each: 2x1x1x195x20x4x2x8x5x5x5 = 62,400,000 topics. The assumptions might be too large, but reducing the problem by a factor of 100 leads to the same conclusion.
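The estimate can be reproduced in a few lines of Python. The per-level counts are the assumptions stated above (195 countries, 20 centres per country, 3 extra levels of 5 values each), not values defined by the topic hierarchy itself.

```python
from math import prod

# Assumed number of values at each of the first 8 levels (from the estimate above).
first_eight_levels = [2, 1, 1, 195, 20, 4, 2, 8]

# Assumed: 3 further discipline-specific levels with 5 values each.
sub_discipline_levels = [5, 5, 5]

branches = prod(first_eight_levels)
total_topics = branches * prod(sub_discipline_levels)

print(f"{branches:,}")             # 499,200
print(f"{total_topics:,}")         # 62,400,000
print(f"{total_topics // 100:,}")  # 624,000 (the estimate reduced by a factor of 100)
```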
From the discovery/usability point of view, this is a large obstacle for users if the intention is for them to understand the topic hierarchy and use it to find the data they are interested in.
Users will most probably not find their way and might simply use + or # wildcards at many levels to receive some data.
They could then be overwhelmed by the number of messages received, and the main brokers could be overloaded by such subscriptions and by the number of clients subscribing to many topics.
This is why I am questioning the purpose of putting so much semantic and discovery information into the topic hierarchy and making it so deep.
Additionally, if the intention is to help users understand what data is available, why do we have 8 levels of technical (version, WIS2) and political information before the domain information?
At the very least, the topic hierarchy should be reversed, but in my opinion it should mostly be simplified.
If the answer to the questions above is that the catalogue will provide the discovery services to find the data, then there is no need to create such a complex topic hierarchy, which will make the implementation very complex and challenging for users.
Potential performance issues and challenges for implementation
Another point is the performance of a system that will have to replicate and manage the distribution of 62 million topics, some of them with a very high distribution frequency. This will almost certainly lead to a large-scale system, and tests at that scale should be performed to assess whether the products on the market (HiveMQ, RabbitMQ, Mosquitto, the Amazon MQTT service) can easily cope with it.
It should also be noted that this complex hierarchy forces users to use wildcards (+, #), which will make the system even more demanding in terms of resources (in-memory tables, on-disk storage or databases to resolve the wildcards and maintain the multiple subscriptions of thousands of users).
Proposal for a way forward
I would propose to re-think the topic hierarchy and go back to the initial requirements:
How should the topic hierarchy be organised to focus on such requirements?
Here are some leads that could help solve the issue without leading to a difficult full-scale implementation:
Another proposal would be to implement a large scale prototype simulating the load and number of topics to be created and reflected on the main brokers.
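As a possible starting point for such a prototype, the Python sketch below generates a synthetic topic space with the same per-level counts as the estimate above and assigns random topics to simulated publishers. Everything here is an assumption for illustration: the placeholder level values ("l3v17" means level 3, value 17, not a real WIS2 topic level), the uniform random choice of topics, and the figure of 1000 publishers (chosen to be of the same order as the stress test suggested in this thread). Actually publishing to a broker would be done with whatever MQTT client library the test uses.

```python
import itertools
import random
from math import prod
from typing import Iterator

# Level cardinalities from the estimate above: 8 top levels + 3 x 5 sub-discipline levels.
CARDINALITIES = [2, 1, 1, 195, 20, 4, 2, 8, 5, 5, 5]


def topic_space_size(cardinalities: list[int]) -> int:
    """Number of distinct topics the synthetic hierarchy could produce."""
    return prod(cardinalities)


def iter_topics(cardinalities: list[int]) -> Iterator[str]:
    """Lazily enumerate every synthetic topic without holding them all in memory."""
    value_lists = [[f"l{d}v{i}" for i in range(n)] for d, n in enumerate(cardinalities)]
    for combo in itertools.product(*value_lists):
        yield "/".join(combo)


def plan_publishers(cardinalities: list[int], publishers: int,
                    topics_each: int, seed: int = 0) -> dict[str, list[str]]:
    """Assign each simulated publisher random topics (drawn with replacement)."""
    rng = random.Random(seed)  # fixed seed so test runs are reproducible
    plan = {}
    for p in range(publishers):
        plan[f"publisher-{p}"] = [
            "/".join(f"l{d}v{rng.randrange(n)}" for d, n in enumerate(cardinalities))
            for _ in range(topics_each)
        ]
    return plan


if __name__ == "__main__":
    print(topic_space_size(CARDINALITIES))  # 62400000
    plan = plan_publishers(CARDINALITIES, publishers=1000, topics_each=10)
    print(len(plan), sum(len(t) for t in plan.values()))  # 1000 10000
```

Generating topics lazily keeps the test harness itself small even when the theoretical topic space is in the tens of millions; the broker under test only ever sees the topics that the simulated publishers and subscribers actually use.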
What do you think ? Comments ?