Consider adding non-primitive API operations #376
Yes that's part of the goals, but we can't really know what these would look like in practice until we have well-defined use-cases. I can see we will most likely need specialised API endpoints for a web dashboard use-case, for example. The only "macro" query we have now is when posting a hierarchy of nodes: it will add them as a batch. I'm not sure what we can do with this particular GitHub issue to be honest, but we will definitely need to create specific ones for use-cases as we identify them. The first goals, I believe, are to ensure we're feature-complete for a stable release; then performance optimisation can happen on top of that to make the same features work "faster". In addition to this, we'll need to investigate the optimal database indexes and, generally speaking, do some stress-testing with the AKS deployment to identify any performance bottlenecks in the entire system.
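To make the "batch" point concrete, here is a minimal sketch of what posting a hierarchy in one round trip could look like at the database level. This is purely illustrative: the collection and field names are assumptions, not the actual KernelCI schema.

```python
from bson import ObjectId
from pymongo import MongoClient

nodes = MongoClient("mongodb://localhost:27017")["kernelci"]["nodes"]

# Build a small parent/child hierarchy and insert it in one round trip,
# mirroring the "post a hierarchy of nodes as a batch" behaviour.
checkout_id = ObjectId()
hierarchy = [
    {"_id": checkout_id, "name": "checkout", "parent": None},
    {"_id": ObjectId(), "name": "build", "parent": checkout_id},
    {"_id": ObjectId(), "name": "test", "parent": checkout_id},
]
nodes.insert_many(hierarchy)  # one request instead of one insert per node
```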
That's ok, I can begin to define some use cases and use them to prototype an initial proof of concept. The case of "give me the whole node tree hanging from this one" is a pretty evident one that could be useful.
No problem, I'll create more specific issues. The purpose of this one was to discuss the general idea and to have a potential feature to put somewhere on the roadmap.
I agree, but all of that should be use-case-driven. So I think we should start by defining the operations and then the necessary DB and implementation changes needed to accommodate them.
I believe we can actually already do this with the existing queries, using common attributes for the whole set of nodes (e.g. same kernel revision or job/group name). And using recursive ID look-ups is not going to play well with MongoDB; it's not designed for this kind of thing.
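As an illustration, a query on shared attributes returns the whole set in one go, with no recursive walking. The field names here are assumptions for illustration, not the real schema:

```python
from pymongo import MongoClient

nodes = MongoClient("mongodb://localhost:27017")["kernelci"]["nodes"]

# One indexed equality query fetches every node of the set at once;
# "revision.commit" and "group" are hypothetical field names.
docs = list(nodes.find({"revision.commit": "abc123", "group": "baseline-x86"}))
```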
While I think extending the API for performance with macro operations is driven by use cases (like in my first comment), stress-testing the system can already be done with the primitive operations we have now. Also, it's a big advantage to not have many API endpoints. So adding macro operations on the API side should be carefully thought through to compare the actual added value with the costs of making the API more complex.
What if we keep in all child nodes a value similar to path, but with parent node IDs? node:"aaaa" node:"bbbb" node:"cccc"
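For illustration, this is essentially MongoDB's "array of ancestors" pattern; a hypothetical sketch, with made-up field names:

```python
from pymongo import MongoClient

nodes = MongoClient("mongodb://localhost:27017")["kernelci"]["nodes"]

# Each child node carries the IDs of all its ancestors, root first.
child = {
    "_id": "cccc",
    "name": "test-case",
    "ancestor_ids": ["aaaa", "bbbb"],  # like path, but with node IDs
}
nodes.insert_one(child)

# With a multikey index on ancestor_ids, the whole subtree under
# "aaaa" comes back from a single query:
nodes.create_index("ancestor_ids")
subtree = list(nodes.find({"ancestor_ids": "aaaa"}))
```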
That's already the case. |
planned to be implemented or implemented? |
@nuclearcat Oh sorry, I see what you mean - currently the path uses the node names, but I think you're suggesting also having that with node object IDs, as they're unique? Yes, I guess that could work, but it would probably also not be as optimal as relying on indexed fields from the object e.g.
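For example, the kind of indexes this relies on, again with illustrative field names rather than the actual schema:

```python
from pymongo import ASCENDING, DESCENDING, MongoClient

nodes = MongoClient("mongodb://localhost:27017")["kernelci"]["nodes"]

# Indexes on the fields the common queries filter on; the field names
# here are assumptions for illustration.
nodes.create_index([("revision.commit", ASCENDING)])
nodes.create_index([("group", ASCENDING), ("created", DESCENDING)])
```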
What we're discussing here seems to mostly be about NoSQL design and how it's meant to be used to maximise performance, see for example: https://www.mongodb.com/nosql-explained/data-modeling |
It might serve @hardboprobot's purpose and make fetching the whole tree way easier and more efficient, but we'd probably need to simulate that on MongoDB.
Right, we can look at efficient ways of doing this with the current queries and see if it's still not good enough. |
Ah also this page is worth reading I think: https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design |
Thanks for the comments. I wouldn't go as far as altering the models for this specific functionality. IMO the model structures shouldn't be dictated by the API "macro" endpoints. These should be, by design, utility operations based on the same principles as the primitive endpoints, with certain higher-level logic done by the API. I don't want to over-complicate this and I don't think it's even a priority for now, but what I had in mind was a very simple solution: to encapsulate this kind of logic into the API as a macro endpoint to avoid the overhead of having to issue dozens of queries. No noticeable side-effects, no changes needed in the models nor specific DB design considerations. I also think it wouldn't take a big effort to implement, and it makes sense to try it and see if it works as expected.
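A hedged sketch of what such a macro endpoint might look like, reusing the primitive query logic server-side. The route, schema and storage details here are hypothetical, not the actual KernelCI API, and it assumes parent IDs are stored as strings:

```python
from fastapi import FastAPI
from motor.motor_asyncio import AsyncIOMotorClient

app = FastAPI()
nodes = AsyncIOMotorClient("mongodb://localhost:27017")["kernelci"]["nodes"]

@app.get("/nodes/{node_id}/subtree")
async def get_subtree(node_id: str):
    """Return all descendants of a node in one response, instead of
    the client issuing one request per node."""
    result, queue = [], [node_id]
    while queue:
        parent_id = queue.pop()
        children = await nodes.find({"parent": parent_id}).to_list(None)
        for child in children:
            child["_id"] = str(child["_id"])  # make it JSON-serialisable
            result.append(child)
            queue.append(child["_id"])
    return result
```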
Thanks for the context. The reality is that this GitHub issue seems more like a "discussion", actually. Whatever the format, it's good to be going through these things. Discussions are better as they enable replying to particular comments as threads.
The
Can you elaborate on that? Do you mean improvements on the client side?
Most of the issues I created are meant to start this kind of discussion. From there we can move to concrete and countable actions that we can actually execute. That's why I think it's worth considering these high-level topics ASAP. Thank you both for joining the discussions and helping move them forward. I really appreciate your patience given the circumstances.
Yes I mean, even with an optimal API design some client code could still be using it in a very naive way and retrieve nodes one by one. With the current API endpoints, there are simple ways to retrieve more nodes in one go as I mentioned earlier using the kernel revision and node groups for example. Also, it's unclear why retrieving a single node takes that long across the Azure network border, it's extremely fast when run within the same subnet in the cloud or with a local docker-compose instance. I believe there's some sort of throttling in place at the IP level, maybe it's left to some default settings and this could be fine-tuned. It's one of the things stress-testing will help clarify.
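To illustrate the naive-client point, assuming a filterable /nodes endpoint; the URL and parameter names are illustrative, not the real API:

```python
import requests

API = "https://kernelci-api.example.org"  # illustrative base URL

# Naive client: one HTTP round trip per node (slow across the network).
# nodes = [requests.get(f"{API}/node/{i}").json() for i in node_ids]

# Better: one filtered request using shared attributes, as mentioned above.
resp = requests.get(f"{API}/nodes", params={
    "revision.commit": "abc123",   # hypothetical filter fields
    "group": "baseline-x86",
    "limit": 1000,
})
batch = resp.json()
```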
Thanks for starting these topics. I think we're on track, and discussing these things now is exactly what we should be doing during the early access phase. We could have started this phase earlier, but within the current timeline I think we're doing pretty well.
My understanding is that the current API contains a set of primitive operations that more or less map 1-to-1 to database queries.
Some typical use cases that could be rather frequent, such as the kci show results tool, perform a potentially large number of individual API queries to extract a whole set of data. Accessing this data directly from the database must be almost instantaneous, but doing so through primitive API calls takes an impractical amount of time.
Suggestion: implement an extended set of non-primitive API operations that can do this kind of complex data retrieval in a single query. Any internal query complexities, data post-processing and formatting can be abstracted in the API.
This has two important benefits: 1) it'll make it possible to perform certain operations that are currently impractical due to the large number of queries required; 2) while these operations will be more computationally expensive, they should greatly reduce the overhead of processing many separate query requests. The net result should favor the "single-complex-query" approach.
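For completeness, one way MongoDB can express this kind of complex retrieval in a single query is an aggregation with $graphLookup, sketched below. The collection and field names are hypothetical, and, as noted in the comments above, recursive look-ups may not perform well at scale:

```python
from bson import ObjectId
from pymongo import MongoClient

nodes = MongoClient("mongodb://localhost:27017")["kernelci"]["nodes"]

root_id = ObjectId("0123456789abcdef01234567")  # placeholder root node ID

# Start from the root node and recursively collect every node whose
# "parent" field points back into the set: the whole tree, one query.
pipeline = [
    {"$match": {"_id": root_id}},
    {"$graphLookup": {
        "from": "nodes",
        "startWith": "$_id",
        "connectFromField": "_id",
        "connectToField": "parent",
        "as": "descendants",
    }},
]
tree = list(nodes.aggregate(pipeline))
```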