
Consider adding non-primitive API operations #376

Open
r-c-n opened this issue Oct 5, 2023 · 16 comments

@r-c-n
Contributor

r-c-n commented Oct 5, 2023

My understanding is that the current API contains a set of primitive operations that map more or less 1-to-1 to database queries.

Some typical use cases that could be rather frequent, such as the kci show results tool, perform a potentially large number of individual API queries to extract a whole set of data. Accessing this data directly from the database should be almost instantaneous, but doing so through primitive API calls takes an impractical amount of time.

Suggestion: implement an extended set of non-primitive API operations that can do this kind of complex data retrieval in a single request. Any internal query complexity, data post-processing and formatting can be abstracted by the API.
This has two important benefits: 1) it will make it possible to perform certain operations that are currently impractical due to the large number of queries required, 2) while these operations will be more computationally expensive, they should greatly reduce the request-processing overhead. The net result should be favorable to the "single-complex-query" approach.
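
As a rough, purely illustrative sketch of what such a macro endpoint could look like (assuming a FastAPI app backed by a Motor/MongoDB "nodes" collection with "_id" and "parent" fields, and treating IDs as plain strings for simplicity - the names and route are hypothetical, not the actual KernelCI API):

```python
from fastapi import FastAPI
from motor.motor_asyncio import AsyncIOMotorClient

app = FastAPI()
db = AsyncIOMotorClient("mongodb://localhost:27017")["kernelci"]

@app.get("/nodes/{node_id}/subtree")
async def get_subtree(node_id: str):
    """Return a node and all of its descendants in one request.

    Internally this still runs one query per tree level, but the client
    only pays for a single HTTP round trip instead of dozens of calls.
    """
    root = await db["nodes"].find_one({"_id": node_id})
    if root is None:
        return []
    subtree = [root]
    frontier = [node_id]
    while frontier:
        children = await db["nodes"].find(
            {"parent": {"$in": frontier}}
        ).to_list(None)
        subtree.extend(children)
        frontier = [child["_id"] for child in children]
    return subtree
```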

@gctucker
Contributor

gctucker commented Oct 5, 2023

Yes that's part of the goals, but we can't really know what these would look like in practice until we have well defined use-cases. I can see we will most likely need specialised API endpoints for a web dashboard use-case for example. The only "macro" query we have now is when posting a hierarchy of nodes: it will add them as a batch.

I'm not sure what we can do with this particular GitHub issue to be honest, but we will definitely need to create specific ones for use-cases as we identify them. The first goals I believe are to ensure we're feature-complete for a stable release; then performance optimisation can happen on top of that to make the same features work "faster".

In addition to this, we'll need to investigate the optimal database indexes, and generally speaking do some stress-testing with the AKS deployment to identify any performance bottlenecks in the entire system.

@r-c-n
Contributor Author

r-c-n commented Oct 5, 2023

Yes that's part of the goals, but we can't really know what these would look like in practice until we have well defined use-cases

That's ok, I can begin to define some use cases and use them to prototype an initial proof of concept. The case of "give me the whole node tree hanging from this one" is a pretty evident one that could be useful.

I'm not sure what we can do with this particular GitHub issue to be honest, but we will definitely need to create specific ones for use-cases as we identify them

No problem, I'll create more specific issues. The purpose of this one was to discuss the general idea and to have a potential feature to put somewhere on the roadmap.

In addition to this, we'll need to investigate the optimal database indexes, and generally speaking do some stress-testing with the AKS deployment to identify any performance bottlenecks in the entire system.

I agree, but all of that should be use-case-driven. So I think we should start by defining the operations and then the necessary DB and implementation changes needed to accommodate them.

@gctucker
Contributor

gctucker commented Oct 5, 2023

That's ok, I can begin to define some use cases and use them to prototype an initial proof of concept. The case of "give me the whole node tree hanging from this one" is a pretty evident one that could be useful.

I believe we can already do this with the existing queries, actually, using common attributes for the whole set of nodes (e.g. same kernel revision or job/group name). And using recursive ID look-ups is not going to play well with MongoDB; it's not designed for this kind of thing.
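
For instance, something along these lines already works as a single query (a minimal pymongo sketch; "revision.commit" and "group" are the indexed fields mentioned in this discussion, while the connection details and values are made up):

```python
from pymongo import MongoClient

nodes = MongoClient("mongodb://localhost:27017")["kernelci"]["nodes"]

# One query returns every node produced for a given kernel revision and
# job group, instead of walking the tree node by node.
batch = list(nodes.find({
    "revision.commit": "1a2b3c4d5e6f",   # hypothetical commit hash
    "group": "baseline-arm64",           # hypothetical job/group name
}))
```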

@gctucker
Contributor

gctucker commented Oct 5, 2023

I agree, but all of that should be use-case-driven. So I think we should start by defining the operations and then the necessary DB and implementation changes needed to accommodate them.

While I think extending the API for performance with macro operations is driven by use cases (like in my first comment), stress-testing the system can already be done with the primitive operations we have now.

Also, it's a big advantage to not have many API endpoints. So adding macro operations on the API side should be carefully thought through to compare the actual added value with the costs of making the API more complex.

@nuclearcat
Member

What if we keep, in all child nodes, a value similar to path but containing the parent node IDs?
So when a child node is created, it inherits the parent's "nodepath" plus the parent ID. (Or "parentpath".)

node:"aaaa"
nodepath:[]

node:"bbbb"
nodepath:["aaaa"]

node:"cccc"
nodepath:["aaaa","bbbb"]

@gctucker
Contributor

gctucker commented Oct 5, 2023

That's already the case.

@nuclearcat
Member

That's already the case.

Planned to be implemented, or already implemented?
Because right now, as far as I can see, only path exists, but it contains a non-unique path - with just the names of the parent jobs, not IDs.
The parent ID also exists, but it only contains the immediate parent, which will require many queries to fetch the whole tree.

@gctucker
Contributor

gctucker commented Oct 5, 2023

@nuclearcat Oh sorry I see what you mean - currently the path uses the node names, but I think you're suggesting to also have that with node object IDs as they're unique? Yes I guess that could work, but it would probably also not be as optimal as relying on indexed fields from the object e.g. revision.commit and group.

@gctucker
Contributor

gctucker commented Oct 5, 2023

What we're discussing here seems to mostly be about NoSQL design and how it's meant to be used to maximise performance; see for example: https://www.mongodb.com/nosql-explained/data-modeling

@nuclearcat
Member

It might serve @hardboprobot's purpose and make fetching the whole tree way easier and more efficient, but we probably need to simulate that on MongoDB.
For example, say we want to fetch only the arm64-x86-chromebook build with its tests and results, where the build ID is "bbbb" (the checkout is "aaaa").
So on all nodes below the build, and on the build itself, it will always be ["aaaa", "bbbb" (and more if it is under the build)].
Then we need to search all documents that have nodepath[1]="bbbb". (Not sure if MongoDB has such an option and whether it is efficient.)
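
For reference, MongoDB can express both variants: dot notation matches a specific array position, and matching the array field directly matches any element. A minimal pymongo sketch against a hypothetical "nodes" collection using the nodepath layout above:

```python
from pymongo import MongoClient

nodes = MongoClient("mongodb://localhost:27017")["kernelci"]["nodes"]

# All nodes whose second nodepath entry is the build "bbbb".
under_build = list(nodes.find({"nodepath.1": "bbbb"}))

# Position-independent: any node with "bbbb" anywhere in its path,
# i.e. the whole subtree below that build, regardless of depth.
subtree = list(nodes.find({"nodepath": "bbbb"}))
```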

@gctucker
Contributor

gctucker commented Oct 5, 2023

Right, we can look at efficient ways of doing this with the current queries and see if it's still not good enough.

@gctucker
Contributor

gctucker commented Oct 5, 2023

Ah also this page is worth reading I think: https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design

@r-c-n
Contributor Author

r-c-n commented Oct 5, 2023

Thanks for the comments. I wouldn't go as far as altering the models for this specific functionality.

IMO the model structures shouldn't be dictated by the API "macro" endpoints. These should be, by design, utility operations based on the same principles as the primitive endpoints, with certain higher-level logic handled by the API.

I don't want to over-complicate this and I don't think it's even a priority for now, but what I had in mind was a very simple solution: encapsulate this kind of logic into the API as a macro endpoint to avoid the overhead of having to issue dozens of queries. No noticeable side-effects, no changes needed in the models, no specific DB design considerations. I also think it wouldn't take a big effort to implement, and it makes sense to try it and see if it works as expected.

@gctucker
Contributor

gctucker commented Oct 5, 2023

Thanks for the context. The reality is that kci show is probably using the API in the worst possible way; the only reason it's like this is that it was the simplest way to write it. With some minor improvements we could reduce the number of queries by a factor of maybe 100. That's also part of the plan following the kci command line tool rework.

This GitHub issue seems more like a "discussion" actually. Whatever the format, it's good to be going through these things. Discussions are better as they enable replying to particular comments as threads.

@r-c-n
Contributor Author

r-c-n commented Oct 5, 2023

The kci show use case is an example, but probably not the only case where it might be reasonable to extend the API to make clients' lives easier.

With some minor improvements we could reduce the number of queries by a factor of maybe 100. That's also part of the plan following the kci command line tool rework.

Can you elaborate on that? Do you mean improvements on the client side?

This GitHub issue seems more like a "discussion" actually

Most of the issues I created are meant to start these kinds of discussions. From there we can move to concrete and countable actions that we can actually execute. That's why I think it's worth considering these high-level topics ASAP. Thank you both for joining the discussions and helping move them forward. I really appreciate your patience given the circumstances.

@gctucker
Contributor

gctucker commented Oct 5, 2023

Yes I mean, even with an optimal API design some client code could still be using it in a very naive way and retrieve nodes one by one. With the current API endpoints there are simple ways to retrieve more nodes in one go, as I mentioned earlier, using the kernel revision and node groups for example. Also, it's unclear why retrieving a single node takes that long across the Azure network border; it's extremely fast when run within the same subnet in the cloud or with a local docker-compose instance. I believe there's some sort of throttling in place at the IP level; maybe it's left at some default settings and this could be fine-tuned. It's one of the things stress-testing will help clarify.

Thank you both for joining the discussions and helping move them forward. I really appreciate your patience given the circumstances.

Thanks for starting these topics. I think we're right on track, and discussing these things now is exactly what we should be doing during the early access phase. We could have started this phase earlier, but within the current timeline I think we're doing pretty well.
