Break the main attributegroup request into multiple requests #2385

Open
RFSH opened this issue Jan 5, 2024 · 0 comments
Labels: discussion required (requires a discussion before moving forward), investigation required (Requires some initial investigation)


RFSH commented Jan 5, 2024

Summary

Currently, the Reference.read function switches to the attributegroup API so that all the all-outbound paths are included in the main request. While this works for tables with a small number of all-outbound paths, it doesn't scale and can significantly delay the page's initial load.

In this issue, we will explore breaking this request into multiple requests. This way, we can show the table quickly and fill in the values gradually (similar to how aggregate columns work).
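A minimal sketch of the proposed flow, assuming an asyncio-style client. `fetch_main_rows` and `fetch_path_values` are hypothetical stand-ins for the entity request and the per-path requests (they return canned data instead of calling ERMrest); the sketch only illustrates the ordering: render the main rows first, then fill in each all-outbound path as its request completes.

```python
import asyncio


# Hypothetical stand-ins for ERMrest requests; they return canned data.
async def fetch_main_rows():
    # Main request via the entity API: only the table's own columns.
    await asyncio.sleep(0)
    return [{"RID": "1-A", "name": "alpha"}, {"RID": "1-B", "name": "beta"}]


async def fetch_path_values(path, rids):
    # One request per all-outbound path, keyed by the main row's RID.
    await asyncio.sleep(0)
    return {rid: f"{path}-value-for-{rid}" for rid in rids}


async def fetch_path(path, rids):
    # Return the path name with its values so results can arrive in any order.
    return path, await fetch_path_values(path, rids)


async def load_page(paths, render):
    rows = await fetch_main_rows()
    render(rows)  # show the table immediately; all-outbound cells are still empty
    rids = [row["RID"] for row in rows]
    for next_result in asyncio.as_completed([fetch_path(p, rids) for p in paths]):
        path, values = await next_result
        for row in rows:
            row[path] = values.get(row["RID"])
        render(rows)  # re-render as each path's values arrive
    return rows


# Usage: record a copy of the rows every time the page would re-render.
snapshots = []
rows = asyncio.run(
    load_page(["dataset", "project"], lambda r: snapshots.append([dict(x) for x in r]))
)
```

The first snapshot contains only the main columns; each later snapshot adds one path, mirroring how aggregate columns are filled in today.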

The content of this issue is subject to change, and I just wanted to summarize our initial conversation.

Details

We've already manually done something similar in CFDE and have evidence of its usefulness in large tables with a lot of all-outbound foreign keys.

Pros:

  • We can use the entity API for the main request. So, the initial request is going to be much faster than before.
  • For the all-outbound requests:
    • We no longer need to use outer joins.
    • Similar to the requests for aggregate columns, we can iterate over the shortest key values, which improves request performance.
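A sketch of how one all-outbound request could be scoped to the rows on the current page, assuming "shortest key" means the key with the fewest columns and assuming ERMrest's filter syntax (`&` for conjunction, `;` for disjunction, parentheses for grouping). The helper names and the example table are hypothetical, and a real implementation may also need to split very long value lists across several requests.

```python
from urllib.parse import quote


def shortest_key(keys):
    # Assumption: "shortest" = the key with the fewest columns (ties keep the first).
    return min(keys, key=len)


def key_values(rows, key):
    # Distinct key tuples from the main rows, in page order.
    return list(dict.fromkeys(tuple(row[col] for col in key) for row in rows))


def disjunctive_filter(key, values):
    # One conjunction per row, joined as alternatives:
    # RID=1-A;RID=1-B  or  (a=1&b=2);(a=3&b=4)
    def conjunction(value):
        preds = [f"{quote(col, safe='')}={quote(str(v), safe='')}" for col, v in zip(key, value)]
        return preds[0] if len(preds) == 1 else "(" + "&".join(preds) + ")"

    return ";".join(conjunction(v) for v in values)


# Usage with a hypothetical table that has a composite key and a simple key.
keys = [["project", "accession"], ["RID"]]
rows = [
    {"RID": "1-A", "project": "P1", "accession": "X/1"},
    {"RID": "1-B", "project": "P1", "accession": "X/2"},
]
key = shortest_key(keys)
page_filter = disjunctive_filter(key, key_values(rows, key))
```

Because the filter is built from the page's key values, each path request is a plain inner join from a small, known set of rows rather than an outer join over the whole table.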

Cons:

  • The biggest issue with this change is that when we introduced wait_for, we didn't ask data modelers to add it for all-outbound paths. So, if we switched to the behavior described above, we wouldn't have a signal for which columns rely on the all-outbound requests. This requires more thought and exploration, but we might be able to find these columns by processing the markdown usages.
  • As mentioned above, the data will be populated in steps, and some users might find this UX undesirable.
  • For tables with a small number of all-outbound paths, the existing method of using attributegroup might be faster overall.
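One possible way to recover the missing signal, sketched under the assumption that a column's markdown template names the all-outbound path it depends on inside a Mustache tag. The path names and templates below are hypothetical, and a real version would have to cover every templating syntax and annotation that can reference these paths.

```python
import re

# Captures the name in {{name}}, {{{name}}}, {{#name}}, {{^name}}, {{/name}}, {{&name}}.
TAG = re.compile(r"\{\{[{#^/&]?\s*([^}\s]+)")


def referenced_names(template):
    return {match.group(1) for match in TAG.finditer(template)}


def columns_needing_paths(templates, outbound_paths):
    # Map each column to the all-outbound paths its template mentions.
    result = {}
    for column, template in templates.items():
        names = referenced_names(template)
        hits = {
            path
            for path in outbound_paths
            for name in names
            if name == path or name.startswith(path + ".")
        }
        if hits:
            result[column] = hits
    return result


# Usage with hypothetical path names and templates.
outbound_paths = {"project_pi", "species"}
templates = {
    "summary": "Led by {{{project_pi.values.name}}}",
    "organism": "{{#species}}{{{values.name}}}{{/species}}",
    "title": "{{{title}}}",
}
dependencies = columns_needing_paths(templates, outbound_paths)
```

Columns that end up in `dependencies` would be treated as if they had declared the corresponding wait_for; the rest could be rendered as soon as the main request returns.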

Instead of completely replacing the existing behavior, we could offer both options, which would address all the "Cons" listed above. This could be a table-level and/or catalog-level annotation that data modelers can change, or it could be based on some internal heuristics. For example, if the table has fewer than three all-outbound paths, we should use the old method. This requires more thought.
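A sketch of the selection logic described above. The table-level and catalog-level settings stand in for a hypothetical annotation (no such annotation exists yet), and the threshold of three all-outbound paths is the example from this issue, not a measured value.

```python
ATTRIBUTEGROUP = "attributegroup"  # existing behavior: one combined request
MULTI_REQUEST = "multi-request"    # proposed: entity request + one request per path


def choose_read_strategy(num_outbound_paths, table_setting=None, catalog_setting=None, threshold=3):
    # An explicit table-level setting wins, then the catalog-level one (both hypothetical).
    for setting in (table_setting, catalog_setting):
        if setting in (ATTRIBUTEGROUP, MULTI_REQUEST):
            return setting
    # Otherwise fall back to the heuristic: few paths keep the old single request.
    return ATTRIBUTEGROUP if num_outbound_paths < threshold else MULTI_REQUEST
```

Letting the table setting override the catalog setting gives data modelers a way to opt individual tables in or out after they have added the necessary wait_for declarations.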

RFSH added the discussion required and investigation required labels Jan 5, 2024
RFSH self-assigned this Jan 5, 2024