Error control when indexing Non-Deterministic Source Queries #479

Open
osopardo1 opened this issue Nov 20, 2024 · 2 comments
@osopardo1
Member

osopardo1 commented Nov 20, 2024

As a first solution for #466, we need to require users to provide the columnStats when indexing tables with any of the following characteristics:

  • Underlying data source changes constantly.
  • DataFrame contains non-deterministic columns to index.
  • DataFrame contains non-deterministic predicates.

Supplying columnStats provides the data's min/max values ahead of the DataFrame analysis. That analysis can produce inconsistent results when the DataFrame is loaded twice for indexing in any of the use cases above.
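To make the inconsistency concrete, here is a toy model in plain Python. It is not Qbeast or Spark code: `load_source` is a hypothetical stand-in for a non-deterministic source query, and the `value_min`/`value_max` keys are only an assumed shape for user-provided stats.

```python
from itertools import count

_evaluations = count()


def load_source():
    """Hypothetical non-deterministic source: each evaluation yields new values."""
    shift = next(_evaluations) * 5.0
    return [1.0 + shift, 4.0 + shift, 9.0 + shift]


def out_of_range(values, lo, hi):
    """Return the values that fall outside the closed interval [lo, hi]."""
    return [v for v in values if v < lo or v > hi]


# First load: the analysis pass infers min/max from the data it sees.
analysis_pass = load_source()
inferred_min, inferred_max = min(analysis_pass), max(analysis_pass)

# Second load: the rows that are actually indexed come from a new evaluation.
indexing_pass = load_source()
escaped = out_of_range(indexing_pass, inferred_min, inferred_max)

# With stats fixed up front by the user, no second-pass value escapes the range.
column_stats = {"value_min": 0.0, "value_max": 100.0}  # assumed, user-provided
escaped_with_stats = out_of_range(
    indexing_pass, column_stats["value_min"], column_stats["value_max"]
)

print(escaped)             # values the inferred range did not cover
print(escaped_with_stats)  # empty when the provided range is wide enough
```

In this sketch the inferred range comes from the first evaluation only, so any value produced by the second evaluation that lies outside it is indexed against stale bounds.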

The idea is to require the user to specify the columnStats explicitly when the source query is non-deterministic.
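A minimal sketch of what such a check could look like, written in plain Python for illustration only. The function name, the error type, and the `<column>_min` / `<column>_max` key convention are assumptions made for this example, not the project's actual API.

```python
class MissingColumnStatsError(ValueError):
    """Hypothetical error raised when required column stats are absent."""


def validate_column_stats(columns_to_index, column_stats, source_is_deterministic):
    """Fail fast if a non-deterministic source is indexed without explicit stats.

    Assumes stats are keyed as '<column>_min' and '<column>_max'.
    """
    if source_is_deterministic:
        # Min/max can be inferred safely, so nothing is enforced here.
        return

    missing = [
        f"{column}_{bound}"
        for column in columns_to_index
        for bound in ("min", "max")
        if f"{column}_{bound}" not in column_stats
    ]
    if missing:
        raise MissingColumnStatsError(
            "Non-deterministic source query: please provide columnStats for "
            + ", ".join(missing)
        )


# Passes: every indexed column has both bounds.
validate_column_stats(["a"], {"a_min": 0, "a_max": 10}, source_is_deterministic=False)

# Fails: column 'b' has no bounds and the source is non-deterministic.
try:
    validate_column_stats(
        ["a", "b"], {"a_min": 0, "a_max": 10}, source_is_deterministic=False
    )
except MissingColumnStatsError as error:
    print(error)
```

Raising before any data is loaded keeps the failure cheap and tells the user exactly which bounds are missing.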

osopardo1 self-assigned this on Nov 20, 2024
@osopardo1
Member Author

Before that: analyze to what extent it is possible to know in advance whether a column or query is deterministic.
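One possible direction, sketched in plain Python as a toy model: walk the expression tree of each indexed column and flag it if any node is a known non-deterministic function. The `Expr` class and the function set below are illustrative assumptions. Spark's Catalyst expressions expose a `deterministic` flag that plays a similar role, but this sketch does not use Spark.

```python
from dataclasses import dataclass

# Illustrative, incomplete set of function names treated as non-deterministic.
NON_DETERMINISTIC_FUNCTIONS = {"rand", "randn", "uuid"}


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression node: a function or column name plus children."""
    name: str
    children: tuple = ()


def is_deterministic(expr):
    """An expression is deterministic only if it and all its children are."""
    if expr.name in NON_DETERMINISTIC_FUNCTIONS:
        return False
    return all(is_deterministic(child) for child in expr.children)


plain_column = Expr("col_a")
noisy_column = Expr("add", (Expr("col_a"), Expr("rand")))

print(is_deterministic(plain_column))  # True
print(is_deterministic(noisy_column))  # False
```

A check of this kind only covers expressions in the query itself. It cannot tell whether the underlying data source changes between two loads.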

osopardo1 changed the title from "Error control + enforce columnStats when indexing non-deterministic or source-changing DataFrames" to "Error control when indexing non-deterministic or source-changing DataFrames" on Dec 4, 2024
osopardo1 changed the title from "Error control when indexing non-deterministic or source-changing DataFrames" to "Error control when indexing Non-Deterministic Source Queries" on Dec 10, 2024
@osopardo1
Member Author

Updated the title so that this issue covers only the non-deterministic case.
