Move CDFQuantiles stats computation to the Data Analyzer #422

Open
osopardo1 opened this issue Sep 18, 2024 · 0 comments
Labels
type: enhancement Improvement of existing feature or code

Comments

@osopardo1
Member

Following up on #416, we should move the CDF Quantiles computation into the Data Analyzer instead of computing it through the external API.

Right now, we use the QbeastUtils interface to calculate the String and Numeric bins for a specific column, and then we have to pass those bins in to configure the transformation:

val idStats = QbeastUtils.computeQuantilesForColumn("id", df)
// The s prefix is required so that $idStats is interpolated into the JSON string
df.write.format("qbeast").option("columnsToIndex", "id").option("columnStats", s"""{id_quantiles:$idStats}""").save(...)

Otherwise, the write would fail.

We should change this so that users no longer need the QbeastUtils methods and can simply execute:

df.write.format("qbeast").option("columnsToIndex", "id").save(...)

As a first step, the Data Analyzer should not constantly compute the new stats for Quantiles. If we want to trigger a new Revision, we would still need to do it manually.

We need to understand the effect of changes in the data distribution, and design the criterion for deciding whether the stats have diverged enough from the original configuration.
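As a starting point for that discussion, here is a minimal sketch of one possible divergence criterion. It is not existing Qbeast code: the QuantileDrift object, its method names, and the 0.1 default threshold are all hypothetical. It assumes the stored quantiles are sorted and evenly spaced in probability, and compares each stored boundary against the empirical CDF of the new values, in the spirit of a Kolmogorov-Smirnov statistic.

```scala
// Hypothetical helper, not part of qbeast-spark.
object QuantileDrift {

  /** Fraction of `values` that are less than or equal to `x` (empirical CDF). */
  def empiricalCdf(values: Seq[Double], x: Double): Double =
    if (values.isEmpty) 0.0 else values.count(_ <= x).toDouble / values.size

  /**
   * Largest gap between the probability each stored boundary was meant to
   * represent and the fraction of new values at or below that boundary.
   * ASSUMPTION: `storedQuantiles` is sorted and evenly spaced in probability,
   * so boundary i corresponds to probability i / (n - 1).
   */
  def maxDrift(storedQuantiles: Seq[Double], newValues: Seq[Double]): Double = {
    require(storedQuantiles.size >= 2, "need at least two quantile boundaries")
    val n = storedQuantiles.size
    storedQuantiles.zipWithIndex.map { case (q, i) =>
      val expected = i.toDouble / (n - 1)
      math.abs(empiricalCdf(newValues, q) - expected)
    }.max
  }

  /** True when the drift exceeds the (arbitrary, illustrative) threshold. */
  def hasDiverged(
      storedQuantiles: Seq[Double],
      newValues: Seq[Double],
      threshold: Double = 0.1): Boolean =
    maxDrift(storedQuantiles, newValues) > threshold
}
```

With a check like this, the Data Analyzer could keep reusing the existing quantiles while the drift stays low and only flag that a new Revision may be worth triggering manually once it crosses the threshold. The right threshold, and whether a max-gap statistic is the right measure at all, are open questions.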

@osopardo1 osopardo1 added the type: enhancement Improvement of existing feature or code label Sep 18, 2024