-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: expose all tables via flight sql #571
Conversation
9e30ee6
to
3deb641
Compare
Going to wait to review until CI is green |
@stbrody Ready for review now. Turns out there was a subtle issue with the federation used with the |
2f6e6c6
to
2cf7a61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few small comments but generally LGTM.
One question: where does this leave folks who want to build their own custom aggregators? Can they still do so out-of-process by consuming the conclusion feed over the network? Does this PR make this more difficult?
requires = "flight_sql_bind_address", | ||
env = "CERAMIC_ONE_AGGREGATOR" | ||
)] | ||
aggregator: Option<bool>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to think if this should be a state enum instead of a boolean, but I'm having a hard time thinking about what other modes the aggregator might have in the future. I guess since I can't think of anything than just a bool is probably fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan is also to remove this boolean once we make it always on. We will likely need some form of config for which parts of the pipeline to enable. That will likely be a net new config option once we have move than one stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the name of the flag be enable_aggregator
maybe?
#[arg( | ||
long, | ||
requires = "experimental_features", | ||
env = "CERAMIC_ONE_S3_BUCKET" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the future the data files might live in a different bucket than the aggregator output right? So we might want this name to indicate that this specifically for the result of the aggregated state that we are exporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am anticipating a single bucket for all the data. A single bucket and handle many tables. I don't want to complicate the API by expecting a different bucket for each table until that need arises in product discussions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the comment to reflect this intention
one/src/daemon.rs
Outdated
aggregator: Option<bool>, | ||
|
||
/// S3 bucket name. | ||
/// When configured the aggregator will support storing data in S3 compatible object stores. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will support? or will actually do so?
one/src/daemon.rs
Outdated
@@ -469,7 +497,7 @@ pub async fn run(opts: DaemonOpts) -> Result<()> { | |||
.await | |||
.map_err(|e| { | |||
anyhow!( | |||
"Failed to start libp2p server using addresses: {}. {}", | |||
"Failed to start linebp2p server using addresses: {}. {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo?
/// Enable the aggregator, requires Flight SQL to be enabled | ||
#[arg( | ||
long, | ||
requires = "flight_sql_bind_address", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also requires the s3 bucket?
let _ = shutdown_signal.recv().await; | ||
let bucket = opts | ||
.s3_bucket | ||
.ok_or_else(|| anyhow!("s3_bucket option is required when exposing flight sql"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be required? What if you just want the run the aggregation locally and query out the aggregated state feed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today its required as there is no mechanism for storing the state locally. We should likely add that for dev purposes but that is a separate issue.
This does not make it more difficult. Consumers can still build their own aggregators out of band. In fact this makes is easier as all the data for any stage in the pipeline can be found in a single location. |
This change moves the OLAP aggregation logic within ceramic-one. This means that all tables are available via Flight SQL endpoint.
The REPL query command using federation had many gotchas that were hard to predict. For example using a count query would confuse the federation optimizer rules. This changes the query command to be a plain (non REPL) client of the FlightSQL endpoint.
6b18092
to
e90b766
Compare
This does remove all the tooling for building an external validator though right? Now someone would need to not just make their specific aggregator, but also a build process for a standalone aggregator binary |
That was true before however they could have copied the olap binary. But it was quite small and didn't have anything interesting to preserve. It was a blind copy of the ceramic-one binary setup. |
This change moves the OLAP aggregation logic within ceramic-one. This means that all tables are available via Flight SQL endpoint.
TODO: