Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Enable Disk Caching for Multiple Sources (Supersedes PR#30) #148

Closed
wants to merge 1 commit into from

Conversation

shamb0
Copy link

@shamb0 shamb0 commented Oct 8, 2024

This PR is part of a pair; please review, validate, and consider merging both.

paradedb/paradedb#1751
#148

Scope of this PR:

This PR supersedes the following previous submissions:

It is based on consolidated requirements and feedback from the review comments on the above closed PRs.

What

  • This PR separates the core implementation from PR#30, focusing on enabling disk caching for various sources supported by pg_analytics.
  • The benchmarking, specifically for Hive-style partitioned Parquet sources, leverages this core disk cache functionality and is implemented in paradedb/cargo-paradedb/src/pga_benches/pga_benchlogs_hsp_pq.rs.

Why

How

@@ -51,7 +51,7 @@ impl OptionValidator for JsonOption {
}
}

pub fn create_view(
pub fn create_duckdb_relation(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you renaming these functions? This again is making the diff much larger and harder to review. You're not even changing the code of the function, so why does it need a new name?

Comment on lines +279 to +315
pub fn create_duckdb_relation(
table_name: &str,
schema_name: &str,
table_options: HashMap<String, String>,
handler: FdwHandler,
) -> Result<()> {
connection::execute(
format!("CREATE SCHEMA IF NOT EXISTS {schema_name}").as_str(),
[],
)?;

match handler {
FdwHandler::Csv => {
connection::create_csv_relation(table_name, schema_name, table_options)?;
}
FdwHandler::Delta => {
connection::create_delta_relation(table_name, schema_name, table_options)?;
}
FdwHandler::Iceberg => {
connection::create_iceberg_relation(table_name, schema_name, table_options)?;
}
FdwHandler::Parquet => {
connection::create_parquet_relation(table_name, schema_name, table_options)?;
}
FdwHandler::Spatial => {
connection::create_spatial_relation(table_name, schema_name, table_options)?;
}
FdwHandler::Json => {
connection::create_json_relation(table_name, schema_name, table_options)?;
}
_ => {
bail!("got unexpected fdw_handler")
}
};

Ok(())
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think this is a better way to handle creating relations, could you please pull this out into a separate PR and submit it independently? This seems to me irrelevant to the feature at stakes here, which is making things harder to review

Comment on lines -215 to -257
pub fn register_duckdb_view(
table_name: &str,
schema_name: &str,
table_options: HashMap<String, String>,
user_mapping_options: HashMap<String, String>,
handler: FdwHandler,
) -> Result<()> {
if !user_mapping_options.is_empty() {
connection::create_secret(DEFAULT_SECRET, user_mapping_options)?;
}

if !connection::view_exists(table_name, schema_name)? {
// Initialize DuckDB view
connection::execute(
format!("CREATE SCHEMA IF NOT EXISTS {schema_name}").as_str(),
[],
)?;

match handler {
FdwHandler::Csv => {
connection::create_csv_view(table_name, schema_name, table_options)?;
}
FdwHandler::Delta => {
connection::create_delta_view(table_name, schema_name, table_options)?;
}
FdwHandler::Iceberg => {
connection::create_iceberg_view(table_name, schema_name, table_options)?;
}
FdwHandler::Parquet => {
connection::create_parquet_view(table_name, schema_name, table_options)?;
}
FdwHandler::Spatial => {
connection::create_spatial_view(table_name, schema_name, table_options)?;
}
FdwHandler::Json => {
connection::create_json_view(table_name, schema_name, table_options)?;
}
_ => {
bail!("got unexpected fdw_handler")
}
};
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move this? It doesn't seem related to your new feature

Copy link
Collaborator

@philippemnoel philippemnoel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of code being moved around but seemingly not changed. Could you please share the rational why?

If you need to restructure things to enable your feature, which I can't quite tell at first glance, please submit a PR that this depends on. The diff on this PR is 800 lines yet I suspect you're only adding a much smaller fraction of new lines of code.

Please rescope this PR so it's reviewable.

@philippemnoel philippemnoel marked this pull request as draft October 9, 2024 21:12
@shamb0
Copy link
Author

shamb0 commented Oct 10, 2024

Hi @philippemnoel,

Thank you for the review comments, I really appreciate it. Moving forward, I'll aim for smaller PRs with minimal changes to make them easier to review.

@shamb0 shamb0 closed this Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants