Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(parser): redact SqlOption #14760

Closed
wants to merge 9 commits into from
Closed

feat(parser): redact SqlOption #14760

wants to merge 9 commits into from

Conversation

zwang28
Copy link
Contributor

@zwang28 zwang28 commented Jan 24, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Implement this idea.

  • Only redact::Secret<String> is supported (used in *Properties), via a wrapper type SecretString.
  • Other redact::Secret<T> are not supported, for the sake of implementation simplicity. Because otherwise redacted string need be able to convert back to the original type.
  • This PR doesn't perform "escape back the unescaped CstyleEscapedString when serializing it", because I think it's non-trivial and always converting to SingleQuotedString is just fine.

#14115

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@zwang28 zwang28 requested a review from BugenZhao January 24, 2024 03:16
@zwang28 zwang28 requested a review from a team as a code owner January 24, 2024 03:16
Comment on lines 80 to 83
#[derive(Clone, Debug, Deserialize, PartialEq, with_options::WithOptions)]
pub struct SecretString {
inner: redact::Secret<String>,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about #[serde(transparent)]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redact::Secret doesn't implement Serialize.

Copy link
Contributor Author

@zwang28 zwang28 Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meanwhile impl Serialize for SecretString should be chosen according to use case, i.e. whether the original value or the redacted one is expected.

Though currently there is no other use case that serializes SecretString.

src/connector/src/source/common.rs Outdated Show resolved Hide resolved
src/sqlparser/src/ast/mod.rs Outdated Show resolved Hide resolved
src/frontend/src/handler/util.rs Outdated Show resolved Hide resolved
@zwang28 zwang28 requested a review from BugenZhao January 29, 2024 04:30
@BugenZhao BugenZhao requested a review from xxchan February 29, 2024 06:00
}

fn serialize_str(self, v: &str) -> Result<Self::Ok, Self::Error> {
Ok(Value::SingleQuotedString(v.to_string()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a special character (like \t), I guess we should serialize into a CstyleEscapedString instead. #14193 (comment)

Comment on lines 505 to 508
SqlOption {
name: to_object_name("d"),
value: Value::Null
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we omit the field that's unset? I guess we need an end-to-end test like

SQL 
--parse--> AST
--deser--> Connector Options
--redact--> Redacted Connector Options
--ser--> AST
--unparse--> SQL

to check whether it works well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omitted the field that's unset.
Modified pubsub.slt to cover the code path.

src/sqlparser/src/ast/mod.rs Outdated Show resolved Hide resolved
zwang28 added 2 commits March 5, 2024 10:39
# Conflicts:
#	src/connector/src/source/common.rs
Copy link

gitguardian bot commented Mar 5, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
9425213 Triggered Generic Password f2644fd ci/scripts/regress-test.sh View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

Comment on lines +82 to +86
definition: if source.owner == user_id {
source.create_sql()
} else {
source.redacted_create_sql()
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we distinguish owner? I'm not sure whether this is reasonable 🤣

Copy link
Contributor Author

@zwang28 zwang28 Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User should be able to view secrets in definition somehow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one reason user want redaction is that they want to SHOW CREATE and share it to us for debugging. For this use case, it seems not very convenient if we distinguish by owner.

@BugenZhao comes up with an idea that maybe we can add sth like SHOW REDACTD CREATE. 🤣 (Maybe alternatively we can add a session variable or sth..?)

I'm not sure what users want now. @fuyufjh Do you have any ideas about users' requests? Since I saw you opened #15402

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW on the other hand, force redaction if source.owner != user_id sounds good 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The major requirement is that users are concerning about the secret leaks. Before having a real secret store, I think we should redact for all users.

try_redact_statement(&stmt).map(|stmt| stmt.to_string())
}

fn try_redact_statement(stmt: &Statement) -> Result<Statement, Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try to summarize my understanding of this PR:

  • try_redact_statement (Statement -> Statement) is the end goal we want to achieve.
  • The implementation is Statement -> ConnectorProperties -> Statement.
    • The ConnectorProperties -> Statement is achieved by the SqlOptionVecSerializer, and the Serialize impl (which actually does the redaction)

This approach generally sounds good to me. But I have some thoughts:

  • In order to redact Statement -> Statement, an easier approach IMO is just modifying the hashmap directly. We can check the key's name (like having a blacklist for redaction).
  • The redaction needs to be performed manually, so we may still access original SQL in e.g., SQL log. I don't know if this is good enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, #14310 looks very like the solution in my mind, but it contains more stuff, including redacting table/column names. Why don't we go with that approach? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to redact Statement -> Statement, an easier approach IMO is just modifying the hashmap directly. We can check the key's name (like having a blacklist for redaction).

Yes, #14310 basically redact all with_options values, i.e. an empty whitelist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upside of this PR is it's strong typed, and secrets in ConnectorProperties won't be printed unexpectedly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secrets in ConnectorProperties won't be printed unexpectedly

I agree this is quite a good change.

However, previously ConnectorProperties doesn't implements Serialize (and Debug) (Actually I removed some unnecessary #[derive(Serialize)]), so probably it won't be printed unexpectedly.

Now we adds #[derive(Serialize)] (although redacted), I feel this isn't very cool..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, previously ConnectorProperties doesn't implements Serialize (and Debug)

It seems I'm wrong. It already implemented Debug. 🤡

@zwang28
Copy link
Contributor Author

zwang28 commented Mar 14, 2024

prefer #14310

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants