Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#5760][#5780] fix(catalog): fix drop catalog error #5761

Merged
merged 4 commits into from
Dec 16, 2024

Conversation

mchades
Copy link
Contributor

@mchades mchades commented Dec 4, 2024

What changes were proposed in this pull request?

before drop the catalog, check all schemas are avaliable

Why are the changes needed?

some schemas are dropped externally, but still exist in the entity store, those schemas are invalid

Fix: #5760
Fix: #5780

Does this PR introduce any user-facing change?

no

How was this patch tested?

CI pass

@mchades mchades requested review from jerryshao and FANNG1 December 4, 2024 14:09
@mchades mchades self-assigned this Dec 4, 2024
}

private void createSchema() {
private void createSchema(Catalog catalog, String schemaName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to pass Catalog here.
createSchema() is a private function, and it can access catalog anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the method also used to create a schema under a specific catalog, see the 267 line in testDropCatalog()

throw new RuntimeException(e);
}
}

private boolean haveAvailableSchemas(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private boolean haveAvailableSchemas(
private boolean hasSchemas(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about hasAvailableSchemas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has <=> available

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that the "default" schema is not an available schema in this method

return false;
}

if (schemaEntities.size() == 1 && catalogEntity.getProvider().equals("kafka")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be optimized, IMHO.

If kafka is special for a reason, e.g. kafka is evil, you may want to add a function
to the class isProviderEvil(). I mean, you may want to consolidate this kind of tests
based on the business logic and maintain that logic, rather than treating kafka as
a special animal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, only the Kafka catalog has the feature of having only one schema (named default). Therefore, I think it may not be a compelling enough reason to refactor at the API level.

But extra a method such as inRawStatus(CatalogEntity, List<SchemaEntity>) sounds a good idea, WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so ... do we have that 'default' schema recorded as we do for other catalogs?
We can remove this if () condition if so.
In theory, the default schema is still a schema. Returning false is not a good idea. And this shortcut complicates the code.

I was striving to reduce hard-coded specializations in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so ... do we have that 'default' schema recorded as we do for other catalogs?
We can remove this if () condition if so.
In theory, the default schema is still a schema. Returning false is not a good idea. And this shortcut complicates the code.

this function is hasAvailableSchemas not hasSchemas, the "default" schema of Kafka catalog in this function is not treated as an available schema.

I was striving to reduce hard-coded specializations in the code.

The Catalog API is user-oriented. If we were to add a method like isProviderEvil() to the API, it might confuse users and require additional explanation. It would be more costly than adding an extra method such as inRawStatus(CatalogEntity, List<SchemaEntity>) in this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are still unable to reach a consensus, I would like to hear the opinions of others. WDYT? @jerryshao @yuqi1129

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I will also add a condition here catalogEntity.getProvider().equals("jdbc-postgresql") && schemaEntities.size() == 1 && "public".equals(schemaEntities.get(0)) to resolve issue #5780

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can have a better name for this method and add more comments to explain the logic, in the meantime we can split out the special Kafka logic to make it more understandable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me add more comments on this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the method, plz help to review it again when you have time, thx! @tengqm @jerryshao

@mchades mchades changed the title [#5760] fix(catalog): fix drop catalog error [#5760][#5780] fix(catalog): fix drop catalog error Dec 9, 2024
* otherwise.
* @throws Exception If an error occurs while checking the schemas.
*/
private boolean containsUserCreatedSchemas(
Copy link
Contributor

@jerryshao jerryshao Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about change to containsExternalSchemas or containsSchemasFromExternalCatalog?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these options are not entirely accurate. Since the "public" schema in the PG catalog is an external schema (not created by Gravitino), the correct return here should be false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my feeling, how do we define "UserCreated..."? Both schemas created w/ or w/o Gravitino can be called "user created...", it is hard to understand from the method name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically all schemas are created by users, except:

  • "public" schema in PG is created by PG itself, not a user-created schema
  • "default" schema in kafka catalog is created by Kafka catalog itself, not a user-created schema

This is unrelated to whether it was created by Gravitino, but rather pertains to whether it was created by a user.

Comment on lines 681 to 683
} catch (GravitinoRuntimeException e) {
throw e;

} catch (Exception e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we treat them differently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if we wrap the GravitinoRuntimeException to RuntimeException , the error response will be aways INTERNAL_SERVER_ERROR, the CatalogExceptionHandler can not convert it to correspond error response

* otherwise.
* @throws Exception If an error occurs while checking the schemas.
*/
private boolean containsUserCreatedSchemas(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my feeling, how do we define "UserCreated..."? Both schemas created w/ or w/o Gravitino can be called "user created...", it is hard to understand from the method name.

@jerryshao jerryshao merged commit fce1bd9 into apache:main Dec 16, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants