Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pool schema/catalog for original values, lazy initialise originalSchema if required #105

Merged
merged 9 commits into from
Nov 7, 2024

Conversation

rbygrave
Copy link
Member

@rbygrave rbygrave commented Oct 30, 2024

  • Make originalSchema and originalCatalog non final and default from the pool configuration
  • Initialise originalSchema and originalCatalog lazily if required and when they are not set from the pool configuration

This improves the performance of initialising connections for the cases where schema and catalog are not changed, by skipping the reading of the schema and catalog from the connection [which is frequently a query/network call].

@rPraml, can I get you to review this.

Note: The currentSchema is null when using the originalSchema, I think this is ok (with key for cached PreparedStatements etc).

Copy link
Contributor

@rPraml rPraml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Rob,

I think there could be a problematic edge case, when the default schema or catalog is null. Please check my comments.

The rest looks good to me.

Roland

/edit:

Note: The currentSchema is null when using the originalSchema, I think this is ok (with key for cached PreparedStatements etc).

Important information: At least for SqlServer (maybe others too), you must not use a prepared statement accross schemas / catalogs. (Therefore, the schema and catalaog must be taken into account in the cache key, which was already fixed in last PR)

Copy link
Contributor

@rPraml rPraml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important information: At least for SqlServer (maybe others too), you must not use a prepared statement accross schemas / catalogs. (Therefore, the schema and catalaog must be taken into account in the cache key, which was already fixed in last PR)

sorry I didn't think of that yesterday

String key = sql + ':' + currentSchema + ':' + currentCatalog + ':' + returnKeysFlag;

I think, that we can get a (theoretical?) problem here.

Imagine the connection is "fresh", and the connection-string defines something like schema=S1

So the current schema of the connection would be S1, but currentSchema is still null. So we prepare a statement for S1, but the cache key contains null

If we then invoke setSchema(null) the connection is in schema-less state. Preparing the same statement would retrieve the one from above, which is prepared for S1

I know this is probably (!) a case that will not arise in practice, but we should still think about it.

What do you think, if we use an int schemaState and int catalogState instead of booleans?

  • 0 = UNKNOWN - so we do not know the current schema and catalog. We do not need to add it in cache key computation (additional performance benefit) and we do not need to reset ist.
  • 1 = CHANGED - when setSchema/setCatalog is invoked and we come from UNKNOWN state, we read the state once, call setSchema/setCatalog on the delegate and go to CHANGED state.
  • 2 = KNOWN - when we close the connection and come from the CHANGED state, we reset it and go to KNOWN state. From then on we only switch between KNOWN and CHANGED

It might happen, that we might cache the same statement for UNKNONW and KNOWN state twice (or even triple, when you call setSchema("S1") and connection is already in that state) - but then, there is no risk, as described above. (And I prefer to waste some cache memory rather than mixing the schemes)

Don't get me wrong, I think that there is no practical use case (at the moment) and the code will work as it is. But I believe it is our mutual quality standard to be 100.0% correct here 😉

rbygrave and others added 3 commits November 1, 2024 16:11
… null).

The problematic case that this change addresses is:

Imagine the connection is "fresh", and the connection-string defines something like schema=S1

So the current schema of the connection would be S1, but currentSchema is still null. So we prepare a statement for S1, but the cache key contains null.

If we then invoke setSchema(null) the connection is in schema-less state. Preparing the same statement would retrieve the one from above, which is prepared for S1.

---
The FIX for this case, use "@default" instead of null, such that the cache key isn't problematic for the above case.
@rPraml
Copy link
Contributor

rPraml commented Nov 5, 2024

Hello @rbygrave
I've reviewed this PR. See #106

Problems I've identified:

It seems that DataSourceConfig.catalog() is never applied. Please check #106 (comment)

I think, we must not reset the currentSchema to null, as it may be used as cache key then... Please check #106 (comment)

These two should be fixed here.

Interesting thing:

setSchema/getSchema is not always symmetric.
#106 (comment)
(So we cannot use asserts)

Play it safe

Do you think it is a good idea, to enforce schema/catalog, when taking a connection from the pool?
#106 (comment)
I know, this may send 1-2 commands to db, which increases latency

The resetForUse() is called "on borrow" just before the connection
is returned to the application. As such we can't do anything that
could throw any Exception here. All connection reset actions must
be performed when the connection is returned to the pool.
@rbygrave
Copy link
Member Author

rbygrave commented Nov 5, 2024

Do you think it is a good idea, to enforce schema/catalog, when taking a connection from the pool?

No I don't think we should rely on performing those tasks at "connection borrow time" just before the connection is returned to the application. This is because when anything fails here it's problematic. Instead, the design is that all "reset tasks" are performed when the connection is returned and if for any reason that fails then the connection doesn't actually make it back into the pool [its "force closed" and not added back into the pool].

Copy link
Contributor

@rPraml rPraml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, but maybe you can change the order of restoring schema/catalog in close() to be on the safe side.

Thank you for your work.

@@ -388,6 +398,9 @@ private Connection initConnection(Connection conn) throws SQLException {
if (readOnly) {
conn.setReadOnly(true);
}
if (catalog != null) {
conn.setCatalog(catalog);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every day I think of something new. 🤣 Do you think the order is important? I could imagine that you have to change the catalog first before you can change the schema, because the schema does not exist in the current catalog, for example. We did it right here. But not in close(). From my gut feeling, I would say change the catalog first, then the schema

schemaState = SCHEMA_CATALOG_KNOWN;
this.cacheKeySchema = originalSchema;
// if schema & catalog is defined, we can be sure, that connection is initialized properly
assert originalSchema.equals(connection.getSchema()) : "Connection is in the wrong schema: " + connection.getSchema() + ", expected: " + originalSchema;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about these asserts? On the one hand, I think they are good because you can see, for example, if you specify a non-existent schema in the PoolConfiguration (at least pg returns null in that case). On the other hand, they could cause problems if the schema is returned in uppercase, for example.
They definitely help to find potential programming errors.

I would keep them in until someone provides a use case. (and the user can disable assertions anyway)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about these asserts?

I've honestly changed my mind a couple of times on these asserts. On one hand they look useful, but on the other they can throw SQLException and fail on case but only with asserts enabled, and if the driver allows setting unknown schema etc maybe its best to just allow that.

So I'm thinking of removing them. That really means that we are leaving it up to developers to set original schema and catalog appropriately and to test that themselves as needed.

}

if (resetCatalog) {
if (catalogState == SCHEMA_CATALOG_CHANGED) {
connection.setCatalog(originalCatalog);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: We should switch the order here. First restore catalog, then restore schema. See other comment

By the way, I can imagine an edge case here again:

Postgres fails silently, if I call setSchema("doesnotexist") and getSchema() returns null

So if you set up your pool, that

  • has originalSchema = "TENANT_MASTER"
  • and a thread takes a connection from the pool and changes the schema to "TENANT_123"
  • the thread does some long running tasks...
  • in the meantime, the DBA deletes "TENANT_MASTER" (yes, a stupid idea)
  • the thread above puts back the connection to the pool
  • we try to switch back to originalSchema, but PG does not fail in this case
  • in what state is that connection? 😉
    • Still in "TEANT_123"
    • or in no man's land? (at least getSchema() would return null for Postgres)
    • is there a danger, that the next one, that pulls the connection will start in "TENANT_123"

I think we would be overdoing it here if we also covered this special case, because it is really a stupid idea to delete the main schema

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First restore catalog, then restore schema

Yes, I'm good with that. I'll make that change.

I think we would be overdoing it here if we also covered this special case

I agree. We can leave that sort of case with developers to own.

@rbygrave rbygrave added this to the 9.2 milestone Nov 7, 2024
@rbygrave rbygrave merged commit 825b0bc into master Nov 7, 2024
2 checks passed
@rbygrave rbygrave deleted the feature/change-schema-reset branch November 7, 2024 08:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants