Use pool schema/catalog for original values, lazy initialise originalSchema if required #105
Conversation
Hello Rob,
I think there could be a problematic edge case, when the default schema or catalog is null. Please check my comments.
The rest looks good to me.
Roland
/edit:
Note: The currentSchema is null when using the originalSchema, I think this is ok (with key for cached PreparedStatements etc).
Important information: At least for SqlServer (maybe others too), you must not use a prepared statement across schemas/catalogs. (Therefore, the schema and catalog must be taken into account in the cache key, which was already fixed in the last PR.)
Important information: At least for SqlServer (maybe others too), you must not use a prepared statement across schemas/catalogs. (Therefore, the schema and catalog must be taken into account in the cache key, which was already fixed in the last PR.)
sorry I didn't think of that yesterday
String key = sql + ':' + currentSchema + ':' + currentCatalog + ':' + returnKeysFlag;
I think we can get a (theoretical?) problem here.
Imagine the connection is "fresh", and the connection string defines something like schema=S1. So the current schema of the connection would be S1, but currentSchema is still null. So we prepare a statement for S1, but the cache key contains null.
If we then invoke setSchema(null), the connection is in a schema-less state. Preparing the same statement would retrieve the one from above, which is prepared for S1.
I know this is probably (!) a case that will not arise in practice, but we should still think about it.
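To make the collision concrete, here is a minimal sketch (hypothetical class and method names, not the pool's actual code): the tracked currentSchema is null both on a fresh connection (whose actual schema is S1 from the connection string) and after setSchema(null), so both lookups hit the same cache entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the cache-key collision described above.
public class CacheKeyCollision {

  static final Map<String, String> cache = new HashMap<>();

  // Mirrors: String key = sql + ':' + currentSchema + ':' + currentCatalog + ...
  static String prepare(String sql, String trackedSchema, String actualSchema) {
    String key = sql + ':' + trackedSchema;
    return cache.computeIfAbsent(key, k -> "statement prepared in " + actualSchema);
  }

  public static void main(String[] args) {
    // Fresh connection: actual schema is S1, tracked currentSchema still null.
    String first = prepare("select 1", null, "S1");
    // After setSchema(null): schema-less connection, tracked schema again null.
    String second = prepare("select 1", null, null);
    System.out.println(first);           // statement prepared in S1
    System.out.println(first == second); // true - the S1 statement is reused
  }
}
```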
What do you think, if we use an int schemaState and an int catalogState instead of booleans?
- 0 = UNKNOWN - we do not know the current schema and catalog. We do not need to add it in the cache key computation (an additional performance benefit) and we do not need to reset it.
- 1 = CHANGED - when setSchema/setCatalog is invoked and we come from the UNKNOWN state, we read the state once, call setSchema/setCatalog on the delegate and go to the CHANGED state.
- 2 = KNOWN - when we close the connection and come from the CHANGED state, we reset it and go to the KNOWN state. From then on we only switch between KNOWN and CHANGED.
It might happen that we cache the same statement for the UNKNOWN and KNOWN state twice (or even three times, when you call setSchema("S1") and the connection is already in that state) - but then there is no risk, as described above. (And I prefer to waste some cache memory rather than mixing the schemas.)
Don't get me wrong, I think that there is no practical use case (at the moment) and the code will work as it is. But I believe it is our mutual quality standard to be 100.0% correct here 😉
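The proposed state machine could be sketched like this (field and method names are assumptions, not the actual PooledConnection members); the point is that the delegate schema is read at most once, and a connection that never changes schema stays in UNKNOWN, so close() does no schema work at all:

```java
// Sketch of the proposed int-based schema state tracking (hypothetical names).
public class SchemaStateMachine {

  static final int UNKNOWN = 0; // never touched: skip cache-key schema and reset
  static final int CHANGED = 1; // setSchema called: must reset on close
  static final int KNOWN   = 2; // reset done: original schema is established

  int schemaState = UNKNOWN;
  String originalSchema; // lazily read from the delegate
  String currentSchema;

  // Stand-in for connection.getSchema() on the delegate (often a network call).
  String readSchemaFromDelegate() {
    return "S1";
  }

  void setSchema(String schema) {
    if (schemaState == UNKNOWN) {
      originalSchema = readSchemaFromDelegate(); // one-time lazy read
    }
    currentSchema = schema; // delegate.setSchema(schema) in the real pool
    schemaState = CHANGED;
  }

  // Called when the connection is returned to the pool.
  void close() {
    if (schemaState == CHANGED) {
      currentSchema = originalSchema; // delegate.setSchema(originalSchema)
      schemaState = KNOWN;            // from now on: KNOWN <-> CHANGED only
    }
  }
}
```

A connection in the UNKNOWN state passes through close() untouched, which is where the performance benefit mentioned above comes from.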
… null). The problematic case that this change addresses is: Imagine the connection is "fresh", and the connection string defines something like schema=S1. So the current schema of the connection would be S1, but currentSchema is still null. So we prepare a statement for S1, but the cache key contains null. If we then invoke setSchema(null), the connection is in a schema-less state. Preparing the same statement would retrieve the one from above, which is prepared for S1. --- The FIX for this case: use "@default" instead of null, such that the cache key isn't problematic for the above case.
Hello @rbygrave
Problems I've identified:
- It seems that …
- I think we must not reset the currentSchema to …
These two should be fixed here.
Interesting thing: setSchema/getSchema is not always symmetric.
Play it safe: Do you think it is a good idea to enforce the schema/catalog when taking a connection from the pool?
Feature/change schema reset review
The resetForUse() is called "on borrow" just before the connection is returned to the application. As such, we can't do anything that could throw an Exception here. All connection reset actions must be performed when the connection is returned to the pool.
No, I don't think we should rely on performing those tasks at "connection borrow time" just before the connection is returned to the application, because when anything fails here it's problematic. Instead, the design is that all "reset tasks" are performed when the connection is returned, and if for any reason that fails then the connection doesn't actually make it back into the pool (it's "force closed" and not added back into the pool).
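This reset-on-return design can be sketched as follows (a hypothetical minimal Con interface and pool, not the actual ebean-datasource types): all reset work happens on return, and a failed reset force-closes the connection instead of pooling it, so borrow can never fail on reset work.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the "reset on return" design with hypothetical types.
public class ReturnTimeReset {

  interface Con {
    void setSchema(String schema) throws Exception;
    void setCatalog(String catalog) throws Exception;
    void close();
  }

  final Deque<Con> free = new ArrayDeque<>();

  void returnToPool(Con c, String originalSchema, String originalCatalog) {
    try {
      // restore catalog first, then schema (the original schema may only
      // exist inside the original catalog)
      c.setCatalog(originalCatalog);
      c.setSchema(originalSchema);
      free.push(c); // only a fully reset connection re-enters the pool
    } catch (Exception e) {
      c.close(); // "force closed", never added back into the pool
    }
  }

  // Borrow is then cheap and cannot fail on reset work.
  Con borrow() {
    return free.poll();
  }
}
```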
Looks good now, but maybe you can change the order of restoring schema/catalog in close() to be on the safe side.
Thank you for your work.
@@ -388,6 +398,9 @@ private Connection initConnection(Connection conn) throws SQLException {
    if (readOnly) {
      conn.setReadOnly(true);
    }
    if (catalog != null) {
      conn.setCatalog(catalog);
    }
Every day I think of something new. 🤣 Do you think the order is important? I could imagine that you have to change the catalog first before you can change the schema, because the schema does not exist in the current catalog, for example. We did it right here, but not in close(). From my gut feeling, I would say: change the catalog first, then the schema.
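The concern above can be demonstrated with a small model (hypothetical, not a real JDBC driver) in which a schema lookup is resolved inside the currently selected catalog, so restoring the schema before the catalog can fail:

```java
import java.util.Map;
import java.util.Set;

// Sketch of why catalog should be restored before schema: the schema is
// looked up inside the currently selected catalog (hypothetical model).
public class CatalogFirst {

  // catalog -> schemas that exist in it
  static final Map<String, Set<String>> DB = Map.of(
      "CAT_A", Set.of("S1"),
      "CAT_B", Set.of("S2"));

  String catalog = "CAT_B"; // connection currently sits in CAT_B / S2
  String schema = "S2";

  void setCatalog(String c) {
    catalog = c;
  }

  void setSchema(String s) {
    if (!DB.get(catalog).contains(s)) {
      throw new IllegalStateException("schema " + s + " not in catalog " + catalog);
    }
    schema = s;
  }

  public static void main(String[] args) {
    CatalogFirst con = new CatalogFirst();
    // Restoring schema first would fail: S1 does not exist in CAT_B.
    // Catalog first, then schema, succeeds:
    con.setCatalog("CAT_A");
    con.setSchema("S1");
    System.out.println(con.catalog + "/" + con.schema); // CAT_A/S1
  }
}
```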
schemaState = SCHEMA_CATALOG_KNOWN;
this.cacheKeySchema = originalSchema;
// if schema & catalog is defined, we can be sure, that connection is initialized properly
assert originalSchema.equals(connection.getSchema()) : "Connection is in the wrong schema: " + connection.getSchema() + ", expected: " + originalSchema;
What do you think about these asserts? On the one hand, I think they are good because you can see, for example, if you specify a non-existent schema in the PoolConfiguration (at least pg returns null in that case). On the other hand, they could cause problems if the schema is returned in uppercase, for example.
They definitely help to find potential programming errors.
I would keep them in until someone provides a use case. (and the user can disable assertions anyway)
What do you think about these asserts?
I've honestly changed my mind a couple of times on these asserts. On one hand they look useful, but on the other they can throw SQLException and fail, but only with asserts enabled, and if the driver allows setting an unknown schema etc. maybe it's best to just allow that.
So I'm thinking of removing them. That really means that we are leaving it up to developers to set the original schema and catalog appropriately and to test that themselves as needed.
}

if (resetCatalog) {
  if (catalogState == SCHEMA_CATALOG_CHANGED) {
    connection.setCatalog(originalCatalog);
Important: We should switch the order here. First restore catalog, then restore schema. See other comment
By the way, I can imagine an edge case here again:
Postgres fails silently if I call setSchema("doesnotexist"), and getSchema() then returns null.
So if you set up your pool so that it
- has originalSchema = "TENANT_MASTER"
- and a thread takes a connection from the pool and changes the schema to "TENANT_123"
- the thread does some long-running tasks...
- in the meantime, the DBA deletes "TENANT_MASTER" (yes, a stupid idea)
- the thread above puts the connection back into the pool
- we try to switch back to originalSchema, but PG does not fail in this case
- in what state is that connection? 😉 Still in "TENANT_123", or in no man's land? (At least getSchema() would return null for Postgres.) Is there a danger that the next one that pulls the connection will start in "TENANT_123"?
I think we would be overdoing it here if we also covered this special case, because it is really a stupid idea to delete the main schema.
First restore catalog, then restore schema
Yes, I'm good with that. I'll make that change.
I think we would be overdoing it here if we also covered this special case
I agree. We can leave that sort of case with developers to own.
This improves the performance of initialising connections for the cases where schema and catalog are not changed, by skipping the reading of the schema and catalog from the connection [which is frequently a query/network call].
@rPraml, can I get you to review this?