JT400 vs IBM i Access ODBC Driver -- Major Performance Issues #110
One gets paid software support through the software maintenance agreement for their IBM i license.
As @JMarkMurphy said, if you have active SWMA for the IBM i OS, you can open a support ticket for ODBC and/or JDBC drivers.
Thanks @JMarkMurphy and @kadler ... It's taking a bit to navigate through IBM's support, but we're getting that journey started.

For reference, here is a visual example demonstrating the speed difference between the two drivers (ODBC vs JDBC). For the same query, it takes ~1 second via ODBC and ~120 seconds via JDBC! Note that what we really need is a fast JDBC driver to use in a Databricks environment to load data from DB2 for reporting. Though this example is from Windows, it exhibits the exact same performance profile as when executed from a Linux environment.

It seems strange that both this ticket and #108 are seeing slow relative performance on the order of a 100x speed difference. Perhaps that is coincidental, but I am curious whether there are any speed metrics that have been tracked to validate expected performance levels?
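For anyone wanting to reproduce this kind of side-by-side measurement, a minimal timing helper (a hypothetical sketch, not code from this thread; `run_odbc_query`/`run_jdbc_query` are placeholder callables) could look like:

```python
import time

def timed(label, fn):
    """Run fn() once, print the wall-clock seconds, and return the elapsed time."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.1f}s")
    return elapsed

# timed("odbc query", run_odbc_query)   # ~1s in the example above
# timed("jdbc query", run_jdbc_query)   # ~120s in the example above
```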
Just to share with others that may be curious, the first debugging exercise IBM support would like us to do:
This is very strange. I am seeing this as well, though JDBC is only twice as slow as ODBC, even though it goes through the now-defunct JDBC-ODBC bridge from Sun. I would expect that to be slower.
Here are a few tips that I can think of.
Thanks @jeber-ibm for those suggestions. After trying these, I am curious why the warning is still being reported. The trace file shows the setting is being picked up, yet even though we are ignoring this warning, it is still being reported in the trace log.

I still don't get how the ODBC driver on the same machine dramatically outperforms the JDBC driver. Too bad I can't leverage the ODBC driver in the target Linux environments where the JDBC driver is needed.
Looks like "ignore warnings" isn't working as expected for the result set path. I'll open a separate issue for that problem. (#118) Is there a way to adjust the query to keep the time conversion warning from being generated on the server? If there are a lot of warnings, they are probably the main cause of your problem. Also, the format of your "ignore warnings" value is not correct: it should list SQLSTATEs, not SQLCODEs.
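As a sketch of that distinction: the property takes five-character SQLSTATE strings rather than numeric SQLCODEs. The helper below just builds the URL string; the SQLSTATE values shown are placeholders, not the ones from this trace.

```python
def with_ignore_warnings(base_url, sqlstates):
    """Append a jt400-style 'ignore warnings' property listing SQLSTATEs to a JDBC URL."""
    return f"{base_url};ignore warnings={','.join(sqlstates)}"

# Placeholder SQLSTATEs -- substitute the ones reported in your own trace log.
url = with_ignore_warnings("jdbc:as400://myhost/MYLIB", ["0100C", "01565"])
```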
Sure enough ... if I remove the 4 date fields (out of 75 total columns) from the SELECT, the query returns much faster. The response time went from 120 seconds down to 4 seconds just by removing those 4 date columns from the query, so this definitely supports your note @jeber-ibm that warnings are the issue. However, I'm not sure I understand how to set the "ignore warnings" parameter. I've tried setting it to every kind of value I can see in the warning detail, and nothing appears to stick. I found this sqlstate values table in IBM docs that seems to indicate ...
UPDATE: Good news and less good news? I confirmed the warnings are removed from the trace now using the new setting. IBM Support compared traces from ODBC and JDBC connections and found that the ODBC driver had the date format defaulting to ISO.

So ... does this mean there is a problem with JT400's driver defaults? Should we be matching IBM's ODBC driver default for this? Or does that cause other problems? I'm still doing testing, as I worry that other tables we're querying may have a different date format. In that case, how would I handle a case of mixed date formats over the same connection?! 🤔

UPDATED WITH EXAMPLE CODE:

```python
def get_db2_jdbc_options(db2_host, db2_user, db2_pass, source_database=None, source_table=None):
    # see: https://www.ibm.com/docs/api/v1/content/ssw_ibm_i_73/rzahh/javadoc/com/ibm/as400/access/doc-files/JDBCProperties.htm?view=embed
    connection_options = {
        "prompt": "false",      # avoid headless exception
        "naming": "sql",
        "access": "read only",
        "readOnly": "true",
        "ignore warnings": "all",
        "date format": "iso",   # <==== set date format to avoid warnings, which caused the performance problem
    }
    connection_url_options = ";".join(f"{key}={value}" for key, value in connection_options.items())
    jdbc_options = {
        "url": f"jdbc:as400://{db2_host}/{source_database if source_database is not None else ''};{connection_url_options}",
        "driver": "com.ibm.as400.access.AS400JDBCDriver",
        "user": db2_user,
        "password": db2_pass,
    }
    if source_table is not None:
        jdbc_options["dbtable"] = source_table
    return {**jdbc_options, **connection_options}


def read_sql_from_db2(db2_host, db2_user, db2_pass, source_database, sql):
    # load a Spark DataFrame from the source query
    jdbc_options = get_db2_jdbc_options(db2_host, db2_user, db2_pass, source_database)
    df = (
        spark.read
        .format("jdbc")
        .options(**jdbc_options)
        .option("query", sql)
        .load()
    )
    return df
```
I'm surprised that the date format is still causing a performance problem. It makes me wonder if there is still a Java exception being produced but ignored by the driver.

Unfortunately, the driver's default date format has already been set; because of compatibility issues, it is unlikely to be changed. The date format is connected with the JOB / connection used to access the data. Internally, all dates are stored in a different format. From the SQL reference, we can read the following:

> A date is a three-part value (year, month, and day) designating a point in time under the Gregorian calendar. The range of the year part is 0001 to 9999. ... The internal representation of a date is a string of 4 bytes that contains an integer. The integer (called the ...)

You can see the range of the internal values using the following SQL statements.
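The SQL statements themselves aren't shown above, but the day-number scheme the quote describes can be illustrated with Python's proleptic Gregorian ordinal, which also numbers 0001-01-01 as day 1. This is an analogy for the valid range, not the server's actual byte encoding:

```python
from datetime import date

# Smallest and largest valid DATE values, as day counts where 0001-01-01 is day 1.
min_days = date(1, 1, 1).toordinal()
max_days = date(9999, 12, 31).toordinal()
print(min_days, max_days)  # 1 3652059
```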
@jboarman, regarding an ODBC driver in your Linux environment, are you unable to use the Linux Application Package here: https://www.ibm.com/support/pages/ibm-i-access-client-solutions ?
@sandman507th, that's an interesting idea. I'm not sure how to install that driver in Linux; I was looking for a JAR file that installs as a standard Java dependency. But I have a coworker who has used linuxodbc in Databricks, so that very well might be an option to try next time if one can't resolve a similar issue.
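For reference, connecting from Linux through that package typically goes via an ODBC manager such as pyodbc. This is a sketch assuming the package's driver is registered under the name "IBM i Access ODBC Driver" (its usual name on Linux), with placeholder host and credentials:

```python
def ibmi_odbc_conn_str(system, uid, pwd):
    """Build an ODBC connection string for the IBM i Access ODBC driver."""
    return f"DRIVER=IBM i Access ODBC Driver;SYSTEM={system};UID={uid};PWD={pwd}"

# Usage (requires pyodbc and the installed driver):
# import pyodbc
# conn = pyodbc.connect(ibmi_odbc_conn_str("myhost", "myuser", "secret"))
```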
Why would an old ODBC driver be over 10x faster than a recent version of the JT400 driver?
We have an internal application that is accessing an AS/400 DB2 instance using the Windows 32-bit ODBC driver labeled "iSeries Access ODBC Driver (v13.64.14)".
However, on the same machine, we've run v11.2 of the JTOpen JDBC driver in 2 different applications, and the same SQL queries take several hours compared to only about 10 minutes on the older ODBC driver.
As a side question ... how does one get paid support for this driver? (Though I've meticulously compared the settings between these two use cases, I'm hoping there is a bonehead mistake that someone more knowledgeable in DB2 for iSeries might be able to point out!)
UPDATE
Issue: The driver was throwing tons of warnings due to an unexpected date format exception found in the trace log. These internal exceptions are extremely expensive and slowed the query down by about 100x across a number of experiments, as compared to the same query run via IBM's ODBC driver.

Resolution: Adding

`date format=iso`

to the JDBC URL resolves this problem by matching the default format used by the ODBC driver (and what clients like DBeaver assume is the default, though apparently it is not for the JDBC driver). Ideally, the next breaking version of JTOpen might consider adopting this standard as the default.