
JT400 vs IBM i Access ODBC Driver -- Major Performance Issues #110

Closed
jboarman opened this issue Jul 19, 2023 · 13 comments

Comments

@jboarman

jboarman commented Jul 19, 2023

Why would an old ODBC driver be over 10x faster than a recent version of the JT400 driver?

We have an internal application that is accessing an AS/400 DB2 instance using the Windows 32-bit ODBC driver labeled "iSeries Access ODBC Driver (v13.64.14)".

However, on the same machine, we've run v11.2 of the JTOpen JDBC driver from 2 different applications, and the same SQL queries take several hours compared to only about 10 minutes with the older ODBC driver.

As a side question ... how does one get paid support for this driver? (Though I've meticulously compared the settings between these two use cases, I'm hoping there is a bonehead mistake that someone more knowledgeable in DB2 for iSeries might be able to point out!)


UPDATE

Issue: The driver was throwing tons of warnings due to an unexpected date format, visible as exceptions in the trace log. Creating these internal exceptions is extremely expensive and slowed the query by roughly 100x in a number of experiments, compared to the same query run via IBM's ODBC driver.

Resolution: Adding date format=iso to the JDBC URL resolves the problem by matching the default format used by the ODBC driver (and what clients like DBeaver assume is the default, though apparently it is not for the JDBC driver). Ideally, the next breaking version of JTOpen might consider adopting this standard as the default.
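
As a sketch, the fix is a single connection property; the host and library names below are placeholders:

```python
# "myhost" and "MYLIB" are placeholders; the decisive property is
# "date format=iso", which matches the IBM i Access ODBC driver's default
# and avoids the SQL0181 warnings.
props = {"date format": "iso"}
jdbc_url = "jdbc:as400://myhost/MYLIB;" + ";".join(f"{k}={v}" for k, v in props.items())
print(jdbc_url)  # jdbc:as400://myhost/MYLIB;date format=iso
```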

@JMarkMurphy

JMarkMurphy commented Jul 19, 2023

One gets paid software support by paying for software maintenance on their IBM i license.

@kadler
Member

kadler commented Jul 19, 2023

As @JMarkMurphy said, if you have active SWMA for the IBM i OS, you can open a support ticket for ODBC and/or JDBC drivers.

@jboarman
Author

Thanks @JMarkMurphy and @kadler ... It's taking a bit to navigate through IBM's support, but we're getting that journey started.

For reference, here is a visual example demonstrating the speed difference between the 2 drivers (ODBC vs JDBC). For the same query, it's taking ~1 second via ODBC and ~120 seconds via JDBC!

Note that what we really need is a fast JDBC driver to use in a Databricks environment to load data from DB2 for reporting. Though this is an example from Windows, it exhibits the exact same performance profile as when executed from a Linux environment:

[screenshot: side-by-side timing of the same query via the two drivers]

It seems strange that both this ticket and #108 report slowdowns on the order of a 100x speed difference. Perhaps that is coincidental, but I am curious whether any speed metrics have been tracked to validate expected performance levels?

@jboarman
Author

Just to share with others that may be curious, the first debugging exercise IBM support would like us to do:

We would like to have you take a TRCCNN concurrently with a JDBC trace while recreating the latency you're seeing.
This can be done from either your Windows or Ubuntu system (probably easier for you on Windows, I'm guessing; just let us know).

Then we'd want to see a TRCCNN and a CWBCOTRC done using ODBC for a comparison.  

Instructions for all the traces below:

There are a few ways to get a JDBC trace going, they are all covered in this document:
https://www.ibm.com/support/pages/node/643289

The main properties of the trace to set are as follows:
com.ibm.as400.access.Trace.file=C:\temp\trace.out <== output file for the trace; on Linux the path syntax would be /path/trace.out
com.ibm.as400.access.Trace.category=all  <== starts the trace
If you're running from the command line, you'd need to add the -D prefix:
-Dcom.ibm.as400.access.Trace.file=C:\temp\trace.out
-Dcom.ibm.as400.access.Trace.category=all

The CWBCOTRC is covered in this document:
https://www.ibm.com/support/pages/node/687471
If the app that uses ODBC runs as a server, make sure to restart the app once the trace is started, and use the /allusers parameter.

The TRCCNN is covered in this document:
https://www.ibm.com/support/pages/node/644111
Use example 1, and replace the example IP address with your Windows or Linux server's IP, depending on which client you're tracing.
Stop the trace using the syntax given in the first how to stop the trace example: TRCCNN SET(*OFF) TRCTBL(MyTraceCNN)
and send in the resulting QPCSMPRT file in plain ASCII text (.txt).

@JMarkMurphy

This is very strange. I am seeing this as well, though JDBC is only twice as slow as ODBC, even though it goes through the now-defunct JDBC-ODBC bridge from Sun. I would expect that to be slower.

@jeber-ibm
Member

Here are a few tips that I can think of.

  1. Make sure that the block size is set to the maximum of 512 by adding ;block size=512 to the JDBC URL.
  2. Make sure that the query you are running is not creating any warnings. Creating a warning in JDBC is VERY expensive, as it creates a Java exception (which includes the stack trace). If the query creates warnings, the JDBC property "ignore warnings" can be used to specify the warnings to ignore.
  3. If the query returns LOBs, you may need to adjust the JDBC property lob threshold (so that LOBs are returned in-line instead of as locators).
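
Combined, the three tips above amount to a handful of extra URL properties. A sketch (host and library names are placeholders, and the 1 MB lob threshold is an arbitrary illustration, not a recommendation):

```python
tuning = {
    "block size": "512",          # tip 1: maximum row-blocking size
    "ignore warnings": "all",     # tip 2: avoid expensive SQLWarning/exception creation
    "lob threshold": "1048576",   # tip 3: return LOBs up to ~1 MB in-line, not as locators
}
url = "jdbc:as400://myhost/MYLIB;" + ";".join(f"{k}={v}" for k, v in tuning.items())
print(url)
```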

@jboarman
Author

Thanks @jeber-ibm for those suggestions. After trying these, I am curious why the ignore warnings setting doesn't seem to be working! (so far I'm not seeing any improvements and the setting appears to be ignored for some reason)

The trace file shows the setting is being picked up:
Thread[Connect to 'JDBC (JTOpen)',5,main] as400: Properties (919350410) : ignore warnings = "181,466,42703".

Though we are ignoring this warning, it is still being reported in the trace log:

Sat Aug 12 12:49:28:427 CDT 2023
as400: Connection S10D4072 (1214679991)
Posting warning, id: 1
error class: 1
return code: 181
reason:
  [SQL0181] Value in date, time, or timestamp string not valid.
  Cause:
    The string representation of a date, time or timestamp value is not in the acceptable range. *N is either the
    character string constant that is not valid or the column or host variable that contained the string. If the name
    is *N, then the value was found in an expression specified in the statement. If the value was found in a host
    variable, then the host variable number is 0.

  The proper ranges for date, time, or timestamp values are as follows:
    -- The range for years is from 0001 to 9999. For date formats *MDY, *YMD, *DMY, and *JUL, the year must be in the
       range 1940 to 2039.
    -- The range for months is from 1 to 12.
    -- The range for days is from 1 - 30 for April, June, September, and November, from 1 - 28 for February and from 1
       to 31 for all other months. In a leap year, the range for February can be from 1 to 29.
    -- The range for days in a Julian date is from 001 to 366 for a leap year or 001 to 365 days for all other years.
    -- The range for hours is from 0 to 24. If the hour is 24, then the other parts of the time values must be zeros.
       If the time format is USA, then the hour cannot be greater than 12.
    -- The range for minutes is from 0 to 59.
    -- The range for seconds is from 0 to 59.
    -- The range for fractional seconds is from 0 to 999999999999.

  Recovery:
    Ensure that the date, time, or timestamp value conforms to the ranges for the data type it represents. Try the
    request again.

state: 01534
java.sql.SQLWarning:

	at com.ibm.as400.access.JDError.getSQLWarning(JDError.java:347)
	at com.ibm.as400.access.JDServerRowCache.fetch(JDServerRowCache.java:363)
	at com.ibm.as400.access.JDServerRowCache.fetch(JDServerRowCache.java:270)
	at com.ibm.as400.access.JDServerRowCache.next(JDServerRowCache.java:804)
	at com.ibm.as400.access.AS400JDBCResultSet.next(AS400JDBCResultSet.java:1843)

I still don't get how the ODBC driver on the same machine dramatically outperforms the JDBC driver. Too bad I can't leverage the ODBC driver in the target Linux environments where the JDBC driver is needed.

@jeber-ibm
Member

jeber-ibm commented Aug 12, 2023

Looks like the ignore warnings isn't working as expected for the result set path. I'll open a separate issue for that problem. (#118)

Is there a way to adjust the query to avoid the time conversion warning from being generated on the server?

If there are a lot of warnings, that is probably the main cause of your problem.

Also, the format of your "ignore warnings" value is not correct: it should list SQLSTATEs, not SQLCODEs.
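
For instance, the "181,466,42703" value seen in the trace lists SQLCODE numbers, while the warning above carries SQLSTATE 01534. A sketch of the distinction, assuming the property accepts a comma-separated list of states:

```python
# SQLCODE-based value (what the trace showed) -- not what the driver expects
sqlcode_value = "ignore warnings=181,466,42703"

# SQLSTATE-based value: 01534 is the state posted with the SQL0181 warning above
sqlstate_value = "ignore warnings=01534"

print(sqlstate_value)
```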

@jboarman
Author

jboarman commented Aug 12, 2023

Sure enough ... if I remove the 4 date fields (out of 75 total columns) from the SELECT, the query returns much faster. The response time went from 120 seconds down to 4 seconds just by removing those 4 date columns from the query -- so this definitely supports your note @jeber-ibm that warnings are the issue.

However, I'm not sure I understand how to set the "ignore warnings" parameter. I've tried setting it to every kind of value I can see in the warning detail, and nothing appears to stick. I found this SQLSTATE values table in the IBM docs, which seems to indicate that 01534 (or maybe 22007) is the SQLSTATE for the SQLCODE SQL0181. Neither of these values prevents the warnings, so I'm at a loss as to whether the value is invalid or the setting is just not working. (Never mind ... I totally missed the new ticket #118, which appears to be aimed at this issue.)

@jboarman
Author

jboarman commented Aug 24, 2023

UPDATE: Good news and Less Good News?

I confirmed the warnings are removed from the trace now using the new ignore warnings=all JDBC URL parameter (thanks @jeber-ibm). HOWEVER, the performance problems remained. 🤯

IBM Support compared traces from the ODBC and JDBC connections and found that the ODBC driver defaults the date format setting to iso, while the JDBC driver does not. After explicitly setting that parameter, the JDBC driver's query performance was almost as fast as the ODBC driver's.

So ... does this mean there is a problem with the JT400's driver defaults? Should we be matching IBM's ODBC driver default for this? Or does that cause other problems?

I'm still doing testing as I worry that maybe other tables we're querying will have a different date format. In that case, how would I handle a case of mixed date formats over the same connection?! 🤔

UPDATED WITH EXAMPLE CODE:

def get_db2_jdbc_options(db2_host, db2_user, db2_pass, source_database=None, source_table=None):

    # see: https://www.ibm.com/docs/api/v1/content/ssw_ibm_i_73/rzahh/javadoc/com/ibm/as400/access/doc-files/JDBCProperties.htm?view=embed
    connection_options = {
        "prompt": "false",         # avoid headless exception
        "naming": "sql",
        "access": "read only",
        "readOnly": "true",
        "ignore warnings": "all",
        "date format": "iso",      # <==== set date format to avoid the warnings that led to the performance problem
    }
    connection_url_options = ";".join([f"{key}={value}" for key, value in connection_options.items()])
    jdbc_options = {
        "url": f"jdbc:as400://{db2_host}/{source_database if source_database is not None else ''};{connection_url_options}",
        "driver": "com.ibm.as400.access.AS400JDBCDriver",
        "user": db2_user,
        "password": db2_pass,
    }
    if source_table is not None:
        jdbc_options["dbtable"] = source_table
    return {**jdbc_options, **connection_options}


def read_sql_from_db2(db2_host, db2_user, db2_pass, source_database, sql):

    # load dataframe for source data
    jdbc_options = get_db2_jdbc_options(db2_host, db2_user, db2_pass, source_database)
    df = ( spark.read
                .format("jdbc")
                .options(**jdbc_options)
                .option("query", sql)
                .load() )
    return df

@jeber-ibm
Member

I'm surprised that the date format is still causing a performance problem. It makes me wonder if there is still a Java exception that is being produced but being ignored by the driver.

Unfortunately, the driver's default date format is long established. Because of compatibility concerns, it is unlikely to be changed.

The date format is connected with the JOB / connection used to access the data. Internally, all dates are stored in a different format. From the SQL reference, we can read the following.

A date is a three-part value (year, month, and day) designating a point in time under the Gregorian calendar, which is assumed to have been in effect from the year 1 A.D.

The range of the year part is 0001 to 9999....

The internal representation of a date is a string of 4 bytes that contains an integer. The integer (called the Scaliger number) represents the date.

You can see the range of the internal values using the following SQL statements:

values hex(date('0001-01-01'))
-- returns 001A4452
values hex(date('9999-12-31'))
-- returns 0051FE2C
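
Assuming the "Scaliger number" is the Julian day number (the usual meaning of the term), those two hex values can be reproduced from Python's proleptic-Gregorian date ordinals:

```python
from datetime import date

# date.toordinal() counts days with 0001-01-01 = 1; the Julian day number of
# that same date is 1721426, hence the fixed offset between the two scales.
JDN_OFFSET = 1_721_425

def scaliger_number(d: date) -> int:
    return d.toordinal() + JDN_OFFSET

print(f"{scaliger_number(date(1, 1, 1)):08X}")       # 001A4452
print(f"{scaliger_number(date(9999, 12, 31)):08X}")  # 0051FE2C
```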

@sandman507th

@jboarman, regarding an ODBC driver in your Linux environment, are you unable to use the Linux Application Package here : https://www.ibm.com/support/pages/ibm-i-access-client-solutions ?

@jboarman
Author

@sandman507th, that's an interesting idea. I'm not sure how to install that driver on Linux. I was looking for a JAR file that installs as a standard Java dependency, but I have a coworker who has used unixODBC in Databricks ... so that very well might be an option to try next time if we can't resolve a similar issue.
