ACH files are allowed to use only a subset of US-ASCII. This character set ensures that no disallowed characters are
encoded or decoded. The allowed subset consists of "ASCII values greater than hexadecimal 0x1F." US-ASCII itself is
limited to 7 bits, resulting in a range from 0x20 through 0x7F.
Although the ACH specification does not allow values below 0x20 or above 0x7F, there are some exceptions implemented
by this ACH Charset:
- 0x0A is a linefeed. While below the 0x1F limit, it is often used as a record separator. It is allowed for compatibility with common implementations.
- 0x0D is a carriage return. It is sometimes found in conjunction with a linefeed in files generated by Windows and related operating systems. Both the encoder and decoder silently skip carriage returns, and the encoder's canEncode((char) 0x000D) method returns false.
The above rules cause CRLF to be encoded and decoded as LF on all platforms. In addition:
- 0x7F is an unprintable control character called DEL. It is not allowed.
- 0x85 is encoded as a newline character. In Unicode, it is equivalent to the EBCDIC NL character used by mainframe systems as a line delimiter. For compatibility, this character set only encodes newline as a linefeed. Encoding is safe because the character is definitely a Unicode newline. Decoding a 0x85 byte would not be safe because it would require guessing the actual (non-ASCII) encoding of the input stream. If it were UTF-8, then 0x85 would be the second or later byte of a multibyte encoding. If it were Windows-1252, then 0x85 would be a horizontal ellipsis (…). If it were ISO-8859-1, then 0x85 would be undefined.
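The encoding rules above can be summarized in a few lines of code. The following is a minimal sketch only, not this library's implementation: the class name AchRulesSketch and its encode method are hypothetical, and it assumes that disallowed characters are replaced with a question mark rather than reported or ignored.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of the encoding rules described above; not the library's actual code. */
final class AchRulesSketch {

    /** Encodes text to bytes, one UTF-16 code unit at a time (a simplification). */
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == 0x0D) {
                continue;                 // carriage return: silently skipped, so CRLF becomes LF
            } else if (c == 0x0A || c == 0x85) {
                out.write(0x0A);          // linefeed passes through; 0x85 (next line) becomes LF
            } else if (c >= 0x20 && c <= 0x7E) {
                out.write(c);             // printable US-ASCII passes through unchanged
            } else {
                out.write('?');           // assumption: everything else, including DEL, is replaced
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encode("AB\r\nCD\u0085EF\u007F");
        System.out.println(encoded.length);
    }
}
```

In this sketch the input above yields the nine bytes of "AB", LF, "CD", LF, "EF", and a question mark: the carriage return disappears, the 0x85 character becomes a linefeed, and DEL is replaced.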
The encoders and decoders of a Java Charset can be configured for one of three different actions when they encounter
an error encoding or decoding a character:
- Report, which in most cases results in a CharacterCodingException
- Replace, which replaces the unknown code with a predefined placeholder
- Ignore, which causes the output to be shorter
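When a Charset is passed directly or by name to a class such as InputStreamReader or OutputStreamWriter, the default is to replace the character, which is often the best approach. (An encoder or decoder obtained explicitly from newEncoder() or newDecoder() reports errors unless configured otherwise.) ACH files have a fixed-width record format, so ignoring errors by skipping characters may cause downstream processing to fail. Reporting errors with an exception may lead to an unrecoverable error requiring manual intervention.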
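The effect of the three actions on output length can be seen with a short experiment. This sketch uses the JDK's built-in US-ASCII charset as a stand-in, since it needs no extra classpath entries; the class ErrorActionDemo and its encodedLength helper are hypothetical names introduced for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo comparing the three coding error actions with the JDK's US-ASCII charset. */
final class ErrorActionDemo {

    /** Encodes text with the given action; returns the byte count, or -1 if an error is reported. */
    static int encodedLength(String text, CodingErrorAction action) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            return bytes.remaining();
        } catch (CharacterCodingException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String field = "Caf\u00e9 Nord"; // 9 characters, one of them outside US-ASCII
        System.out.println("REPLACE: " + encodedLength(field, CodingErrorAction.REPLACE)); // REPLACE: 9
        System.out.println("IGNORE:  " + encodedLength(field, CodingErrorAction.IGNORE));  // IGNORE:  8
        System.out.println("REPORT:  " + encodedLength(field, CodingErrorAction.REPORT));  // REPORT:  -1
    }
}
```

Only the replace action keeps the nine-character field at nine bytes, which is why it suits a fixed-width format.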
An input stream that is expected to contain only characters allowed by ACH may contain an unexpected value. Reporting
the error with an exception could abort and delay the entire file ingestion stage due to a single field on a single
record. Ignoring the error by skipping over the unexpected character may cause an offset that breaks subsequent
processing of a fixed-width field. The best approach may be to substitute a replacement character into the stream and
allow processing to continue. Substituting the Unicode replacement character (�, U+FFFD) is the default action when a
Java Charset is passed to a Reader by name.
InputStream bytesIn = new FileInputStream("input.ach");
// Charset can be passed by name because it has a provider resource in the classpath
Reader reader = new InputStreamReader(bytesIn, "ACH");
// Reader will replace unexpected bytes with the Unicode replacement character
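To illustrate why replacement preserves record width, the sketch below decodes a 94-byte record containing one unexpected byte. It substitutes the JDK's US-ASCII charset for the ACH charset so that it runs without the provider on the classpath; ReplacingReaderDemo and decodeWithReplacement are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo: a replacing reader keeps a fixed-width record at its original width. */
final class ReplacingReaderDemo {

    /** Reads all bytes through a US-ASCII reader, which replaces bytes it cannot decode. */
    static String decodeWithReplacement(byte[] input) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(input), StandardCharsets.US_ASCII)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[256];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] record = new byte[94];
        Arrays.fill(record, (byte) ' ');
        record[10] = (byte) 0x85;                 // one unexpected byte in the record
        String decoded = decodeWithReplacement(record);
        System.out.println(decoded.length());     // 94
        System.out.println(decoded.charAt(10) == '\uFFFD'); // true
    }
}
```

Every field after position 10 stays at its expected offset, so downstream fixed-width parsing can continue and the replacement character marks the damaged position.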
If an input stream that is expected to contain only characters allowed by ACH encounters an unexpected value, it can be
configured to report the error with an exception. This prevents missing or replacement characters from being passed on,
which ensures that only completely clean inputs continue processing. This is not the default action when a Java Charset
is passed to a Reader by name, so the behavior must be configured by supplying an explicit CharsetDecoder.
InputStream bytesIn = new FileInputStream("input.ach");
// Retrieve Charset by name because it has a provider resource in the classpath
Charset ACH = Charset.forName("ACH");
// Obtain an explicit decoder and override the default behavior on malformed input
CharsetDecoder decoder = ACH.newDecoder().onMalformedInput(CodingErrorAction.REPORT);
// Use the constructor that accepts a CharsetDecoder
Reader reader = new InputStreamReader(bytesIn, decoder);
// Reader will throw an exception if it encounters an unexpected byte
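A caller of a reporting reader needs to decide what to do with the exception. The sketch below shows one possible shape, again using the JDK's US-ASCII charset as a stand-in for the ACH charset; StrictReaderDemo and isClean are hypothetical names. It sets both error actions because whether an unexpected byte is classed as malformed or unmappable depends on the charset implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: detect whether an input decodes without any coding errors. */
final class StrictReaderDemo {

    /** Returns true if every byte decodes cleanly, false if the decoder reports an error. */
    static boolean isClean(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), decoder)) {
            char[] buffer = new char[256];
            while (reader.read(buffer) != -1) {
                // Discard the characters; only the absence of errors matters here
            }
            return true;
        } catch (CharacterCodingException e) {
            // Both MalformedInputException and UnmappableCharacterException extend this type
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isClean("HELLO".getBytes(StandardCharsets.US_ASCII))); // true
        System.out.println(isClean(new byte[] {'A', (byte) 0x85, 'B'}));          // false
    }
}
```

A real ingestion stage would more likely log the file name and quarantine the file than return a boolean; the point is only that CharacterCodingException is the type to catch.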
ACH files require each record to be 94 characters. The critical fields necessary for processing a file are usually
generated by well-tested templates. A template may include text fields from a source that contains a wider range of
characters than ACH allows. Injecting an unexpected character could cause problems for downstream systems. Reporting
the error with an exception could abort and delay the entire file generation stage due to a single field on a single
record. The best solution in this case is to substitute a replacement for the unexpected character. Substituting the
encoder's default replacement is the default action when a Java Charset is passed to a Writer by name. The default
replacement is a question mark (?).
OutputStream bytesOut = new FileOutputStream("output.ach");
// Charset can be passed by name because it has a provider resource in the classpath
Writer writer = new OutputStreamWriter(bytesOut, "ACH");
// Writer will replace unexpected characters with '?'
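If a question mark is not a suitable filler, the standard CharsetEncoder API allows a different replacement to be set with replaceWith. The sketch below uses a space and the JDK's US-ASCII charset; whether the ACH charset's encoder accepts a given replacement is an assumption that should be checked against it. SpaceReplacementDemo and encodeWithSpaces are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: replace unencodable characters with a space instead of '?'. */
final class SpaceReplacementDemo {

    static byte[] encodeWithSpaces(String text) {
        // An explicitly created encoder reports errors by default, so REPLACE must be requested
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] {(byte) ' '});
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytesOut, encoder)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer has been closed, and therefore flushed, by this point
        return bytesOut.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encodeWithSpaces("Caf\u00e9");
        System.out.println(new String(encoded, StandardCharsets.US_ASCII) + "|"); // Caf |
    }
}
```

The output keeps its four-byte width, with a space in place of the character that could not be encoded.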
If a single bad character is considered sufficient cause to abort generation of an ACH file, the encoding can be
configured to throw an exception rather than continuing. This is not the default action when a Java Charset is passed
to a Writer by name, so the behavior must be configured by supplying an explicit CharsetEncoder.
OutputStream bytesOut = new FileOutputStream("output.ach");
// Retrieve Charset by name because it has a provider resource in the classpath
Charset ACH = Charset.forName("ACH");
// Obtain an explicit encoder and override the default behavior on unmappable output
CharsetEncoder encoder = ACH.newEncoder().onUnmappableCharacter(CodingErrorAction.REPORT);
// Use the constructor that accepts a CharsetEncoder
Writer writer = new OutputStreamWriter(bytesOut, encoder);
// Writer will throw an exception if it encounters an unexpected character
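When generation is aborted, it helps to know which character caused the failure. One option, sketched below, is to check a field with the encoder's canEncode method before writing it. The sketch uses the JDK's US-ASCII encoder so that it runs anywhere; an encoder obtained from the ACH charset could be passed in the same way. EncodabilityCheck and firstUnencodable are hypothetical names.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: locate the first character that an encoder cannot encode. */
final class EncodabilityCheck {

    /** Returns the index of the first unencodable character, or -1 if all can be encoded. */
    static int firstUnencodable(CharSequence text, CharsetEncoder encoder) {
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        System.out.println(firstUnencodable("Caf\u00e9 Nord", encoder)); // 3
        System.out.println(firstUnencodable("Cafe Nord", encoder));      // -1
    }
}
```

The returned index can be included in an error message alongside the record number, which makes a rejected file easier to correct.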