I have to consume a message from a message broker with (sometimes) broken encoding in one of its attributes. (It's from legacy software that nobody wants or dares to touch.)
Currently, when trying to parse the messages, I get the following exception:
com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xfc (at char #736, byte #53)
at com.fasterxml.jackson.dataformat.xml.util.StaxUtil.throwAsParseException(StaxUtil.java:37)
at com.fasterxml.jackson.dataformat.xml.XmlFactory._initializeXmlReader(XmlFactory.java:657)
at com.fasterxml.jackson.dataformat.xml.XmlFactory._createParser(XmlFactory.java:593)
at com.fasterxml.jackson.dataformat.xml.XmlFactory._createParser(XmlFactory.java:29)
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3091)
...
If I first turn the same bytes into a String and parse that instead, it works perfectly fine.
It would be nice to have an option that lets broken encodings through into my Strings instead of throwing exceptions.
(After parsing the input, I usually have enough context to know which messages I have to fix and how.)
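A likely reason the String route works, assuming the String was built with `new String(bytes, UTF_8)`: that constructor never fails and silently substitutes U+FFFD for every malformed sequence, whereas a validating decoder (such as Woodstox's `UTF8Reader`) reports the error. The JDK-only sketch below contrasts the two modes; `DecodingModes` and its methods are illustrative names, not part of any library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodingModes {
    private DecodingModes() {}

    // Strict: reports malformed UTF-8 (such as a lone 0xFC byte) by throwing,
    // similar to what a validating UTF-8 reader does.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Lenient: the String constructor always replaces malformed sequences
    // with U+FFFD and never throws.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The lenient result could be handed to `xmlMapper.readValue(String, Class)`, but the original byte is lost: 0xFC becomes U+FFFD, so a later repair cannot tell what the character was.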
I use jackson-dataformat-xml 2.9.6 + woodstox 5.0.3/5.1 to parse the message.
Currently I use the following workaround to bypass the issue:
byte[] bytes = ...;
try {
    return xmlMapper.readValue(bytes, StateInfo.class);
} catch (JsonParseException e) {
    try {
        LOG.debug("Attempting fix");
        // Re-decode the whole message with the legacy charset and re-encode it as UTF-8
        byte[] bytes2 = new String(bytes, CHARSET_ALT1).getBytes(UTF_8);
        return xmlMapper.readValue(bytes2, StateInfo.class);
    } catch (JsonParseException e1) {
        // Contains special characters from multiple encodings (in different attributes)
        LOG.error("Failed to repair message - Writing message to disk for manual fix");
        writeToDisk(e, bytes);
        throw e;
    }
}
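When one message mixes encodings (the case the inner `catch` gives up on), re-decoding the whole message with a single charset cannot succeed. A sketch of a per-sequence alternative, assuming the only foreign encoding is ISO-8859-1 (a guess; the actual `CHARSET_ALT1` is not named here): decode as UTF-8 and, wherever a byte sequence is malformed, map each of its bytes to the Latin-1 character with the same value. Valid UTF-8 stays intact and a stray 0xFC becomes `ü`. `MixedDecoder` is a hypothetical helper, not a Jackson or Woodstox API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class MixedDecoder {
    private MixedDecoder() {}

    // Decodes valid UTF-8 normally; each byte of a malformed sequence is
    // instead mapped to the ISO-8859-1 character with the same value.
    static String decodeUtf8WithLatin1Fallback(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, and neither does the fallback.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isUnderflow()) {
                break; // all input consumed
            }
            if (!result.isError()) {
                throw new IllegalStateException("output buffer too small");
            }
            // The decoder stops at the start of the bad sequence; consume it as Latin-1.
            for (int i = 0; i < result.length(); i++) {
                out.put((char) (in.get() & 0xFF));
            }
        }
        decoder.flush(out);
        out.flip();
        return out.toString();
    }
}
```

The resulting String could then be parsed with `xmlMapper.readValue(String, Class)`. The heuristic can misfire: a Latin-1 byte run that happens to form valid UTF-8 (for example `Ã¼`, bytes C3 BC) is decoded as UTF-8 (`ü`).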
As an alternative, I considered a plain byte[] solution, but unfortunately the parser still decodes the input into a String first (so that it can Base64-decode the value), and I didn't find a way to tell the parser to just give me the raw bytes without Base64-decoding them first.
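One possible way to keep the raw bytes without Base64, sketched under assumptions: decode the whole message as ISO-8859-1, which maps every byte 0x00-0xFF to exactly one char and therefore never fails, parse that String (for example via `xmlMapper.readValue(String, Class)`), and afterwards turn each affected attribute back into its original bytes with `getBytes(ISO_8859_1)`. This assumes the markup itself is ASCII, that the parser accepts a String input whose XML declaration names a different encoding, and that XML attribute-value normalization (whitespace handling) does not matter for these fields. `Latin1RoundTrip` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

final class Latin1RoundTrip {
    private Latin1RoundTrip() {}

    // Lossless: every byte value 0x00-0xFF becomes the char with the same code point.
    static String bytesAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Inverse of bytesAsLatin1 for a single parsed attribute value.
    static byte[] originalBytes(String latin1Value) {
        return latin1Value.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Reinterpret the recovered bytes as UTF-8 once the caller knows that is correct.
    static String asUtf8(String latin1Value) {
        return new String(originalBytes(latin1Value), StandardCharsets.UTF_8);
    }
}
```

Which fields to reinterpret, and with which charset, stays an application decision made after parsing.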
Output of a minimal reproduction:
<data attr="Success" />
Data: Success
<data attr="S�ccess" />
Exception in thread "main" com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xfc (at char #14, byte #-1)
at com.fasterxml.jackson.dataformat.xml.util.StaxUtil.throwAsParseException(StaxUtil.java:37)
at com.fasterxml.jackson.dataformat.xml.XmlFactory._initializeXmlReader(XmlFactory.java:657)
at com.fasterxml.jackson.dataformat.xml.XmlFactory._createParser(XmlFactory.java:593)
at com.fasterxml.jackson.dataformat.xml.XmlFactory._createParser(XmlFactory.java:29)
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3091)
at example.Test.main(Test.java:67)
Caused by: java.io.CharConversionException: Invalid UTF-8 start byte 0xfc (at char #14, byte #-1)
at com.ctc.wstx.io.UTF8Reader.reportInvalidInitial(UTF8Reader.java:304)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:190)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:89)
at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:995)
at com.ctc.wstx.sr.StreamScanner.getNext(StreamScanner.java:754)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2074)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1175)
at com.fasterxml.jackson.dataformat.xml.XmlFactory._initializeXmlReader(XmlFactory.java:653)
... 5 more
I don't think this is something Woodstox should be doing. Although I understand it may be inconvenient, I think handling of broken content is something the application needs to configure somehow.
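Following that suggestion, the decoding policy can live in the application: wrap the raw stream in a `Reader` whose `CharsetDecoder` is configured with the desired `CodingErrorAction`, and hand that `Reader` to Jackson (`ObjectMapper.readValue(Reader, Class)` exists and `XmlMapper` inherits it), so the XML parser only ever sees already-decoded characters. A JDK-only sketch; `AppDecodingPolicy` is a hypothetical name. With `REPLACE` the broken byte becomes U+FFFD, with `IGNORE` it is dropped, and `REPORT` reproduces the current failure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class AppDecodingPolicy {
    private AppDecodingPolicy() {}

    // The application chooses what happens to malformed UTF-8:
    // REPORT (throw), REPLACE (U+FFFD) or IGNORE (drop the bad bytes).
    static Reader utf8Reader(InputStream in, CodingErrorAction onMalformed) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(onMalformed)
                .onUnmappableCharacter(onMalformed);
        return new InputStreamReader(in, decoder);
    }

    // Drains and closes a Reader; I/O errors (including decoding errors
    // under REPORT) are rethrown unchecked.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

In the issue's setup this would look like `xmlMapper.readValue(AppDecodingPolicy.utf8Reader(stream, CodingErrorAction.REPLACE), StateInfo.class)`; `readAll` only exists to make the behavior easy to inspect.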