Concerns on whitespace emitting and parsing #113
Comments
@jekku Thanks for the comprehensive report! Unfortunately, #51 was a bug in the parser that we had to fix: Saxy should match and emit all whitespace. I understand it introduced an inconvenient breaking change. The patch was released with a major version bump (from v0 to v1), though I realize the CHANGELOG doesn't emphasize that enough. I did expect some slowdown, since the fix does add work for the parser, but I didn't anticipate it being noticeable. If you could provide the benchmark suite with the sample code that causes the performance penalty, I could look into what might be optimized in the parser.
Linking #103 since it could be relevant.
Hello. Just got back to this. I forked the old Saxy version into a differently named package locally to avoid namespace conflicts and then did:
The file we streamed is a large, pretty-printed XML file of around 10 MB.
Context
We use Saxy to parse XML files from several APIs.
Changes were introduced to emit whitespace correctly, connected to [this issue and its pull request](#51).
Concerns
Our first concern is related to the above: we noticed a drop in performance when using the latest version of Saxy (v1.4.0) compared with a fork based on v0.9.1.
We did some digging to check whether the cause was a change on our end or a performance regression in this library.
We observed that, with the latest version of Saxy, the SimpleForm data generated from a pretty-printed XML file is double the size of the original one. This is very likely caused by (or at least correlated with) the emitted and parsed whitespace.
Running benchmarks on the SimpleForm results from both versions, we found that the parsed and emitted whitespace causes some performance regression.
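The size increase is easy to reproduce outside Saxy. The sketch below is a Python illustration only, not Saxy code: it parses a small hypothetical pretty-printed document with the standard library's `xml.etree.ElementTree` and converts it into `(tag, attributes, children)` tuples that loosely mirror the shape of SimpleForm output, once keeping and once dropping whitespace-only text. The document, the `to_simple_form` helper, and the tuple shape are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical pretty-printed input: indentation and line breaks between tags.
PRETTY = """<Root>
  <Item>
    <Value>1</Value>
  </Item>
  <Item>
    <Value>2</Value>
  </Item>
</Root>"""

XML_WS = " \t\r\n"  # the four XML whitespace characters


def to_simple_form(el, keep_whitespace):
    """Convert an Element into a (tag, attributes, children) tuple.

    Children mix nested tuples and text strings; ElementTree stores text
    before the first child in .text and text after a child in child.tail.
    """
    children = []

    def add_text(text):
        if text is None:
            return
        if text.strip(XML_WS) == "" and not keep_whitespace:
            return  # drop whitespace-only character data
        children.append(text)

    add_text(el.text)
    for child in el:
        children.append(to_simple_form(child, keep_whitespace))
        add_text(child.tail)
    return (el.tag, list(el.attrib.items()), children)


def count_text_nodes(node):
    """Count string children anywhere in the tree."""
    _tag, _attrs, children = node
    return sum(1 if isinstance(c, str) else count_text_nodes(c) for c in children)


root = ET.fromstring(PRETTY)
with_ws = to_simple_form(root, keep_whitespace=True)
without_ws = to_simple_form(root, keep_whitespace=False)

print(count_text_nodes(with_ws), count_text_nodes(without_ws))  # 9 2
print(len(repr(with_ws)), len(repr(without_ws)))
```

When whitespace is kept, 7 of the 9 text children are only line breaks and indentation. On a large pretty-printed file that overhead grows with the number of elements, which is consistent with the size increase described above.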
The second concern is about whitespace values within tags. Previously, XML of the following shape:
would result in the contents of `Value` being parsed as `nil`. In the current build, they are parsed as a string containing whitespace. According to the previous issue mentioned at the beginning, this should only happen when certain attributes are provided, like so:
Proposed solutions
Only emit whitespace when `xml:space="preserve"` is provided. By default, whitespace should not be emitted.
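The proposed default can be sketched as a pass over a SimpleForm-like tree. This is a hypothetical Python illustration, not Saxy's API: nodes are assumed to be `(tag, attributes, children)` tuples, with attributes as `(name, value)` pairs and the attribute name arriving unexpanded as `"xml:space"`. Following the XML 1.0 rule that `xml:space` applies to an element and its descendants until another element overrides it, whitespace-only text is dropped unless `preserve` is in effect. Only the four XML whitespace characters (space, tab, CR, LF) count as whitespace here.

```python
XML_WS = " \t\r\n"  # XML 1.0 whitespace: space, tab, carriage return, line feed


def strip_insignificant_whitespace(node, preserve=False):
    """Drop whitespace-only text children unless xml:space="preserve" applies.

    node is a hypothetical (tag, attributes, children) tuple; children mix
    nested nodes and text strings. The xml:space setting is inherited by
    descendants until an element overrides it.
    """
    tag, attrs, children = node
    space = dict(attrs).get("xml:space")
    if space == "preserve":
        preserve = True
    elif space == "default":
        preserve = False  # fall back to the proposed default: drop whitespace

    kept = []
    for child in children:
        if isinstance(child, str):
            # Keep text if whitespace is preserved or it has real content.
            if preserve or child.strip(XML_WS) != "":
                kept.append(child)
        else:
            kept.append(strip_insignificant_whitespace(child, preserve))
    return (tag, attrs, kept)


# Usage with a hypothetical document: <Value> holds only spaces, and so
# does <Code>, which is marked xml:space="preserve".
doc = ("Record", [], [
    "\n  ",
    ("Value", [], ["   "]),
    "\n  ",
    ("Code", [("xml:space", "preserve")], ["   "]),
    "\n",
])
print(strip_insignificant_whitespace(doc))
# ('Record', [], [('Value', [], []), ('Code', [('xml:space', 'preserve')], ['   '])])
```

Here the whitespace-only content of `Value` is dropped while the identical content of the element marked `xml:space="preserve"` survives. Applying the same rule inside the parser, rather than as a pass afterwards, would presumably also avoid building the discarded strings in the first place.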