Add a note to the parser documentation about stripping BOMs.
jmdavis committed Aug 30, 2023
1 parent 48e8e0b commit 02f484c
Showing 1 changed file with 8 additions and 4 deletions.
12 changes: 8 additions & 4 deletions source/dxml/parser.d
@@ -18,10 +18,14 @@
 does not).
 Regardless of what the XML declaration says (if present), any range of
-$(K_CHAR) will be treated as being encoded in UTF-8, any range of $(K_WCHAR)
-will be treated as being encoded in UTF-16, and any range of $(K_DCHAR) will
-be treated as having been encoded in UTF-32. Strings will be treated as
-ranges of their code units, not code points.
+$(K_CHAR) will be treated as being encoded in UTF-8, any range of
+$(K_WCHAR) will be treated as being encoded in UTF-16, and any range of
+$(K_DCHAR) will be treated as having been encoded in UTF-32. Strings will
+be treated as ranges of their code units, not code points. Note that like
+Phobos typically does when processing strings, the code assumes that BOMs
+have already been removed, so if the range of characters comes from a file
+that uses a BOM, the calling code needs to strip it out before calling
+$(LREF parseXML), or parsing will fail due to invalid characters.
 Since the DTD is skipped, entity references other than the five which are
 predefined by the XML spec cannot be fully processed (since wherever they
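The added note leaves BOM removal to the caller. A minimal sketch of what that could look like for UTF-8 input follows; `stripUTF8BOM` is a hypothetical helper name, not part of dxml or Phobos, and it assumes the text has already been loaded into a `string`:

```d
import std.algorithm.searching : startsWith;

/// Hypothetical helper (not part of dxml or Phobos): returns `text`
/// without a leading UTF-8 byte order mark, or unchanged if there is none.
string stripUTF8BOM(string text)
{
    // U+FEFF encoded as UTF-8 is the three code units EF BB BF.
    static immutable string bom = "\uFEFF";

    // Slicing does not copy; the result refers to the same memory as `text`.
    return text.startsWith(bom) ? text[bom.length .. $] : text;
}
```

Under these assumptions, the calling code would pass the stripped text to the parser, for example `auto range = parseXML(stripUTF8BOM(text));`. For ranges of `wchar` or `dchar`, the BOM is the single code unit U+FEFF, so an analogous check on the first element would be needed instead.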
