A message-based, SAX-style XML stream parser.
Designed with Node.js in mind, but should work fine in the browser or other CommonJS implementations.
Based on https://github.com/isaacs/sax-js and converted to TypeScript.
What this is:
- A very simple tool to parse through an XML string.
- A stepping stone to a streaming HTML parser.
- A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML docs.
What this is not:
- An HTML Parser - That's a fine goal, but this isn't it. It's just XML.
- A DOM Builder - You can use it to build an object model out of XML, but it doesn't do that out of the box.
- XSLT - No DOM = no querying.
- 100% Compliant with (some other SAX implementation) - Most SAX implementations are in Java and do a lot more than this does.
- An XML Validator - It does a little validation when in strict mode, but not much.
- A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic masochism.
- A DTD-aware Thing - Fetching DTDs is a much bigger job.
The parser will handle the basic XML entities in text nodes and attribute values: &amp;, &lt;, &gt;, &apos; and &quot;. It's possible to define additional entities in XML by putting them in the DTD. This parser doesn't do anything with that. If you want to listen to the ondoctype event, and then fetch the doctypes, and read the entities and add them to parser.ENTITIES, then be my guest.
Unknown entities will fail unless in lenient mode, in which case they will pass through unmolested.
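The entity rules above can be illustrated with a small, self-contained sketch. It is not smax's implementation: decodeEntities and its extraEntities and lenient parameters are hypothetical, and only named entities are handled (numeric references such as &#65; are left out). The extraEntities object stands in for entries you might add to parser.ENTITIES after reading a DTD.

```javascript
// Hypothetical sketch of named-entity decoding; not smax's actual code.
// The five predefined XML entities.
var XML_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  apos: "'",
  quot: '"'
};

// Replace &name; references in text. extraEntities stands in for entries
// you might add yourself after reading a DTD. Unknown entities throw unless
// lenient is true, in which case they are left in the text unchanged.
function decodeEntities(text, extraEntities, lenient) {
  var table = Object.assign({}, XML_ENTITIES, extraEntities || {});
  return text.replace(/&([A-Za-z_][\w.-]*);/g, function (match, name) {
    if (Object.prototype.hasOwnProperty.call(table, name)) {
      return table[name];
    }
    if (lenient) {
      return match; // pass through unmolested
    }
    throw new Error("Unknown entity: " + match);
  });
}

decodeEntities("a &lt; b &amp;&amp; c");              // "a < b && c"
decodeEntities("&copy; 2024", { copy: "\u00a9" });   // "\u00a9 2024"
decodeEntities("&nbsp;", null, true);                 // "&nbsp;" (lenient)
```

Doing the replacement in a single pass means &amp;lt; decodes to the literal text &lt; rather than being decoded twice into <.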
var smax = require("smax");
var parser = smax.parser();
parser.onerror = function (e) {
  // an error happened.
};
parser.ontext = function (t) {
  // got some text. t is the string of text.
};
parser.onopentag = function (node) {
  // opened a tag. node has "name" and "attributes"
};
parser.onend = function () {
  // parser stream is done, and ready to have more stuff written to it.
};
parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();
// stream usage
// takes the same options as the parser
var fs = require("fs");
var options = {}; // any of the parser settings, e.g. { trim: true }
var saxStream = require("smax").createStream(options);
saxStream.on("error", function (e) {
  // unhandled errors will throw, since this is a proper node
  // event emitter.
  console.error("error!", e);
  // clear the error
  this._parser.resume();
});
// pipe is supported, and it's readable/writable
fs.createReadStream("file.xml")
  .pipe(saxStream)
  .pipe(...)
Pass the following arguments to the parser function. All are optional.
- opt - Object bag of settings regarding string formatting. All default to false.
Settings supported:
- lenient - Boolean. If true, the parser will not fail on improperly formed XML.
- trim - Boolean. Whether or not to trim text and comment nodes.
- normalize - Boolean. If true, then collapse every run of whitespace into a single space.
- xmlns - Boolean. If true, then namespaces are supported.
- position - Boolean. If false, then don't track line/col/position.
- strictEntities - Boolean. If true, only parse predefined XML entities (&amp;, &apos;, &gt;, &lt;, and &quot;).
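The effect of trim and normalize is easiest to see on a concrete string. The helper below is a hypothetical sketch, not part of smax's API, and the order in which it applies the two flags (trim first, then normalize) is an assumption.

```javascript
// Hypothetical helper illustrating the trim and normalize settings.
// Assumption: trim is applied first, then normalize; smax may differ.
function formatText(text, opt) {
  opt = opt || {};
  if (opt.trim) {
    text = text.trim();
  }
  if (opt.normalize) {
    // Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    text = text.replace(/\s+/g, " ");
  }
  return text;
}

var raw = "\n  Hello,\t\n  world  \n";
formatText(raw, { trim: true });                  // "Hello,\t\n  world"
formatText(raw, { normalize: true });             // " Hello, world "
formatText(raw, { trim: true, normalize: true }); // "Hello, world"
```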
Methods:
- write - Write bytes onto the stream. You don't have to do this all at once. You can keep writing as much as you want.
- close - Close the stream. Once closed, no more data may be written until it is done processing the buffer, which is signaled by the end event.
- resume - To gracefully handle errors, assign a listener to the error event. Then, when the error is taken care of, you can call resume to continue parsing. Otherwise, the parser will not continue while in an error state.
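The write/close/resume contract can be modelled with a toy object. ToyParser is entirely hypothetical: it does no XML parsing, and the "<<" error it raises is invented purely to exercise the error state. What it does show is the shape described above: repeated, chainable write calls, an error state that blocks input until resume is called, and an end notification once close has processed the buffer.

```javascript
// Toy model of the write/close/resume contract; NOT smax's implementation.
// ToyParser does no XML parsing: it collects chunks and treats any chunk
// containing "<<" as a (made-up) error, just to exercise the error state.
function ToyParser() {
  this.chunks = [];
  this.error = null;
  this.onerror = null;
  this.onend = null;
}

ToyParser.prototype.write = function (chunk) {
  if (this.error) {
    // While in the error state, refuse more input until resume() is called.
    throw new Error("In error state; call resume() first");
  }
  if (chunk.indexOf("<<") !== -1) {
    this.error = new Error("Unexpected '<<'");
    if (this.onerror) this.onerror(this.error); // runs with this === parser
    return this;
  }
  this.chunks.push(chunk);
  return this; // returning the parser allows write(...).close() chaining
};

ToyParser.prototype.resume = function () {
  this.error = null; // leave the error state; the bad chunk is dropped
  return this;
};

ToyParser.prototype.close = function () {
  var text = this.chunks.join("");
  this.chunks = []; // reset so the toy can be written to again
  if (this.onend) this.onend(text);
  return this;
};

// Usage: several writes, one recoverable error, then close.
var seen = [];
var toy = new ToyParser();
toy.onerror = function (e) {
  seen.push("error: " + e.message);
  this.resume();
};
toy.onend = function (text) {
  seen.push("end: " + text);
};
toy.write("<a>").write("<<oops").write("</a>").close();
// seen: ["error: Unexpected '<<'", "end: <a></a>"]
```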
At all times, the parser object will have the following members:
- position - An object indicating the positions in the XML document:
  - position - current offset
  - line - current line
  - column - current column
  - startTagPosition - position where the current tag starts
- strict - Boolean indicating whether or not the parser is in strict XML mode.
- opt - Any options passed into the constructor.
And a bunch of other stuff that you probably shouldn't touch.
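As a rough illustration of the line/column bookkeeping behind position, the hypothetical positionAt helper below derives line and column from a character offset. The conventions it uses (0-based line and column, only a newline character starting a new line) are assumptions for the sketch; smax's own conventions may differ.

```javascript
// Hypothetical helper: compute line/column for a character offset.
// Assumptions for this sketch: line and column are 0-based, and only "\n"
// starts a new line. smax's own conventions may differ.
function positionAt(text, offset) {
  var line = 0;
  var column = 0;
  for (var i = 0; i < offset && i < text.length; i++) {
    if (text[i] === "\n") {
      line++;
      column = 0;
    } else {
      column++;
    }
  }
  return { position: Math.min(offset, text.length), line: line, column: column };
}

var doc = "<root>\n  <item/>\n</root>";
positionAt(doc, doc.indexOf("<item")); // { position: 9, line: 1, column: 2 }
```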
All events emit with a single argument. To listen to an event, assign a function to on<eventname>. Functions get executed in the this-context of the parser object. The list of supported events is also in the exported EVENTS array.
When using the stream interface, assign handlers using the EventEmitter on function in the normal fashion.
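The on<eventname> convention can be sketched in a few lines. Here emit and the short EVENTS array are local, hypothetical stand-ins (the real exported EVENTS array lists more names); the point is that a handler is looked up by name, called with a single argument, and runs with this bound to the parser.

```javascript
// Hypothetical sketch of the on<eventname> convention; not smax's code.
// EVENTS is a local stand-in for the library's exported EVENTS array and
// lists only a few event names.
var EVENTS = ["text", "opentag", "closetag", "comment", "end"];

// Invoke parser["on" + event] if one is assigned, with `this` bound to the
// parser and a single argument.
function emit(parser, event, data) {
  var handler = parser["on" + event];
  if (typeof handler === "function") {
    handler.call(parser, data);
  }
}

// Usage: attach one logging handler per event name.
var parser = { log: [] };
EVENTS.forEach(function (name) {
  parser["on" + name] = function (data) {
    this.log.push(name + ":" + JSON.stringify(data)); // `this` is the parser
  };
});

emit(parser, "opentag", { name: "who", attributes: { name: "world" } });
emit(parser, "text", "world");
emit(parser, "closetag", "who");
emit(parser, "doctype", "html"); // no ondoctype handler, so nothing is logged
```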
- error - Indication that something bad happened. The error will be hanging out on parser.error, and must be deleted before parsing can continue. By listening to this event, you can keep an eye on that kind of stuff. Note: this happens much more in strict mode. Argument: instance of Error.
- text - Text node. Argument: string of text.
- doctype - The <!DOCTYPE declaration. Argument: doctype string.
- processinginstruction - Stuff like <?xml foo="blerg" ?>. Argument: object with name and body members. Attributes are not parsed, as processing instructions have implementation-dependent semantics.
- sgmldeclaration - Random SGML declarations. Stuff like <!ENTITY p> would trigger this kind of event. This is a weird thing to support, so it might go away at some point. SAX isn't intended to be used to parse SGML, after all.
- opentag - An opening tag. Argument: object with name and attributes. In non-strict mode, tag names are uppercased, unless the lowercase option is set. If the xmlns option is set, then it will contain namespace binding information on the ns member, and will have local, prefix, and uri members.
- closetag - A closing tag. In loose mode, tags are auto-closed if their parent closes. In strict mode, well-formedness is enforced. Note that self-closing tags will have closetag emitted immediately after opentag. Argument: tag name.
- comment - A comment node. Argument: the string of the comment.
- opencdata - The opening tag of a <![CDATA[ block.
- cdata - The text of a <![CDATA[ block. Since <![CDATA[ blocks can get quite large, this event may fire multiple times for a single block, if it is broken up into multiple write()s. Argument: the string of random character data.
- closecdata - The closing tag (]]>) of a <![CDATA[ block.
- end - Indication that the closed stream has ended.
- ready - Indication that the stream has reset, and is ready to be written to.
- noscript - In non-strict mode, <script> tags trigger a "script" event, and their contents are not checked for special XML characters. If you pass noscript: true, then this behavior is suppressed.
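Two of the behaviours above are easy to trip over in handler code: CDATA text may arrive in several cdata events, and a self-closing tag produces closetag straight after its opentag. The sketch below shows one way to handle both. No parser is involved: the event sequence is written by hand (with lowercase names, as strict mode would report them), and createCollector is a hypothetical helper, so treat this as an illustration rather than captured smax output.

```javascript
// Hypothetical handler set: buffers CDATA chunks and tracks tag depth.
function createCollector() {
  return {
    depth: 0,
    cdataBuffer: null,
    cdataBlocks: [],
    tags: [],
    onopentag: function (node) {
      this.tags.push("open " + node.name + " at depth " + this.depth);
      this.depth++;
    },
    onclosetag: function (name) {
      this.depth--;
      this.tags.push("close " + name);
    },
    onopencdata: function () {
      this.cdataBuffer = []; // start collecting chunks for this block
    },
    oncdata: function (chunk) {
      this.cdataBuffer.push(chunk); // may be only one piece of the block
    },
    onclosecdata: function () {
      this.cdataBlocks.push(this.cdataBuffer.join(""));
      this.cdataBuffer = null;
    }
  };
}

// Hand-written events for: <doc><br/><![CDATA[a < b && c]]></doc>
// with the CDATA text split across two writes.
var events = [
  ["opentag", { name: "doc", attributes: {} }],
  ["opentag", { name: "br", attributes: {} }],
  ["closetag", "br"], // self-closing: emitted immediately after opentag
  ["opencdata", undefined],
  ["cdata", "a < b"],
  ["cdata", " && c"],
  ["closecdata", undefined],
  ["closetag", "doc"]
];

var collector = createCollector();
events.forEach(function (ev) {
  collector["on" + ev[0]].call(collector, ev[1]);
});
// collector.cdataBlocks: ["a < b && c"]
// collector.tags: ["open doc at depth 0", "open br at depth 1", "close br", "close doc"]
```

Buffering between opencdata and closecdata keeps the handler independent of how the input happened to be split across write() calls.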
It's best to write a failing test if you find an issue. I will always accept pull requests with failing tests if they demonstrate intended behavior, but it is very hard to figure out what issue you're describing without a test. Writing a test is also the best way for you yourself to figure out if you really understand the issue you think you have with sax-js.