diff --git a/README.md b/README.md
index 1f13174..29cd748 100644
--- a/README.md
+++ b/README.md
@@ -6,11 +6,37 @@ Your user-friendly SAX wrapper to transform XML files easily, with memory consum
 
 Saxeed, a SAX wrapper, stream process XML input performing modifications to its outputs based on predefined transformation(s).
 
-It accepts the constraints of "streaming" (or "eventing") approach — elements are visited one-by-one with no option to look ahead in the stream.
-This is a tradeoff we accept in return for predictable memory footprint.
+It accepts the constraints of the "streaming" (or "eventing") approach: elements are visited one by one, with no option to move around the stream.
+This is a tradeoff we accept in return for a predictable memory footprint.
+
+The very nature of stream-based processing restricts both the data available at any given moment and the modifications that are permitted.
+To accommodate that, the developer needs to accept a paradigm shift compared to, say, dom4j.
+
+Saxeed strives to add as much convenience as possible on top of plain old SAX, while adding as little overhead as possible.
+
+### Capabilities
+
+Each tag visitor has access to, or can modify, the following:
+
+|                                    | Tag Start              | Tag End                |
+|------------------------------------|------------------------|------------------------|
+| Access Tag attributes              | ☑                      | ☑                      |
+| Access Parent(s) Tag attributes    | ☑                      | ☑                      |
+| Add Child Tags                     | ☑                      | ☑ (before closing tag) |
+| Add Sibling Tags (NOT IMPLEMENTED) | ☑ (before and after)   | ☑ (only after)         |
+| Add Parent Tag (`wrapWith()`)      | ☑                      | ☐                      |
+| Change Attributes                  | ☑                      | ☐                      |
+| Delete Tag (`unwrap()`)            | ☑                      | ☐                      |
+| Delete Tag Recursively (`skip()`)  | ☑                      | ☐                      |
+| Delete Child Tags (`empty()`)      | ☑                      | ☐                      |
+
+More complex changes can be implemented by subscribing visitors to multiple tags and retaining information between their visits.
 
 ## Usage
 
+[Basic Concepts](./docs/BASICS.md)
+[Implementing Visitors](./docs/VISITORS.md)
+
 ### Dependency
 
 To consume the library in maven:
diff --git a/docs/BASICS.md b/docs/BASICS.md
new file mode 100644
index 0000000..0efcdc1
--- /dev/null
+++ b/docs/BASICS.md
@@ -0,0 +1,49 @@
+# Saxeed essential concepts
+
+Saxeed reads an input XML document, in the form of a stream or a file, and passes it through one or more *transformations*.
+Each transformation prescribes what changes to make to an XML document through a series of *visitors* that are *subscribed* to certain tag sets.
+It also specifies a *target* location to which the resulting XML document is written.
+
+```java
+tb = new TransformationBuilder().add(Subscribed.to("entry"), entryVisitor);
+new Saxeed()
+        .setInput(srcFile)
+        .addTransformation(tb, targetPath)
+        .transform();
+```
+
+In this example, we stream through `srcFile` and have its content processed by a single transformation.
+That transformation responds to all `<entry>` tags in that document by invoking the `entryVisitor` we provided.
+The resulting XML document is then written to `targetPath`.
+
+## Targets
+
+A target specifies where the resulting XML stream should be written.
+It can be a `File` or `Path` instance to write to the file system, or an `OutputStream` or `XMLStreamWriter` if more control is needed.
+
+Saxeed always closes the targets that it has opened (files), and never closes targets opened by the client (streams or writers).
+
+## Visitors
+
+A visitor is a client-provided implementation of `UpdatingVisitor` that handles XML tag events.
+The visitor methods, like `startTag(Tag)` or `endDocument()`, are invoked when the corresponding event is encountered in the input XML document.
+
+Depending on the particular method invoked, the visitor can modify the visited tags; the modified version is sent to the target.
+
+## Transformations
+
+A transformation is a composition of visitors *subscribed* to certain tag sets.
+A transformation with no visitors simply writes the input XML document to its target.
+The output is likewise unchanged if the visitors perform no modifications (additions or removals included) to the subscribed tags.
+
+A client can register any number of transformations, provided they output to different targets.
+Parallel transformations are executed independently of one another, but still during a single pass through the input XML document.
+
+Each transformation can contain one or more visitors.
+Each visitor can either be subscribed to all the tags in the document (`Subscribed.toAll()`), or just to a set of selected ones.
+
+## Subscriptions
+
+Just as a single visitor can be subscribed to multiple tag names, multiple visitors can be subscribed to the same tag name.
+In that case, they are executed in the order of their addition for "opening events" and in reverse order of addition for "closing events".
+So, for example, on `<entry>`, all the visitors subscribed to "entry" (or to all tags) have their `startTag(Tag)` method called.
diff --git a/docs/VISITORS.md b/docs/VISITORS.md
new file mode 100644
index 0000000..caa2b17
--- /dev/null
+++ b/docs/VISITORS.md
@@ -0,0 +1,58 @@
+# Creating custom visitors
+
+Visitors respond to all events related to the tags they have been subscribed to.
+
+## Data access
+
+Each visitor method receives information about the input XML document position through its arguments.
+Namely, through `Tag` implementations.
+
+A `Tag` provides access to the tag name and, more importantly, to the tag attributes.
+
+The chain of parents is also available.
+They too are implementations of the `Tag` interface, so they can be looked up, and decisions can be made based on their state.
+
+Note that the data of tag ancestors (the parent and its ancestors) can only be read, not modified, because those tags have already been written.
+
+## Modifications
+
+The currently visited tag can be modified as described in the [Capabilities](../README.md#capabilities) section.
+
+What modifications are possible differs between `startTag()` and `endTag()`, simply because by the time the closing tag is encountered, the opening tag has already been written.
+So attribute modifications and tag deletion are reserved for the tag start event.
+
+Adding children, however, can be done in both `startTag()` and `endTag()`.
+In the latter case, they will be added before the closing tag.
+
+## Tag deletion
+
+A tag can be deleted.
+Or, put differently, Saxeed lets visitors decide that some tags will not be written to the target.
+
+By default, all tags are written as they are.
+In `startTag()` (only), a visitor can choose to delete the tag.
+
+When a tag is deleted, its opening and closing tags are not written, and neither is its text content.
+Whether its children are deleted is configured as follows:
+
+|                 | `skip()` | `unwrap()` | `empty()` | keep (the default) |
+|-----------------|----------|------------|-----------|--------------------|
+| Delete this tag | ☑        | ☑          | ☐         | ☐                  |
+| Delete children | ☑        | ☐          | ☑         | ☐                  |
+
+
+---
+
+When the currently visited tag is deleted, all remaining visitors still have their `startTag()` method called.
+`Tag#isOmitted()` will return `true` for them, signaling that the tag will not be part of the output.
+The `endTag()` methods will not be called for deleted tags at all.
+
+When a tag is deleted as a result of an ancestor calling `skip()` or `empty()`, no visitor methods are called for it.
+
+## `Tag` interfaces
+
+In Saxeed, the current tag is represented by the `Tag` interface.
+It is the most restricted form, which only permits data access.
+
+The `Tag.End` specialization passed to `endTag()` and the `Tag.Start` specialization passed to `startTag()` add the methods for the extended capabilities available at that point of the input document traversal.
+This provides a compile-time guarantee that the operations used are permitted at that point.
diff --git a/src/main/java/com/github/olivergondza/saxeed/Tag.java b/src/main/java/com/github/olivergondza/saxeed/Tag.java
index 12c0710..a393ab4 100644
--- a/src/main/java/com/github/olivergondza/saxeed/Tag.java
+++ b/src/main/java/com/github/olivergondza/saxeed/Tag.java
@@ -13,8 +13,6 @@ public interface Tag {
 
     boolean isNamed(String name);
 
-    boolean isGenerated();
-
     Tag getParent();
 
     String getName();
@@ -25,6 +23,8 @@ public interface Tag {
 
     Map getAttributes();
 
+    boolean isGenerated();
+
     boolean isOmitted();
 
     interface Start extends Tag {