feat(docs): Add usage docs #4

Merged
merged 1 commit into from
Apr 10, 2024
30 changes: 28 additions & 2 deletions README.md
@@ -6,11 +6,37 @@ Your user-friendly SAX wrapper to transform XML files easily, with memory consum

Saxeed, a SAX wrapper, stream-processes XML input and modifies its output according to predefined transformation(s).

It accepts the constraints of "streaming" (or "eventing") approach — elements are visited one-by-one with no option to look ahead in the stream.
This is a tradeoff we accept in return for predictable memory footprint.
It accepts the constraints of the "streaming" (or "eventing") approach: elements are visited one by one with no option to move around the stream.
This is a tradeoff we accept in return for a predictable memory footprint.

The very nature of stream-based processing restricts the data available at any given moment and the modifications that are permitted.
To accommodate that, the developer needs to accept a paradigm shift compared to, say, dom4j.

Saxeed strives to add as much convenience as possible on top of plain old SAX while adding as little overhead as possible.
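
To illustrate the streaming constraint, here is a minimal sketch that uses only the JDK's SAX API, not Saxeed itself. The class `StreamingSketch` and the method `elementPaths` are names made up for this illustration. At every `startElement` event the handler knows the current element and the ancestors it has tracked so far, and nothing that comes later in the stream.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Plain-SAX sketch (hypothetical helper, not part of Saxeed). */
final class StreamingSketch {

    /** Returns the path of every element, in the order the parser reports them. */
    static List<String> elementPaths(String xml) {
        List<String> paths = new ArrayList<>();
        Deque<String> ancestors = new ArrayDeque<>(); // open elements, innermost first

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                ancestors.push(qName);
                // Build "/root/child" from the stack; nothing past this element is visible yet.
                StringBuilder path = new StringBuilder();
                ancestors.descendingIterator().forEachRemaining(n -> path.append('/').append(n));
                paths.add(path.toString());
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                ancestors.pop(); // the element is finished and cannot be revisited
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(elementPaths("<root><a><b/></a><c/></root>"));
    }
}
```

For the input `<root><a><b/></a><c/></root>`, `elementPaths` returns `[/root, /root/a, /root/a/b, /root/c]`.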

### Capabilities

Each tag visitor has access to, or can modify, the following:

| | Tag Start | Tag End |
|------------------------------------|------------------------|------------------------|
| Access Tag attributes | ☑ | ☑ |
| Access Parent(s) Tag attributes | ☑ | ☑ |
| Add Child Tags | ☑ | ☑ (before closing tag) |
| Add Sibling Tags (NOT IMPLEMENTED) | ☑ (before and after) | ☑ (only after) |
| Add Parent Tag (`wrapWith()`) | ☑ | ☐ |
| Change Attributes | ☑ | ☐ |
| Delete Tag (`unwrap()`) | ☑ | ☐ |
| Delete Tag Recursively (`skip()`) | ☑ | ☐ |
| Delete Child Tags (`empty()`) | ☑ | ☐ |

More complex changes can be implemented by subscribing visitors to multiple tags, and retaining information between their visits.
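
The idea of retaining information between visits can be sketched with a plain SAX handler. This is not Saxeed's visitor API: `StatefulHandlerSketch`, the `section` and `entry` tag names, and the `id` and `name` attributes are all assumptions chosen for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Plain-SAX sketch (hypothetical helper, not part of Saxeed). */
final class StatefulHandlerSketch {

    /** Labels every entry with the id of the section it was found in. */
    static List<String> labelEntries(String xml) {
        List<String> labels = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private String currentSection; // state retained between element visits

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("section".equals(qName)) {
                    currentSection = attrs.getValue("id"); // remember for later entries
                } else if ("entry".equals(qName)) {
                    labels.add(currentSection + ":" + attrs.getValue("name"));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("section".equals(qName)) {
                    currentSection = null; // leaving the section, forget its id
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return labels;
    }
}
```

The handler reacts to two tag names and carries the section id from one visit to the next, which is the same pattern a visitor subscribed to several tags would follow.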

## Usage

[Basic Concepts](./docs/BASICS.md)
[Implementing Visitors](./docs/VISITORS.md)

### Dependency

To consume the library in Maven:
49 changes: 49 additions & 0 deletions docs/BASICS.md
@@ -0,0 +1,49 @@
# Saxeed essential concepts

Saxeed reads an input XML document, in the form of a stream or a file, and passes it through one or more *transformations*.
Each transformation prescribes which changes to make to the XML document through a series of *visitors* that are *subscribed* to certain tag sets.
It also specifies a *target* location where the resulting XML document is written.

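```java
// Subscribe entryVisitor to every <entry> tag.
TransformationBuilder tb = new TransformationBuilder().add(Subscribed.to("entry"), entryVisitor);
// Stream srcFile through the transformation and write the result to targetPath.
new Saxeed()
        .setInput(srcFile)
        .addTransformation(tb, targetPath)
        .transform();
```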

In this example, we stream through `srcFile` and have its content processed by a single transformation.
That transformation responds to every `<entry>` tag in the document by invoking the `entryVisitor` we provided.
The resulting XML document is then written to `targetPath`.

## Targets

A target specifies where the resulting XML stream should be written.
It can be a `File` or `Path` instance to write to the file system, or an `OutputStream` or `XMLStreamWriter` if more control is needed.

Saxeed always closes the targets that it has opened itself (files), and never closes targets opened by the client (streams or writers).
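
The ownership rule above can be sketched as follows. This is an illustration of the policy only, not Saxeed's implementation: `OwnedTargetSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical target wrapper: closes only the streams it opened itself. */
final class OwnedTargetSketch implements AutoCloseable {
    private final OutputStream out;
    private final boolean openedHere;

    private OwnedTargetSketch(OutputStream out, boolean openedHere) {
        this.out = out;
        this.openedHere = openedHere;
    }

    /** The sketch opens the file, so the sketch is responsible for closing it. */
    static OwnedTargetSketch toPath(Path path) {
        try {
            return new OwnedTargetSketch(Files.newOutputStream(path), true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The caller opened the stream, so the caller stays responsible for closing it. */
    static OwnedTargetSketch toStream(OutputStream stream) {
        return new OwnedTargetSketch(stream, false);
    }

    void write(String text) {
        try {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            if (openedHere) {
                out.close();
            } else {
                out.flush(); // leave the client's stream open
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this split, a client that passes its own stream can keep writing to it after the transformation finishes, while file handles opened on the client's behalf do not leak.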

## Visitors

A visitor is a client-provided implementation of `UpdatingVisitor` that handles XML tag events.
The visitor methods, like `startTag(Tag)` or `endDocument()`, are invoked when the corresponding event is encountered in the input XML document.

Depending on the method invoked, the visitor can modify the visited tag; the modified version is then sent to the target.

## Transformations

A transformation is a composition of visitors *subscribed* to certain tag sets.
A transformation with no visitors simply writes the input XML document to its target.
The same happens if the visitors make no modifications (or additions or removals) to the subscribed tags.

A client can register any number of transformations, provided they write to different targets.
Parallel transformations are executed independently of one another, yet during a single pass through the input XML document.

Each transformation can contain one or more visitors.
Each visitor can either be subscribed to all the tags in the document (`Subscribed.toAll()`), or just a set of selected ones.

## Subscriptions

Just as a single visitor can be subscribed to multiple tag names, multiple visitors can be subscribed to the same tag name.
They are then executed in the order of their addition for "opening events" and in reverse order of addition for "closing events".
So, for example, on `<entry>` all the visitors subscribed to "entry" (or to all tags) have their `startTag(Tag)` method called in the order they were added, and on `</entry>` their `endTag()` methods are called in reverse order.
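
The ordering rule can be sketched with a tiny dispatcher. The `Visitor` interface below is a hypothetical stand-in that takes plain tag names; it is not Saxeed's `UpdatingVisitor`, and `DispatchOrderSketch` is a made-up class for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical dispatcher showing addition order on open, reverse order on close. */
final class DispatchOrderSketch {

    /** Simplified stand-in for a visitor; not Saxeed's UpdatingVisitor. */
    interface Visitor {
        void startTag(String name);
        void endTag(String name);
    }

    private final List<Visitor> visitors = new ArrayList<>();

    void add(Visitor visitor) {
        visitors.add(visitor);
    }

    /** Opening event: visitors run in the order they were added. */
    void fireStart(String name) {
        for (Visitor v : visitors) {
            v.startTag(name);
        }
    }

    /** Closing event: visitors run in reverse order of addition. */
    void fireEnd(String name) {
        for (int i = visitors.size() - 1; i >= 0; i--) {
            visitors.get(i).endTag(name);
        }
    }

    /** Registers two logging visitors and fires one open and one close event. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        DispatchOrderSketch dispatcher = new DispatchOrderSketch();
        for (String id : List.of("first", "second")) {
            dispatcher.add(new Visitor() {
                @Override public void startTag(String name) { log.add(id + " start " + name); }
                @Override public void endTag(String name) { log.add(id + " end " + name); }
            });
        }
        dispatcher.fireStart("entry");
        dispatcher.fireEnd("entry");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

`demo()` returns `[first start entry, second start entry, second end entry, first end entry]`, so the visitors nest around the tag the same way the tags nest in the document.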
58 changes: 58 additions & 0 deletions docs/VISITORS.md
@@ -0,0 +1,58 @@
# Creating custom visitors

Visitors respond to all events related to the tags they have been subscribed to.

## Data access

Each visitor method receives information about the current position in the input XML document through its arguments, namely `Tag` implementations.

A `Tag` provides access to the tag name and, more importantly, to the tag attributes.

The chain of parents is available as well.
They, too, are implementations of the `Tag` interface, so they can be looked up and decisions can be made based on their state.

Note that the data of the tag's ancestors (the parent and its ancestors) can only be read, not modified, because they have already been written.

## Modifications

The currently visited tag can be modified as described in the [Capabilities](../README.md#capabilities) section.

Which modifications are possible differs between `startTag()` and `endTag()`, simply because by the time the closing tag is encountered, the opening tag has already been written.
So attribute modifications and tag deletion are reserved for the tag start event.

Adding children, however, can be done in both `startTag()` and `endTag()`.
In the latter case, they are added before the closing tag.

## Tag deletion

A tag can be deleted.
Or, put differently, Saxeed lets visitors decide that some tags will not be written to the target.

By default, all tags are written as they are.
In `startTag()` (only), a visitor can choose to delete the tag.

When a tag is deleted, neither its opening and closing tags nor its text content are written.
How its children are handled depends on the method used:

| | `skip()` | `unwrap()` | `empty()` | keep (the default) |
|-----------------|----------|------------|-----------|--------------------|
| Delete this tag | ☑ | ☑ | ☐ | ☐ |
| Delete children | ☑ | ☐ | ☑ | ☐ |
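
The table can be read as the following plain-SAX sketch. It is not Saxeed code: `DeletionModesSketch` and its `Mode` enum are hypothetical, the modes are assigned per tag name rather than by a visitor, attributes and escaping are ignored for brevity, and the sketch assumes that `EMPTY` drops the text content along with the child tags.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration of keep / skip / unwrap / empty semantics. */
final class DeletionModesSketch {
    enum Mode { KEEP, SKIP, UNWRAP, EMPTY }

    /** Re-serializes xml (names and text only), applying a mode per tag name. */
    static String apply(String xml, Map<String, Mode> modes) {
        StringBuilder out = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            private final Deque<Mode> open = new ArrayDeque<>(); // modes of open, visited elements
            private int suppressed = 0; // > 0 while inside a skipped or emptied element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (suppressed > 0) { // descendant of a dropped subtree: only track depth
                    suppressed++;
                    return;
                }
                Mode mode = modes.getOrDefault(qName, Mode.KEEP);
                open.push(mode);
                if (mode == Mode.KEEP || mode == Mode.EMPTY) {
                    out.append('<').append(qName).append('>'); // the tag itself survives
                }
                if (mode == Mode.SKIP || mode == Mode.EMPTY) {
                    suppressed = 1; // everything below is dropped
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text is written only directly inside a kept element.
                if (suppressed == 0 && open.peek() == Mode.KEEP) {
                    out.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (suppressed > 1) { // closing a dropped descendant
                    suppressed--;
                    return;
                }
                suppressed = 0;
                Mode mode = open.pop();
                if (mode == Mode.KEEP || mode == Mode.EMPTY) {
                    out.append("</").append(qName).append('>');
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return out.toString();
    }
}
```

For the input `<root><a>x<b>y</b></a></root>`, applying `SKIP` to `a` yields `<root></root>`, `UNWRAP` yields `<root><b>y</b></root>`, and `EMPTY` yields `<root><a></a></root>`.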


---

When the currently visited tag is deleted, all remaining visitors still have their `startTag()` method called.
`Tag#isOmitted()` returns `true` for them, signaling that the tag will not be part of the output.
The `endTag()` methods are not called for deleted tags at all.

When a tag is deleted as a result of an ancestor calling `skip()` or `empty()`, no visitor methods are called.

## `Tag` interfaces

In Saxeed, the current tag is represented by the `Tag` interface.
It is the most restricted form, which only permits data access.

The `Tag.End` specialization passed to `endTag()` and the `Tag.Start` specialization passed to `startTag()` add methods for the extended capabilities available at that point of the input document traversal.
This provides a compile-time guarantee that the operations used are permitted at that point.
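
The compile-time guarantee can be sketched with a set of hypothetical interfaces. `Node`, `StartNode`, `EndNode` and their methods are invented for this illustration and do not mirror the actual methods of `Tag`, `Tag.Start` or `Tag.End`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of phase-specific interfaces. */
final class PhaseTypedSketch {

    /** Read-only view, valid at any point of the traversal. */
    interface Node {
        String name();
        Map<String, String> attributes();
    }

    /** Extra operations that only make sense before the opening tag is written. */
    interface StartNode extends Node {
        void setAttribute(String key, String value);
    }

    /** Extra operations still possible just before the closing tag is written. */
    interface EndNode extends Node {
        void addChild(String childName);
    }

    /** One object backs both views; each callback only sees the interface for its phase. */
    static final class Element implements StartNode, EndNode {
        private final String name;
        private final Map<String, String> attributes = new LinkedHashMap<>();
        private final List<String> children = new ArrayList<>();

        Element(String name) { this.name = name; }

        @Override public String name() { return name; }
        @Override public Map<String, String> attributes() { return Collections.unmodifiableMap(attributes); }
        @Override public void setAttribute(String key, String value) { attributes.put(key, value); }
        @Override public void addChild(String childName) { children.add(childName); }
        List<String> children() { return children; }
    }

    static void onStart(StartNode node) {
        node.setAttribute("seen", "true");
    }

    static void onEnd(EndNode node) {
        node.addChild("summary");
        // node.setAttribute("late", "x"); // would not compile: EndNode has no setAttribute
    }

    static String demo() {
        Element element = new Element("entry");
        onStart(element);
        onEnd(element);
        return element.name() + element.attributes() + element.children();
    }
}
```

`demo()` returns `entry{seen=true}[summary]`. An attempt to change attributes from the end callback is rejected by the compiler rather than failing at run time.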
4 changes: 2 additions & 2 deletions src/main/java/com/github/olivergondza/saxeed/Tag.java
@@ -13,8 +13,6 @@ public interface Tag {

boolean isNamed(String name);

boolean isGenerated();

Tag getParent();

String getName();
@@ -25,6 +23,8 @@ public interface Tag {

Map<String, String> getAttributes();

boolean isGenerated();

boolean isOmitted();

interface Start extends Tag {