Skip to content

Commit

Permalink
Merge remote-tracking branch 'origin/master'
Browse files Browse the repository at this point in the history
  • Loading branch information
nishihatapalmer committed Feb 27, 2016
2 parents 2b84169 + 3a92665 commit b110b95
Showing 1 changed file with 10 additions and 3 deletions.
13 changes: 10 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,31 +2,38 @@
byteseek is a Java library for efficiently matching patterns of bytes and searching for those patterns. The main well-tested packages are:

####Matcher
A package which contains various types of matcher for individual bytes or sequences of them.
* bytes - matchers (and inverted matchers) for bytes, ranges of bytes, sets, any byte, and bitmasks.
* sequence - matchers for sequences of bytes, byte matchers, fixed gaps and sequences of sequences.

####Searcher
A package which contains implementations of various search algorithms. Most of them are sub-linear, which means they don't have to examine every position in an input source to find all possible matches. All the search algorithms have been extended to work with sequences which can match more than one byte at a given position. Any sequence search algorithm can work with any sequence matcher, no matter how it is composed. All the search implementations are stream-friendly - the length of an input source is not required unless you explicitly want to work at the end of an input source.

* bytes - a naive searcher for byte matchers.
* matcher - a naive searcher for any matcher.
* sequence - various implementations of the naive search, Boyer-Moore-Horspool, Signed Horspool and Sunday QuickSearch algorithms.

####IO
Matchers and searchers can all work over byte arrays directly. In order to read efficiently from any other input source,
readers provide a consistent random-access interface over files, input streams, strings and byte arrays. Pluggable caching strategies allow tailoring the memory and performance for different use cases.

* reader - readers for files, input streams, strings and byte arrays, and an adaptor from any reader back to an inputstream. Readers cache the byte arrays read from the input sources using flexible caching strategies.
* reader/cache - pluggable caching strategies for readers, including least recently added, least recently used, temporary file caches, two level caches, double caches and others.

####Parser
A byte-oriented regular expression language is given to allow the easy construction of byte matchers, sequence matchers, and (eventually) finite state automata. An abstract syntax tree is defined, so other regular expression syntaxes could be used if required.
* regex - a parser for a byte-oriented regular expression language, which produces a byteseek abstract syntax tree.

####Compiler
A package which contains compilers for all of the matchers from an abstract syntax tree.
* matchers - compilers from the byteseek abstract syntax tree to byte matchers and sequence matchers.

All the provided match and search implementations are stream-friendly - the length of an input source is not required unless you explicitly want to work at the end of an input source. All the search algorithms have been extended to work with sequences which can match more than one byte at a given position. Any sequence search algorithm can work with any sequence matcher, no matter how it is composed.


##Untested
Various other packages exist which are not currently tested, but will become so eventually. These include:

####Matcher
* multisequence - algorithms for multi-sequence matching, including lists and trie structures.
* automata - matchers for non deterministic and deterministic automata.

####Searcher
* multisequence - implementations of Set Horspool, Signed Set Horspool, Wu-Manber and Signed Wu-Manber algorithms.
Expand Down

0 comments on commit b110b95

Please sign in to comment.