From 0e0c8cc24102ea439c604b2cad7b06c43859fa41 Mon Sep 17 00:00:00 2001
From: nishihatapalmer
Date: Thu, 25 Feb 2016 22:32:29 +0000
Subject: [PATCH 1/7] Update README.md

---
 README.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/README.md b/README.md
index ed056ac4..adb4a22d 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,7 @@ byteseek is a Java library for efficiently matching patterns of bytes and search
 * sequence - matchers for sequences of bytes, byte matchers, fixed gaps and sequences of sequences.
 
 ####Searcher
+All the search algorithms have been extended to work with sequences which can match more than one byte at a given position. Any sequence search algorithm can work with any sequence matcher, no matter how it is composed. All the search implementations are stream-friendly - the length of an input source is not required unless you explicitly want to work at the end of an input source.
 * bytes - a naive searcher for byte matchers.
 * sequence - various implementations of the naive search, Boyer-Moore-Horspool, Signed Horspool and Sunday QuickSearch algorithms.
 
@@ -19,9 +20,6 @@ byteseek is a Java library for efficiently matching patterns of bytes and search
 ####Compiler
 * matchers - compilers from the byteseek abstract syntax tree to byte matchers and sequence matchers.
 
-All the provided match and search implementations are stream-friendly - the length of an input source is not required unless you explicitly want to work at the end of an input source. All the search algorithms have been extended to work with sequences which can match more than one byte at a given position. Any sequence search algorithm can work with any sequence matcher, no matter how it is composed.
-
-
 ##Untested
 Various other packages exist which are not currently tested, but will become so eventually. These include:
 

From a731696f70054cd7e7e7a5ac1f2dbd2b65d74563 Mon Sep 17 00:00:00 2001
From: nishihatapalmer
Date: Thu, 25 Feb 2016 22:37:05 +0000
Subject: [PATCH 2/7] Update README.md

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index adb4a22d..63c37e51 100644
--- a/README.md
+++ b/README.md
@@ -7,14 +7,18 @@ byteseek is a Java library for efficiently matching patterns of bytes and search
 
 ####Searcher
 All the search algorithms have been extended to work with sequences which can match more than one byte at a given position. Any sequence search algorithm can work with any sequence matcher, no matter how it is composed. All the search implementations are stream-friendly - the length of an input source is not required unless you explicitly want to work at the end of an input source.
+
 * bytes - a naive searcher for byte matchers.
 * sequence - various implementations of the naive search, Boyer-Moore-Horspool, Signed Horspool and Sunday QuickSearch algorithms.
 
 ####IO
+Matchers and searchers can all work over byte arrays directly. To work across other input sources requires the use of WindowReaders. These read from the underlying input source, caching the byte arrays directly to allow for efficient matching and searching across them multiple times.
+
 * reader - readers for files, input streams, strings and byte arrays, and an adaptor from any reader back to an inputstream. Readers cache the byte arrays read from the input sources using flexible caching strategies.
 * reader/cache - pluggable caching strategies for readers, including least recently added, least recently used, temporary file caches, two level caches, double caches and others.
 
 ####Parser
+A byte-oriented regular expression language is given to allow the easy construction of byte matchers, sequence matchers, and (eventually) finite state automata. An abstract syntax tree is defined, so other regular expression syntaxes could be used if required.
 * regex - a parser for a byte-oriented regular expression language, which produces a byteseek abstract syntax tree.
 
 ####Compiler

From 254aea84f4da887019035fb29ecaf6e72aaed67e Mon Sep 17 00:00:00 2001
From: nishihatapalmer
Date: Thu, 25 Feb 2016 22:37:40 +0000
Subject: [PATCH 3/7] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 63c37e51..acf436ca 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,7 @@
 byteseek is a Java library for efficiently matching patterns of bytes and searching for those patterns. The main well-tested packages are:
 
 ####Matcher
+A package which contains various types of matcher for individual bytes or sequences of them.
 * bytes - matchers (and inverted matchers) for bytes, ranges of bytes, sets, any byte, and bitmasks.
 * sequence - matchers for sequences of bytes, byte matchers, fixed gaps and sequences of sequences.
 

From e7c3b29bef3528d5c497b2e88737f6151637c5e7 Mon Sep 17 00:00:00 2001
From: nishihatapalmer
Date: Thu, 25 Feb 2016 22:39:35 +0000
Subject: [PATCH 4/7] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index acf436ca..0fd298f1 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ A package which contains various types of matcher for individual bytes or sequen
 * sequence - matchers for sequences of bytes, byte matchers, fixed gaps and sequences of sequences.
 
 ####Searcher
-All the search algorithms have been extended to work with sequences which can match more than one byte at a given position. Any sequence search algorithm can work with any sequence matcher, no matter how it is composed. All the search implementations are stream-friendly - the length of an input source is not required unless you explicitly want to work at the end of an input source.
+A package which contains implementations of various search algorithms. Most of them are sub-linear, which means they don't have to examine very position in an input source to find all possible matches. All the search algorithms have been extended to work with sequences which can match more than one byte at a given position. Any sequence search algorithm can work with any sequence matcher, no matter how it is composed. All the search implementations are stream-friendly - the length of an input source is not required unless you explicitly want to work at the end of an input source.
 
 * bytes - a naive searcher for byte matchers.
 * sequence - various implementations of the naive search, Boyer-Moore-Horspool, Signed Horspool and Sunday QuickSearch algorithms.
@@ -23,6 +23,7 @@ A byte-oriented regular expression language is given to allow the easy construct
 * regex - a parser for a byte-oriented regular expression language, which produces a byteseek abstract syntax tree.
 
 ####Compiler
+A package which contains compilers for all of the matchers from an abstract syntax tree.
 * matchers - compilers from the byteseek abstract syntax tree to byte matchers and sequence matchers.
 
 ##Untested

From fca3a63d8e02aa82ede0ed3ead9eb01e8d4170b0 Mon Sep 17 00:00:00 2001
From: nishihatapalmer
Date: Thu, 25 Feb 2016 22:39:57 +0000
Subject: [PATCH 5/7] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0fd298f1..04adb4ef 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@ A package which contains various types of matcher for individual bytes or sequen
 * sequence - matchers for sequences of bytes, byte matchers, fixed gaps and sequences of sequences.
 
 ####Searcher
-A package which contains implementations of various search algorithms. Most of them are sub-linear, which means they don't have to examine very position in an input source to find all possible matches. All the search algorithms have been extended to work with sequences which can match more than one byte at a given position. Any sequence search algorithm can work with any sequence matcher, no matter how it is composed. All the search implementations are stream-friendly - the length of an input source is not required unless you explicitly want to work at the end of an input source.
+A package which contains implementations of various search algorithms. Most of them are sub-linear, which means they don't have to examine every position in an input source to find all possible matches. All the search algorithms have been extended to work with sequences which can match more than one byte at a given position. Any sequence search algorithm can work with any sequence matcher, no matter how it is composed. All the search implementations are stream-friendly - the length of an input source is not required unless you explicitly want to work at the end of an input source.
 
 * bytes - a naive searcher for byte matchers.
 * sequence - various implementations of the naive search, Boyer-Moore-Horspool, Signed Horspool and Sunday QuickSearch algorithms.

From aff6625d29f65c267b9c26e219354249fafbb985 Mon Sep 17 00:00:00 2001
From: nishihatapalmer
Date: Thu, 25 Feb 2016 22:45:48 +0000
Subject: [PATCH 6/7] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 04adb4ef..646d7ebc 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ A package which contains various types of matcher for individual bytes or sequen
 A package which contains implementations of various search algorithms. Most of them are sub-linear, which means they don't have to examine every position in an input source to find all possible matches. All the search algorithms have been extended to work with sequences which can match more than one byte at a given position. Any sequence search algorithm can work with any sequence matcher, no matter how it is composed. All the search implementations are stream-friendly - the length of an input source is not required unless you explicitly want to work at the end of an input source.
 
 * bytes - a naive searcher for byte matchers.
+* matcher - a naive searcher for any matcher.
 * sequence - various implementations of the naive search, Boyer-Moore-Horspool, Signed Horspool and Sunday QuickSearch algorithms.
 
 ####IO
@@ -31,6 +32,7 @@ Various other packages exist which are not currently tested, but will become so 
 
 ####Matcher
 * multisequence - algorithms for multi-sequence matching, including lists and trie structures.
+* automata - matchers for non deterministic and deterministic automata.
 
 ####Searcher
 * multisequence - implementations of Set Horspool, Signed Set Horspool, Wu-Manber and Signed Wu-Manber algorithms.

From 3a92665c7a588e540a4b6d38f9c19a66505f0215 Mon Sep 17 00:00:00 2001
From: nishihatapalmer
Date: Sat, 27 Feb 2016 19:40:12 +0000
Subject: [PATCH 7/7] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 646d7ebc..e8de653f 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,8 @@ A package which contains implementations of various search algorithms. Most of 
 * sequence - various implementations of the naive search, Boyer-Moore-Horspool, Signed Horspool and Sunday QuickSearch algorithms.
 
 ####IO
-Matchers and searchers can all work over byte arrays directly. To work across other input sources requires the use of WindowReaders. These read from the underlying input source, caching the byte arrays directly to allow for efficient matching and searching across them multiple times.
+Matchers and searchers can all work over byte arrays directly. In order to read efficiently from any other input source,
+readers provide a consistent random-access interface over files, input streams, strings and byte arrays. Pluggable caching strategies allow tailoring the memory and performance for different use cases.
 
 * reader - readers for files, input streams, strings and byte arrays, and an adaptor from any reader back to an inputstream. Readers cache the byte arrays read from the input sources using flexible caching strategies.
 * reader/cache - pluggable caching strategies for readers, including least recently added, least recently used, temporary file caches, two level caches, double caches and others.
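
Reviewer note: the Searcher text in this series says most of the algorithms are sub-linear, meaning they can skip input positions. The sketch below illustrates that idea with a plain Boyer-Moore-Horspool search over a byte array. It is a standalone illustration and does not use byteseek's classes; the names `HorspoolSketch` and `indexOf` are hypothetical, and it only handles a fixed byte pattern, not the multi-byte-per-position sequences the README describes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HorspoolSketch {

    /**
     * Returns the index of the first occurrence of pattern in data, or -1 if absent.
     * Hypothetical helper for illustration only; not part of byteseek.
     */
    public static int indexOf(byte[] data, byte[] pattern) {
        int m = pattern.length;
        if (m == 0) {
            return 0;
        }
        if (m > data.length) {
            return -1;
        }

        // Shift table: how far the window can safely move, keyed by the byte
        // aligned with the last pattern position. Bytes not in the pattern
        // allow a full-length jump.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xFF] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= data.length - m) {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && data[pos + j] == pattern[j]) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            // Skip ahead using the last byte of the current window; this is
            // where positions are passed over without being examined.
            pos += shift[data[pos + m - 1] & 0xFF];
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "find the needle in the haystack".getBytes(StandardCharsets.US_ASCII);
        byte[] needle = "needle".getBytes(StandardCharsets.US_ASCII);
        System.out.println(indexOf(data, needle)); // prints 9
    }
}
```

The shift table is what lets the search jump by up to the pattern length after a mismatch, so long patterns over varied input tend to touch only a fraction of the bytes.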