Skip to content

gnames/aho_corasick

Repository files navigation

aho_corasick

A Go implementation of Aho-Corasick algorithm for efficient multiple pattern matching within a string.

Introduction

This project implements the powerful string searching Aho-Corasick algorithm invented by Alfred V. Aho and Margaret J. Corasick in the Go programming language. The Aho-Corasick algorithm is useful because it efficiently indexes all occurrences of a list of keywords within a text string.

This implementation searches at the letter level instead of the word level. Both failure links and dictionary links are implemented.

Installation

The Go module is installable by running:

go get github.com/gnames/aho_corasick

Usage

ereate a new aho_corasick instance with aho_corasick.New() and setup the automaton with the search patterns with ac.Setup(patterns). Run search with ac.Setup(patterns), which returns an array of matches.

ac := aho_corasick.New()
patterns := []string{"aba", "cla", "ac", "gee", "lan"}
ac.Setup(patterns)
haystack := "abacgeeaba"
matches := ac.Search(haystack)

Development

If you find a bug, please open an issue ticket. Pull requests are welcome.

Tests can be run with go test which will produce a text visual of the trie:

******* Trie *******

haystack: abacgeeaba

root->root ┬─ a->root ┬─ b->root ── a*->a
           │          └─ c*->c
           ├─ c->root ── l->l ── a*->a
           ├─ g->root ── e->root ── e*->root
           └─ l->root ── a->a ── n*->root

********************
PASS

In the trie output, root refers to the root node, * represents word nodes, -> indicates the failure links, | indicates dictionary links.

Trie output can also be produced with the debugger, which can be run with:

ac := aho_corasick.New()
haystack := "geeabaclaba"
patterns := []string{"aba", "cla", "ac", "gee", "lan"}
ac.Setup(patterns)
ac.Debug(haystack)

License

License: MIT

Authors

About

Implementation of Aho-Corasick algorithm

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages