golucene

A Go port of Apache Lucene.

This project continues where balzaczyy left off and tries to implement all of Lucene v4.10.4, with additional experimentation on different ranking models.

This is primarily for my personal use. DO NOT USE IN PRODUCTION. (I just needed the search capabilities of Lucene in native Go, and the alternatives I found elsewhere just don't cut it for me.)

The example has also been updated to showcase the additions: new fields besides text, and a custom field type that demonstrates storing term vectors.

DONE:

Models:

(Note that some of these similarity models aren't actually available in Lucene.)

BM25
AtireBM25
BM25L
BM25+
ModBM25
DFI (Divergence from Independence) models
DFR (Divergence from Randomness) models
IB (Information-Based) models
LM models
- Dirichlet
- Jelinek
- Hiemstra
- PitmanYorProcess
- TwoStage
- XSqrAM

Indexing:

Index and store TermVectors.

FieldTypes:

LongField.
IntField.
DoubleField.
FloatField.

Query:

PrefixQuery
PhraseQuery
MultiPhraseQuery
MatchAllDocsQuery
ConstantScoreQuery

QueryExpansion:

Rocchio
RM
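
As a rough idea of what Rocchio expansion does, the sketch below re-weights a query vector toward the centroid of relevant documents and away from the centroid of non-relevant ones. It is a generic illustration under stated assumptions, not this port's API: termVector, rocchio and addCentroid are hypothetical names, and the alpha/beta/gamma values in main are common illustrative choices.

```go
package main

import "fmt"

// termVector maps a term to its weight (hypothetical type).
type termVector map[string]float64

// rocchio returns alpha*query + beta*centroid(relevant) - gamma*centroid(nonRelevant),
// dropping terms whose resulting weight is not positive.
func rocchio(query termVector, relevant, nonRelevant []termVector, alpha, beta, gamma float64) termVector {
	out := termVector{}
	for t, w := range query {
		out[t] += alpha * w
	}
	addCentroid(out, relevant, beta)
	addCentroid(out, nonRelevant, -gamma)
	for t, w := range out {
		if w <= 0 {
			delete(out, t) // deleting during range is safe in Go
		}
	}
	return out
}

// addCentroid adds scale times the mean of docs into out.
func addCentroid(out termVector, docs []termVector, scale float64) {
	if len(docs) == 0 {
		return
	}
	n := float64(len(docs))
	for _, d := range docs {
		for t, w := range d {
			out[t] += scale * w / n
		}
	}
}

func main() {
	query := termVector{"go": 1}
	relevant := []termVector{{"go": 1, "lucene": 2}, {"lucene": 2}}
	nonRelevant := []termVector{{"java": 1}}
	expanded := rocchio(query, relevant, nonRelevant, 1, 0.75, 0.15)
	fmt.Printf("go=%.3f lucene=%.3f terms=%d\n", expanded["go"], expanded["lucene"], len(expanded))
}
```

In pseudo-relevance feedback the top-ranked documents are typically treated as relevant and gamma is often set to 0, since no judged non-relevant documents are available.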

TO-DO:

The remaining *Query.java classes (still too many to mention).
More QE (experiments with differential evolution and genetic algorithms; super expensive).
Finish the unimplemented bits found here and there.
ClassicQueryParser and SimpleParser (both still need FuzzyQuery before they can be completed).

Why do we need yet another port of Lucene?

Since Lucene Java is already optimized to the teeth (and yes, I know it very well), no performance gain should be expected from a Go port. To quote Lucy's FAQ:

Is Lucy faster than Lucene? It's written in C, after all.

That depends. As of this writing, Lucy launches faster than Lucene thanks to tighter integration with the system IO cache, but Lucene is faster in terms of raw indexing and search throughput once it gets going. These differences reflect the distinct priorities of the most active developers within the Lucy and Lucene communities more than anything else.

The same applies to GoLucene. Still, some benefits can be expected:

quick startup;
can be embedded in a Go app;
goroutines, which I think can be faster in certain cases;
ready-to-use byte and array utilities, which reduce code size and make maintenance easier.

Though it started as a pet project, I've been pretty serious about this.

Dependencies

Go 1.2+

Installation

go get -u github.com/jtejido/golucene

Usage

A detailed example can be found here.

License

Apache License 2.0.
