Java implementation of MinHash and LSH for finding near-duplicate documents, as measured by Jaccard similarity.
MinHash is used to approximate the Jaccard similarity between text documents.
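The idea behind MinHash can be illustrated with a small, self-contained sketch. It is not this repository's code: the class `MinHashSketch` and its methods are hypothetical, documents are assumed to be sets of lowercase word 2-shingles, and random permutations are approximated by a universal hash family `h(x) = (a*x + b) mod p` over `String.hashCode()` values. For each hash function the signature keeps the minimum value over the set. Two sets share that minimum with probability close to their Jaccard similarity, so the fraction of matching signature positions estimates it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Hypothetical MinHash sketch (not this repository's API). */
public class MinHashSketch {
    // Mersenne prime 2^31 - 1, the modulus of the assumed universal hash family.
    private static final long PRIME = 2147483647L;
    private final long[] a;
    private final long[] b;

    public MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // a in [1, p - 1]
            b[i] = rnd.nextInt(Integer.MAX_VALUE);         // b in [0, p - 1]
        }
    }

    /** Signature: for each hash function, the minimum hash over all shingles. */
    public long[] signature(Set<String> shingles) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String s : shingles) {
            long x = Math.floorMod((long) s.hashCode(), PRIME); // x in [0, p - 1]
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // product < 2^62, no overflow
                if (h < sig[i]) sig[i] = h;
            }
        }
        return sig;
    }

    /** Fraction of positions where two signatures agree: the Jaccard estimate. */
    public static double estimate(long[] s1, long[] s2) {
        int same = 0;
        for (int i = 0; i < s1.length; i++) if (s1[i] == s2[i]) same++;
        return (double) same / s1.length;
    }

    /** Exact Jaccard similarity |X ∩ Y| / |X ∪ Y|, for comparison. */
    public static double jaccard(Set<String> x, Set<String> y) {
        Set<String> inter = new HashSet<>(x);
        inter.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    /** Set of k-word shingles (assumed tokenization: lowercase, split on whitespace). */
    public static Set<String> shingles(String text, int k) {
        String[] words = text.toLowerCase().split("\\s+");
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + k)));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> d1 = shingles("the quick brown fox jumps over the lazy dog", 2);
        Set<String> d2 = shingles("the quick brown fox jumps over the sleepy dog", 2);
        MinHashSketch mh = new MinHashSketch(200, 42L);
        System.out.printf("exact Jaccard    = %.3f%n", jaccard(d1, d2));
        System.out.printf("MinHash estimate = %.3f%n", estimate(mh.signature(d1), mh.signature(d2)));
    }
}
```

The two example sentences share 6 of their 10 distinct word bigrams, so the exact Jaccard similarity printed is 0.600. The MinHash estimate depends on the seed and on the number of hash functions `k`; its error shrinks roughly as 1/√k.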
The repository also includes an implementation of LSH (locality-sensitive hashing), a fast way to find approximate nearest neighbors without comparing every pair of documents.
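A common way to apply LSH to MinHash signatures is the banding technique, shown in the hypothetical sketch below (the class `LshBanding` and its methods are illustrative, not this repository's API). Each signature of length `bands * rows` is split into `bands` slices of `rows` values. Documents whose slices match exactly in at least one band fall into the same bucket and become candidate pairs; only those candidates then need a full similarity check. Under the usual analysis, a pair with Jaccard similarity `s` becomes a candidate with probability `1 - (1 - s^rows)^bands`, so the two parameters trade recall against the number of candidates. The signatures in `main` are hand-made placeholders rather than real MinHash output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical LSH banding sketch over MinHash signatures (not this repository's API). */
public class LshBanding {
    private final int bands;
    private final int rows;
    // One bucket table per band: exact band contents -> ids of documents with that band.
    private final List<Map<List<Long>, List<Integer>>> buckets = new ArrayList<>();

    public LshBanding(int bands, int rows) {
        this.bands = bands;
        this.rows = rows;
        for (int i = 0; i < bands; i++) buckets.add(new HashMap<>());
    }

    /** Hashes each band of the signature into that band's bucket table. */
    public void add(int docId, long[] signature) {
        if (signature.length != bands * rows) {
            throw new IllegalArgumentException("signature length must be bands * rows");
        }
        for (int band = 0; band < bands; band++) {
            List<Long> key = new ArrayList<>(rows);
            for (int r = 0; r < rows; r++) key.add(signature[band * rows + r]);
            buckets.get(band).computeIfAbsent(key, k -> new ArrayList<>()).add(docId);
        }
    }

    /** Candidate pairs "i-j" (i < j) of documents that share at least one bucket. */
    public Set<String> candidatePairs() {
        Set<String> pairs = new TreeSet<>();
        for (Map<List<Long>, List<Integer>> table : buckets) {
            for (List<Integer> ids : table.values()) {
                for (int i = 0; i < ids.size(); i++) {
                    for (int j = i + 1; j < ids.size(); j++) {
                        int lo = Math.min(ids.get(i), ids.get(j));
                        int hi = Math.max(ids.get(i), ids.get(j));
                        pairs.add(lo + "-" + hi);
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        LshBanding lsh = new LshBanding(4, 2); // 4 bands x 2 rows = signature length 8
        lsh.add(0, new long[] {1, 2, 3, 4, 5, 6, 7, 8});
        lsh.add(1, new long[] {1, 2, 3, 4, 9, 9, 9, 9});          // matches doc 0 in bands 0 and 1
        lsh.add(2, new long[] {10, 11, 12, 13, 14, 15, 16, 17});  // matches no band
        System.out.println(lsh.candidatePairs()); // prints [0-1]
    }
}
```

Only documents 0 and 1 are reported as candidates; document 2 is never compared with the others, which is where the speedup over all-pairs comparison comes from.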