Text Comparer

Uses cosine similarity to score how similar two texts are, on a scale from 0 (no words in common) to 1 (identical).

This code has a companion blog post: http://engineering.aweber.com/cosine-similarity/

Sample Usage

In [1]: from vectorizer import compare_texts

In [2]: compare_texts('Two identical sentences', 'Two identical sentences')
Out[2]: 1.0

In [3]: compare_texts('Two similar sentences', 'Two non-identical sentences')
Out[3]: 0.6666666666666666

In [4]: compare_texts('Two radically different sentences',
                      'This statement shares no words with the previous one')
Out[4]: 0.0

The higher the output of compare_texts, the higher the percentage of words the two texts share. That description simplifies the actual algorithm, but it's pretty close to the truth.
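
For readers who want to see the idea behind the score, here is a minimal sketch of cosine similarity over word counts. It is not the code in this repository: the names word_counts and cosine_similarity_sketch are hypothetical, and the tokenization (lowercase, split on whitespace) is an assumption that may differ from what compare_texts actually does.

```python
import math
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its whitespace-separated words.

    Assumption: this simple tokenization is for illustration only.
    """
    return Counter(text.lower().split())


def cosine_similarity_sketch(text_a, text_b):
    """Return the cosine similarity of two texts' word-count vectors."""
    counts_a = word_counts(text_a)
    counts_b = word_counts(text_b)

    # Dot product: only words that appear in both texts contribute.
    shared_words = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared_words)

    # Squared lengths of the two word-count vectors.
    norm_sq_a = sum(count * count for count in counts_a.values())
    norm_sq_b = sum(count * count for count in counts_b.values())

    # An empty text has no direction, so treat it as sharing nothing.
    if norm_sq_a == 0 or norm_sq_b == 0:
        return 0.0

    return dot / math.sqrt(norm_sq_a * norm_sq_b)
```

Under these assumptions, the three sample pairs above come out the same way: identical sentences score 1.0, the "similar" pair shares two of three words and scores about 0.667, and the pair with no shared words scores 0.0.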
