Uses cosine similarity to evaluate the distance between two texts (0 to 1).

sergeio/text_comparer

Text Comparer

Uses cosine similarity to score how similar two texts are, on a scale from 0 to 1.

This code has a companion blog post: http://engineering.aweber.com/cosine-similarity/

Sample Usage

In [1]: from vectorizer import compare_texts

In [2]: compare_texts('Two identical sentences', 'Two identical sentences')
Out[2]: 1.0

In [3]: compare_texts('Two similar sentences', 'Two non-identical sentences')
Out[3]: 0.6666666666666666

In [4]: compare_texts('Two radically different sentences',
                      'This statement shares no words with the previous one')
Out[4]: 0.0

The higher the output of compare_texts, the larger the share of words the two texts have in common. That description simplifies the actual algorithm, but it is pretty close to the truth.
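
For readers who want to see the idea behind that simplification, here is a minimal sketch of cosine similarity over word counts. It is not the code in vectorizer.py: the function name cosine_similarity_sketch is hypothetical, and the lowercasing, whitespace tokenization, and raw word counts are all assumptions made for illustration. The actual implementation may tokenize or weight words differently.

```python
import math
from collections import Counter


def cosine_similarity_sketch(text_a, text_b):
    """Hypothetical helper for illustration; not the repository's compare_texts."""
    # Assumption: lowercase and split on whitespace. The real tokenizer may differ.
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product of the two count vectors; only shared words contribute.
    shared_words = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared_words)

    # Product of the squared vector lengths; zero means at least one text is empty.
    squared_norms = (sum(c * c for c in counts_a.values())
                     * sum(c * c for c in counts_b.values()))
    if squared_norms == 0:
        return 0.0

    # Cosine of the angle between the vectors: dot / (|a| * |b|).
    return dot / math.sqrt(squared_norms)
```

Under these assumptions, 'Two similar sentences' and 'Two non-identical sentences' each contain three distinct words and share two of them, so the dot product is 2 and both vector lengths are the square root of 3, which gives 2/3. That agrees with the sample output above, though it does not prove the repository computes it the same way.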
