Merge pull request #66 from JDASoftwareGroup/Bobsimonoff-patch-4
Update 2023-09-13-text-embedding-and-cosine-similarity.markdown
sebastian-neubauer-by authored Sep 15, 2023
2 parents a8235b2 + df4741c commit dfed091
Showing 1 changed file with 1 addition and 1 deletion.
@@ -30,7 +30,7 @@ The following is a two-dimensional graph showing sample embedding vectors for ca
<figcaption>Example of Car, Cat and Dog embedding vectors and the cosine similarity between cat and dog as well as between cat and car.</figcaption>
</figure>

- While cosine similarity has a range from -1.0 to 1.0, users of the OpenAI embedding API will typically not see values greater than 0.4. This is a side effect of max pooling, which is a technique often used with neural networks to reduce long input, such as text, down to the highlights, allowing the network to focus on the important parts of the data. In this case, it's an efficient compression technique for processing natural language in the embedding algorithm. A thorough explanation max pooling and the reason for the shift toward the positive is beyond the scope of the article.
+ While cosine similarity has a range from -1.0 to 1.0, users of the OpenAI embedding API will typically not see values less than 0.4. A thorough explanation of the reasons behind this are beyond the scope of this article, but you can learn more by searching for articles about text embedding pooling.

## Obtaining Embeddings and Cosine Similarity

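For context, the cosine similarity discussed in the changed paragraph can be sketched in a few lines of Python. The two-dimensional cat/dog/car vectors below are hypothetical stand-ins for the post's example figure, not real embeddings; actual OpenAI embedding vectors have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|), giving a value in [-1.0, 1.0].
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical two-dimensional vectors for illustration only.
cat = [0.9, 0.8]
dog = [0.8, 0.9]
car = [0.9, -0.2]

print(cosine_similarity(cat, dog))  # close to 1.0: semantically similar
print(cosine_similarity(cat, car))  # noticeably lower: less similar
```

Semantically related words point in similar directions, so their cosine similarity is higher, which is the property the diff's paragraph is describing.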
