From df4741c2bfce13507cd535aa4c22232277175b5f Mon Sep 17 00:00:00 2001
From: Bob Simonoff
Date: Fri, 15 Sep 2023 08:30:59 -0400
Subject: [PATCH] Update 2023-09-13-text-embedding-and-cosine-similarity.markdown

Fixed the pooling sentences

---
 _posts/2023-09-13-text-embedding-and-cosine-similarity.markdown | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2023-09-13-text-embedding-and-cosine-similarity.markdown b/_posts/2023-09-13-text-embedding-and-cosine-similarity.markdown
index 8941aba..668beed 100644
--- a/_posts/2023-09-13-text-embedding-and-cosine-similarity.markdown
+++ b/_posts/2023-09-13-text-embedding-and-cosine-similarity.markdown
@@ -30,7 +30,7 @@ The following is a two-dimensional graph showing sample embedding vectors for ca
 Example of Car, Cat and Dog embedding vectors and the cosine similarity between cat and dog as well as between cat and car.
-While cosine similarity has a range from -1.0 to 1.0, users of the OpenAI embedding API will typically not see values greater than 0.4. This is a side effect of max pooling, which is a technique often used with neural networks to reduce long input, such as text, down to the highlights, allowing the network to focus on the important parts of the data. In this case, it's an efficient compression technique for processing natural language in the embedding algorithm. A thorough explanation max pooling and the reason for the shift toward the positive is beyond the scope of the article.
+While cosine similarity has a range from -1.0 to 1.0, users of the OpenAI embedding API will typically not see values less than 0.4. A thorough explanation of the reasons behind this is beyond the scope of this article, but you can learn more by searching for articles about text embedding pooling.
 
 ## Obtaining Embeddings and Cosine Similarity
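
As an illustrative aside, not part of the applied patch: the sketch below shows how the cosine similarity the post describes can be computed between two OpenAI embeddings. It assumes the pre-v1 openai Python package (openai.Embedding.create, current as of this patch's date), numpy, an OPENAI_API_KEY set in the environment, and the text-embedding-ada-002 model; the helper names are hypothetical.

    import numpy as np
    import openai

    def embed(text):
        # Request one embedding vector from the OpenAI embedding API.
        # Model name is an assumption for illustration.
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    def cosine_similarity(a, b):
        # Dot product divided by the product of the vector magnitudes;
        # the mathematical range is -1.0 to 1.0.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # "cat" vs. "dog" should score higher than "cat" vs. "car", and in
    # practice both values tend to land well above 0.4 with this API.
    print(cosine_similarity(embed("cat"), embed("dog")))
    print(cosine_similarity(embed("cat"), embed("car")))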