
Fix notebook failure with Keras 3.
PiperOrigin-RevId: 615189902
MarkDaoust authored and tf-text-github-robot committed Mar 12, 2024
1 parent 8d6b5e0 commit 9aaf8ac
Showing 1 changed file with 13 additions and 25 deletions.
38 changes: 13 additions & 25 deletions docs/tutorials/word2vec.ipynb
@@ -86,7 +86,7 @@
"id": "xP00WlaMWBZC"
},
"source": [
"## Skip-gram and negative sampling "
"## Skip-gram and negative sampling"
]
},
{
@@ -95,7 +95,7 @@
"id": "Zr2wjv0bW236"
},
"source": [
"While a bag-of-words model predicts a word given the neighboring context, a skip-gram model predicts the context (or neighbors) of a word, given the word itself. The model is trained on skip-grams, which are n-grams that allow tokens to be skipped (see the diagram below for an example). The context of a word can be represented through a set of skip-gram pairs of `(target_word, context_word)` where `context_word` appears in the neighboring context of `target_word`. "
"While a bag-of-words model predicts a word given the neighboring context, a skip-gram model predicts the context (or neighbors) of a word, given the word itself. The model is trained on skip-grams, which are n-grams that allow tokens to be skipped (see the diagram below for an example). The context of a word can be represented through a set of skip-gram pairs of `(target_word, context_word)` where `context_word` appears in the neighboring context of `target_word`."
]
},
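As a concrete illustration of the `(target_word, context_word)` pairs described above, here is a minimal sketch (not part of the notebook diff) that generates positive skip-gram pairs for one tokenized sentence with `tf.keras.preprocessing.sequence.skipgrams`; the example sentence, window size, and vocabulary indexing are arbitrary choices for the illustration.

```python
import tensorflow as tf

# Tokenize one example sentence and build a tiny word-to-id vocabulary
# (id 0 is left unused here, mirroring a padding token).
sentence = "the wide road shimmered in the hot sun"
tokens = sentence.split()
vocab = {word: idx + 1 for idx, word in enumerate(dict.fromkeys(tokens))}
inverse_vocab = {idx: word for word, idx in vocab.items()}
example_sequence = [vocab[word] for word in tokens]

# Positive (target, context) pairs within a window of 2 words on each side.
positive_skip_grams, _ = tf.keras.preprocessing.sequence.skipgrams(
    example_sequence,
    vocabulary_size=len(vocab) + 1,
    window_size=2,
    negative_samples=0)

for target, context in positive_skip_grams[:5]:
    print(f"({target}, {context}): ({inverse_vocab[target]}, {inverse_vocab[context]})")
```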
{
@@ -189,7 +189,7 @@
"id": "Y5VWYtmFzHkU"
},
"source": [
"The [noise contrastive estimation](https://www.tensorflow.org/api_docs/python/tf/nn/nce_loss) (NCE) loss function is an efficient approximation for a full softmax. With an objective to learn word embeddings instead of modeling the word distribution, the NCE loss can be [simplified](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) to use negative sampling. "
"The [noise contrastive estimation](https://www.tensorflow.org/api_docs/python/tf/nn/nce_loss) (NCE) loss function is an efficient approximation for a full softmax. With an objective to learn word embeddings instead of modeling the word distribution, the NCE loss can be [simplified](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) to use negative sampling."
]
},
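To make the negative-sampling objective concrete, here is a rough sketch (shapes and values are assumptions, not taken from the notebook) of the loss for a single target word scored against one true context word and `num_ns` negative words:

```python
import tensorflow as tf

embedding_dim = 4
num_ns = 3  # number of negative samples per positive (target, context) pair

# Placeholder embeddings: one vector for the target word and one per candidate
# context word (row 0 is the true context, rows 1..num_ns are negatives).
target_emb = tf.random.normal([embedding_dim])
context_embs = tf.random.normal([num_ns + 1, embedding_dim])

# One logit per candidate word: the dot product with the target embedding.
logits = tf.tensordot(context_embs, target_emb, axes=1)

# Label 1 for the true context word, 0 for every negative sample.
labels = tf.constant([1.0] + [0.0] * num_ns)

loss = tf.reduce_sum(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
print(loss.numpy())
```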
{
@@ -250,9 +250,7 @@
"import numpy as np\n",
"\n",
"import tensorflow as tf\n",
"from tensorflow.keras import layers\n",
"\n",
"from collections import defaultdict"
"from tensorflow.keras import layers"
]
},
{
@@ -447,7 +445,7 @@
"id": "_ua9PkMTISF0"
},
"source": [
"### Negative sampling for one skip-gram "
"### Negative sampling for one skip-gram"
]
},
{
@@ -456,7 +454,7 @@
"id": "Esqn8WBfZnEK"
},
"source": [
"The `skipgrams` function returns all positive skip-gram pairs by sliding over a given window span. To produce additional skip-gram pairs that would serve as negative samples for training, you can sample random words from the vocabulary. Use the `tf.random.log_uniform_candidate_sampler` function to sample `num_ns` number of negative samples for a given target word in a window. You can pass words from the positive class but this does not exclude them from the results. For large vocabularies, this is not a problem because the chance of drawing one of the positive classes is small. However for small data you may see overlap between negative and positive samples. Later we will add code to exclude positive samples for slightly improved accuracy at the cost of longer runtime."
"The `skipgrams` function returns all positive skip-gram pairs by sliding over a given window span. To produce additional skip-gram pairs that would serve as negative samples for training, you need to sample random words from the vocabulary. Use the `tf.random.log_uniform_candidate_sampler` function to sample `num_ns` negative samples for a given target word in a window. You can call the function on one skip-gram's target word, passing the context word as the true class. Note that this does not guarantee the context word is excluded from the drawn negatives, but for large vocabularies the chance of such a collision is small.\n"
]
},
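A hedged sketch of the sampler call described above; the word id, `num_ns`, vocabulary size, and seed below are placeholder values, not taken from the notebook:

```python
import tensorflow as tf

SEED = 42
num_ns = 4          # negative samples to draw per positive pair
vocab_size = 4096   # placeholder vocabulary size

# The positive context word id for one skip-gram pair, shaped (1, 1) because the
# sampler expects a batch of true-class rows.
context_class = tf.constant([[15]], dtype="int64")

negative_sampling_candidates, _, _ = tf.random.log_uniform_candidate_sampler(
    true_classes=context_class,  # the context word is supplied as the true class
    num_true=1,                  # one true class per row
    num_sampled=num_ns,          # draw num_ns candidate negatives
    unique=True,                 # candidates within one draw are distinct
    range_max=vocab_size,        # sample ids from [0, vocab_size)
    seed=SEED,
    name="negative_sampling")

print(negative_sampling_candidates.numpy())
```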
{
@@ -630,7 +628,7 @@
"id": "iLKwNAczHsKg"
},
"source": [
"### Skip-gram sampling table "
"### Skip-gram sampling table"
]
},
{
@@ -639,7 +637,7 @@
"id": "TUUK3uDtFNFE"
},
"source": [
"A large dataset means a larger vocabulary with a higher number of frequent words such as stopwords. Training examples obtained from sampling commonly occurring words (such as `the`, `is`, `on`) don't add much useful information for the model to learn from. [Mikolov et al.](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) suggest subsampling of frequent words as a helpful practice to improve embedding quality. "
"A large dataset means a larger vocabulary with a higher number of frequent words such as stopwords. Training examples obtained from sampling commonly occurring words (such as `the`, `is`, `on`) don't add much useful information for the model to learn from. [Mikolov et al.](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) suggest subsampling of frequent words as a helpful practice to improve embedding quality."
]
},
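As a brief sketch of the subsampling idea (the size here is illustrative), `tf.keras.preprocessing.sequence.make_sampling_table` produces a table of keep-probabilities indexed by word frequency rank, which can then be passed to the `skipgrams` function via its `sampling_table` argument:

```python
import tensorflow as tf

# sampling_table[i] is the probability of keeping the i-th most common word
# when generating skip-grams; very frequent words get low probabilities.
sampling_table = tf.keras.preprocessing.sequence.make_sampling_table(size=10)
print(sampling_table)
```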
{
@@ -730,10 +728,6 @@
"\n",
" # Iterate over each positive skip-gram pair to produce training examples\n",
" # with a positive context word and negative samples.\n",
" window = defaultdict(set)\n",
" for target_word, context_word in positive_skip_grams:\n",
" window[target_word].add(context_word)\n",
"\n",
" for target_word, context_word in positive_skip_grams:\n",
" context_class = tf.expand_dims(\n",
" tf.constant([context_word], dtype=\"int64\"), 1)\n",
@@ -746,11 +740,7 @@
" seed=seed,\n",
" name=\"negative_sampling\")\n",
"\n",
" # Discard this negative sample if it intersects with the positive context.\n",
" if window[target_word].intersection(negative_sampling_candidates.numpy()):\n",
" continue\n",
"\n",
" # Build context and label vectors (for one target word).\n",
" # Build context and label vectors (for one target word)\n",
" context = tf.concat([tf.squeeze(context_class,1), negative_sampling_candidates], 0)\n",
" label = tf.constant([1] + [0]*num_ns, dtype=\"int64\")\n",
"\n",
@@ -815,7 +805,7 @@
"id": "sOsbLq8a37dr"
},
"source": [
"Read the text from the file and print the first few lines: "
"Read the text from the file and print the first few lines:"
]
},
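A small sketch of this step; the download URL below is the Shakespeare corpus used by several TensorFlow text tutorials and is an assumption here rather than a quote from the hidden code cell:

```python
import tensorflow as tf

path_to_file = tf.keras.utils.get_file(
    'shakespeare.txt',
    'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

# Read the file and print the first few lines.
with open(path_to_file) as f:
    lines = f.read().splitlines()
for line in lines[:5]:
    print(line)
```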
{
@@ -1178,11 +1168,9 @@
" super(Word2Vec, self).__init__()\n",
" self.target_embedding = layers.Embedding(vocab_size,\n",
" embedding_dim,\n",
" input_length=1,\n",
" name=\"w2v_embedding\")\n",
" self.context_embedding = layers.Embedding(vocab_size,\n",
" embedding_dim,\n",
" input_length=num_ns+1)\n",
" embedding_dim)\n",
"\n",
" def call(self, pair):\n",
" target, context = pair\n",
@@ -1222,7 +1210,7 @@
" return tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=y_true)\n",
"```\n",
"\n",
"It's time to build your model! Instantiate your word2vec class with an embedding dimension of 128 (you could experiment with different values). Compile the model with the `tf.keras.optimizers.Adam` optimizer. "
"It's time to build your model! Instantiate your word2vec class with an embedding dimension of 128 (you could experiment with different values). Compile the model with the `tf.keras.optimizers.Adam` optimizer."
]
},
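A hedged sketch of the instantiation and compile step this paragraph describes, using the `Word2Vec` class from the cell above; the loss and metric choices are assumptions rather than quotes from the hidden code cell, and `vocab_size` is a placeholder for your actual vocabulary size:

```python
import tensorflow as tf

vocab_size = 4096   # placeholder: use the size of your vocabulary
embedding_dim = 128

word2vec = Word2Vec(vocab_size, embedding_dim)
word2vec.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
```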
{
@@ -1424,8 +1412,8 @@
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "word2vec.ipynb",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
