Translate from English to Korean #32

Open · wants to merge 1 commit into base: master
@@ -346,34 +346,35 @@
"source": [
"## Write the encoder and decoder model\n",
"\n",
"Here, we'll implement an encoder-decoder model with attention which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The following diagram shows that each input words is assigned a weight by the attention mechanism which is then used by the decoder to predict the next word in the sentence.\n",
"Here, we'll implement an encoder-decoder model with attention which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The following diagram shows that each input words is assigned a weight by the attention mechanism which is then used by the decoder to predict the next word in the sentence.(**[Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq)의 Encoder-decoder Model with Attention에 대해 알아봅시다. 이 예시는 최신의 API를 사용합니다. 이 notebook 파일은 seq2seq 튜토리얼의 [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism)을 실행합니다. 하단의 이미지는 Attention 기법에 따라 각각의 input words에 weight를 부여한 후, decoder를 이용하여 그 다음의 word를 예측하는 프로세스를 설명합니다.**)\n",
"\n",
"<img src=\"https://www.tensorflow.org/images/seq2seq/attention_mechanism.jpg\" width=\"500\" alt=\"attention mechanism\">\n",
"\n",
"The input is put through an encoder model which gives us the encoder output of shape *(batch_size, max_length, hidden_size)* and the encoder hidden state of shape *(batch_size, hidden_size)*. \n",
"(**Input을 encoder model에 넣어 encoder output과 encoder hidden state를 반환합니다. 여기서 encoder output는 shape(batch_size, max_lenght, hidden_size)이며, encoder hidden state는 shape(batch_size, hidden_size)입니다.**)\n",
"\n",
"Here are the equations that are implemented:\n",
"Here are the equations that are implemented(**실행 순서를 나타내는 수식은 다음과 같습니다.**):\n",
"\n",
"<img src=\"https://www.tensorflow.org/images/seq2seq/attention_equation_0.jpg\" alt=\"attention equation 0\" width=\"800\">\n",
"<img src=\"https://www.tensorflow.org/images/seq2seq/attention_equation_1.jpg\" alt=\"attention equation 1\" width=\"800\">\n",
"\n",
"We're using *Bahdanau attention*. Lets decide on notation before writing the simplified form:\n",
"We're using *Bahdanau attention*. Lets decide on notation before writing the simplified form(**우리는 Bahdanau Attention를 이용할 것입니다. 간단하게 나타내기 전에 몇 가지 표기를 정합시다.**):\n",
"\n",
"* FC = Fully connected (dense) layer\n",
"* EO = Encoder output\n",
"* H = hidden state\n",
"* X = input to the decoder\n",
"\n",
"And the pseudo-code:\n",
"And the pseudo-code(**수도코드(pseudo-code)는 다음과 같습니다.**):\n",
"\n",
"* `score = FC(tanh(FC(EO) + FC(H)))`\n",
"* `attention weights = softmax(score, axis = 1)`. Softmax by default is applied on the last axis but here we want to apply it on the *1st axis*, since the shape of score is *(batch_size, max_length, 1)*. `Max_length` is the length of our input. Since we are trying to assign a weight to each input, softmax should be applied on that axis.\n",
"* `context vector = sum(attention weights * EO, axis = 1)`. Same reason as above for choosing axis as 1.\n",
"* `embedding output` = The input to the decoder X is passed through an embedding layer.\n",
"* `merged vector = concat(embedding output, context vector)`\n",
"* This merged vector is then given to the GRU\n",
" \n",
"The shapes of all the vectors at each step have been specified in the comments in the code:"
"* This merged vector is then given to the GRU(**합쳐진 vector는 GRU에 사용됩니다.**)\n",
"\n",
"The shapes of all the vectors at each step have been specified in the comments in the code(**각 스텝마다 모든 vector의 shape는 코드 주석으로 표기했습니다.**):"
]
},
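For reference, here is a minimal sketch of the attention step that the pseudo-code above describes, assuming TensorFlow 2.x and `tf.keras`. The class and layer names (`BahdanauAttention`, `W1`, `W2`, `V`) are illustrative, not taken from this diff; the shapes follow the FC/EO/H notation above.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Sketch of score = FC(tanh(FC(EO) + FC(H))) followed by a softmax over time."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # FC applied to the encoder output (EO)
        self.W2 = tf.keras.layers.Dense(units)  # FC applied to the hidden state (H)
        self.V = tf.keras.layers.Dense(1)       # final FC that produces the score

    def call(self, hidden, enc_output):
        # enc_output shape: (batch_size, max_length, hidden_size)
        # hidden shape: (batch_size, hidden_size) -> (batch_size, 1, hidden_size)
        hidden_with_time_axis = tf.expand_dims(hidden, 1)

        # score shape: (batch_size, max_length, 1)
        score = self.V(tf.nn.tanh(self.W1(enc_output) + self.W2(hidden_with_time_axis)))

        # softmax over axis 1 (the max_length axis), as explained above
        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape after the sum: (batch_size, hidden_size)
        context_vector = tf.reduce_sum(attention_weights * enc_output, axis=1)
        return context_vector, attention_weights
```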
{
@@ -587,13 +588,13 @@
"source": [
"## Training\n",
"\n",
"1. Pass the *input* through the *encoder* which return *encoder output* and the *encoder hidden state*.\n",
"2. The encoder output, encoder hidden state and the decoder input (which is the *start token*) is passed to the decoder.\n",
"3. The decoder returns the *predictions* and the *decoder hidden state*.\n",
"4. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.\n",
"5. Use *teacher forcing* to decide the next input to the decoder.\n",
"6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.\n",
"7. The final step is to calculate the gradients and apply it to the optimizer and backpropagate."
"1. Pass the *input* through the *encoder* which return *encoder output* and the *encoder hidden state*.(**Input을 Encoder에 넣어 Encoder Output과 Encdoer Hidden State를 반환합니다**)\n",
"2. The encoder output, encoder hidden state and the decoder input (which is the *start token*) is passed to the decoder.(**Encoder Output과 Encoder Hidden State, Decoder Input(=Start token)을 Decoder에 넣습니다.**)\n",
"3. The decoder returns the *predictions* and the *decoder hidden state*.(**Decoder는 Predictions(예측 결과)와 Decoder Hidden State를 반환합니다**)\n",
"4. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.(**Decoder hidden state를 Model에 다시 넣고, predictions을 이용하여 loss 값을 계산합니다.**)\n",
"5. Use *teacher forcing* to decide the next input to the decoder.(**Teacher Forcing을 사용하여 decoder에 넣을 그 다음 input을 결정합니다.**)\n",
"6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.(**Teacher Forcing 기술로 target word를 decoder의 다음 input으로 이용합니다**)\n",
"7. The final step is to calculate the gradients and apply it to the optimizer and backpropagate.(**마지막으로, gradients를 계산하고 optimizer와 역전파(backpropagate)에 적용합니다.**)"
]
},
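For reference, a hedged sketch of one teacher-forced training step following steps 1 to 7 above, assuming TensorFlow 2.x run eagerly. `encoder`, `decoder`, `optimizer`, `loss_function`, and `targ_lang` are assumed to be defined elsewhere in the notebook (with the decoder returning predictions, hidden state, and attention weights); they are not part of this diff.

```python
import tensorflow as tf

def train_step(inp, targ, enc_hidden):
    """One teacher-forced training step (eager-mode sketch)."""
    loss = 0.0
    with tf.GradientTape() as tape:
        # 1. The encoder returns its output and final hidden state.
        enc_output, enc_hidden = encoder(inp, enc_hidden)
        dec_hidden = enc_hidden

        # 2. The decoder starts from the <start> token for every sentence in the batch.
        dec_input = tf.expand_dims(
            [targ_lang.word_index['<start>']] * int(targ.shape[0]), 1)

        for t in range(1, targ.shape[1]):
            # 3./4. The decoder returns predictions and its hidden state;
            # the predictions are used to accumulate the loss.
            predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
            loss += loss_function(targ[:, t], predictions)

            # 5./6. Teacher forcing: the target word is fed as the next decoder input.
            dec_input = tf.expand_dims(targ[:, t], 1)

    # 7. Backpropagate: compute the gradients and let the optimizer apply them.
    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss / int(targ.shape[1])
```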
{
@@ -974,11 +975,11 @@
"id": "mU3Ce8M6I3rz"
},
"source": [
"* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.\n",
"* Stop predicting when the model predicts the *end token*.\n",
"* And store the *attention weights for every time step*.\n",
"* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.(**Evaluate 함수는 teacher forcing이 없는 training 루프문과 비슷합니다. 매 step의 Decoder의 Input은 hidden state와 encoder output, 이전 step의 예측값입니다.**)\n",
"* Stop predicting when the model predicts the *end token*.(**모델의 마지막 token을 예측하면 predicting을 멈춥니다.**)\n",
"* And store the *attention weights for every time step*.(**매 time step마다 attention weights를 저장합니다.**)\n",
"\n",
"Note: The encoder output is calculated only once for one input."
"Note: The encoder output is calculated only once for one input.(**참고 : Encoder Output은 한 개의 Input에서 단 한번만 계산됩니다.**)"
]
},
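For reference, a minimal sketch of the evaluation loop described above (no teacher forcing), assuming TensorFlow 2.x. Helpers such as `preprocess_sentence`, `inp_lang`, `targ_lang`, `max_length_inp`, `max_length_targ`, `units`, `encoder`, and `decoder` are assumed from the rest of the notebook and are not part of this diff.

```python
import tensorflow as tf

def evaluate(sentence):
    """Greedy decoding sketch: feed each prediction back as the next decoder input."""
    sentence = preprocess_sentence(sentence)
    inputs = [inp_lang.word_index[w] for w in sentence.split(' ')]
    inputs = tf.keras.preprocessing.sequence.pad_sequences(
        [inputs], maxlen=max_length_inp, padding='post')
    inputs = tf.convert_to_tensor(inputs)

    result = ''
    hidden = tf.zeros((1, units))
    # The encoder output is computed only once per input sentence.
    enc_output, enc_hidden = encoder(inputs, hidden)

    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([targ_lang.word_index['<start>']], 0)

    for t in range(max_length_targ):
        # The decoder also returns the attention weights for this time step,
        # which the notebook stores for plotting.
        predictions, dec_hidden, attention_weights = decoder(dec_input, dec_hidden, enc_output)
        predicted_id = int(tf.argmax(predictions[0]).numpy())
        result += targ_lang.index_word[predicted_id] + ' '

        # Stop once the model predicts the <end> token.
        if targ_lang.index_word[predicted_id] == '<end>':
            return result, sentence

        # The previous prediction becomes the next decoder input (no teacher forcing).
        dec_input = tf.expand_dims([predicted_id], 0)

    return result, sentence
```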
{
@@ -1306,7 +1307,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
"version": "3.6.2"
}
},
"nbformat": 4,