diff --git a/tf_2.x/lab-12-7-bonus-seq-to-seq-with-attention-chatbot-keras-eager.ipynb b/tf_2.x/lab-12-7-bonus-seq-to-seq-with-attention-chatbot-keras-eager.ipynb
index 85036d5..4e9423b 100644
--- a/tf_2.x/lab-12-7-bonus-seq-to-seq-with-attention-chatbot-keras-eager.ipynb
+++ b/tf_2.x/lab-12-7-bonus-seq-to-seq-with-attention-chatbot-keras-eager.ipynb
@@ -346,34 +346,35 @@
"source": [
"## Write the encoder and decoder model\n",
"\n",
- "Here, we'll implement an encoder-decoder model with attention which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The following diagram shows that each input words is assigned a weight by the attention mechanism which is then used by the decoder to predict the next word in the sentence.\n",
+ "Here, we'll implement an encoder-decoder model with attention which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The following diagram shows that each input words is assigned a weight by the attention mechanism which is then used by the decoder to predict the next word in the sentence.(**[Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq)의 Encoder-decoder Model with Attention에 대해 알아봅시다. 이 예시는 최신의 API를 사용합니다. 이 notebook 파일은 seq2seq 튜토리얼의 [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism)을 실행합니다. 하단의 이미지는 Attention 기법에 따라 각각의 input words에 weight를 부여한 후, decoder를 이용하여 그 다음의 word를 예측하는 프로세스를 설명합니다.**)\n",
"\n",
"\n",
"\n",
"The input is put through an encoder model which gives us the encoder output of shape *(batch_size, max_length, hidden_size)* and the encoder hidden state of shape *(batch_size, hidden_size)*. \n",
+ "(**Input을 encoder model에 넣어 encoder output과 encoder hidden state를 반환합니다. 여기서 encoder output는 shape(batch_size, max_lenght, hidden_size)이며, encoder hidden state는 shape(batch_size, hidden_size)입니다.**)\n",
"\n",
- "Here are the equations that are implemented:\n",
+ "Here are the equations that are implemented(**실행 순서를 나타내는 수식은 다음과 같습니다.**):\n",
"\n",
"\n",
"\n",
"\n",
- "We're using *Bahdanau attention*. Lets decide on notation before writing the simplified form:\n",
+ "We're using *Bahdanau attention*. Lets decide on notation before writing the simplified form(**우리는 Bahdanau Attention를 이용할 것입니다. 간단하게 나타내기 전에 몇 가지 표기를 정합시다.**):\n",
"\n",
"* FC = Fully connected (dense) layer\n",
"* EO = Encoder output\n",
"* H = hidden state\n",
"* X = input to the decoder\n",
"\n",
- "And the pseudo-code:\n",
+ "And the pseudo-code(**수도코드(pseudo-code)는 다음과 같습니다.**):\n",
"\n",
"* `score = FC(tanh(FC(EO) + FC(H)))`\n",
"* `attention weights = softmax(score, axis = 1)`. Softmax by default is applied on the last axis but here we want to apply it on the *1st axis*, since the shape of score is *(batch_size, max_length, 1)*. `Max_length` is the length of our input. Since we are trying to assign a weight to each input, softmax should be applied on that axis.\n",
"* `context vector = sum(attention weights * EO, axis = 1)`. Same reason as above for choosing axis as 1.\n",
"* `embedding output` = The input to the decoder X is passed through an embedding layer.\n",
"* `merged vector = concat(embedding output, context vector)`\n",
- "* This merged vector is then given to the GRU\n",
- " \n",
- "The shapes of all the vectors at each step have been specified in the comments in the code:"
+ "* This merged vector is then given to the GRU(**합쳐진 vector는 GRU에 사용됩니다.**)\n",
+ "\n",
+ "The shapes of all the vectors at each step have been specified in the comments in the code(**각 스텝마다 모든 vector의 shape는 코드 주석으로 표기했습니다.**):"
]
},
{
@@ -587,13 +588,13 @@
"source": [
"## Training\n",
"\n",
- "1. Pass the *input* through the *encoder* which return *encoder output* and the *encoder hidden state*.\n",
- "2. The encoder output, encoder hidden state and the decoder input (which is the *start token*) is passed to the decoder.\n",
- "3. The decoder returns the *predictions* and the *decoder hidden state*.\n",
- "4. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.\n",
- "5. Use *teacher forcing* to decide the next input to the decoder.\n",
- "6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.\n",
- "7. The final step is to calculate the gradients and apply it to the optimizer and backpropagate."
+ "1. Pass the *input* through the *encoder* which return *encoder output* and the *encoder hidden state*.(**Input을 Encoder에 넣어 Encoder Output과 Encdoer Hidden State를 반환합니다**)\n",
+ "2. The encoder output, encoder hidden state and the decoder input (which is the *start token*) is passed to the decoder.(**Encoder Output과 Encoder Hidden State, Decoder Input(=Start token)을 Decoder에 넣습니다.**)\n",
+ "3. The decoder returns the *predictions* and the *decoder hidden state*.(**Decoder는 Predictions(예측 결과)와 Decoder Hidden State를 반환합니다**)\n",
+ "4. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.(**Decoder hidden state를 Model에 다시 넣고, predictions을 이용하여 loss 값을 계산합니다.**)\n",
+ "5. Use *teacher forcing* to decide the next input to the decoder.(**Teacher Forcing을 사용하여 decoder에 넣을 그 다음 input을 결정합니다.**)\n",
+ "6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.(**Teacher Forcing 기술로 target word를 decoder의 다음 input으로 이용합니다**)\n",
+ "7. The final step is to calculate the gradients and apply it to the optimizer and backpropagate.(**마지막으로, gradients를 계산하고 optimizer와 역전파(backpropagate)에 적용합니다.**)"
]
},
{
@@ -974,11 +975,11 @@
"id": "mU3Ce8M6I3rz"
},
"source": [
- "* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.\n",
- "* Stop predicting when the model predicts the *end token*.\n",
- "* And store the *attention weights for every time step*.\n",
+ "* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.(**Evaluate 함수는 teacher forcing이 없는 training 루프문과 비슷합니다. 매 step의 Decoder의 Input은 hidden state와 encoder output, 이전 step의 예측값입니다.**)\n",
+ "* Stop predicting when the model predicts the *end token*.(**모델의 마지막 token을 예측하면 predicting을 멈춥니다.**)\n",
+ "* And store the *attention weights for every time step*.(**매 time step마다 attention weights를 저장합니다.**)\n",
"\n",
- "Note: The encoder output is calculated only once for one input."
+ "Note: The encoder output is calculated only once for one input.(**참고 : Encoder Output은 한 개의 Input에서 단 한번만 계산됩니다.**)"
]
},
{
@@ -1306,7 +1307,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.6"
+ "version": "3.6.2"
}
},
"nbformat": 4,