diff --git a/tf_2.x/lab-12-7-bonus-seq-to-seq-with-attention-chatbot-keras-eager.ipynb b/tf_2.x/lab-12-7-bonus-seq-to-seq-with-attention-chatbot-keras-eager.ipynb
index 85036d5..4e9423b 100644
--- a/tf_2.x/lab-12-7-bonus-seq-to-seq-with-attention-chatbot-keras-eager.ipynb
+++ b/tf_2.x/lab-12-7-bonus-seq-to-seq-with-attention-chatbot-keras-eager.ipynb
@@ -346,34 +346,35 @@
    "source": [
     "## Write the encoder and decoder model\n",
     "\n",
-    "Here, we'll implement an encoder-decoder model with attention which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The following diagram shows that each input words is assigned a weight by the attention mechanism which is then used by the decoder to predict the next word in the sentence.\n",
+    "Here, we'll implement an encoder-decoder model with attention, which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The following diagram shows that each input word is assigned a weight by the attention mechanism, which is then used by the decoder to predict the next word in the sentence.(**Let's look at the encoder-decoder model with attention from the [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The image below shows how the attention mechanism assigns a weight to each input word, which the decoder then uses to predict the next word in the sentence.**)\n",
     "\n",
     "\"attention\n",
     "\n",
     "The input is put through an encoder model which gives us the encoder output of shape *(batch_size, max_length, hidden_size)* and the encoder hidden state of shape *(batch_size, hidden_size)*. \n",
+    "(**The input is passed through the encoder model, which returns the encoder output and the encoder hidden state. The encoder output has shape (batch_size, max_length, hidden_size), and the encoder hidden state has shape (batch_size, hidden_size).**)\n",
     "\n",
-    "Here are the equations that are implemented:\n",
+    "Here are the equations that are implemented(**The equations to be implemented are as follows.**):\n",
     "\n",
     "\"attention\n",
     "\"attention\n",
     "\n",
-    "We're using *Bahdanau attention*. Lets decide on notation before writing the simplified form:\n",
+    "We're using *Bahdanau attention*. Let's decide on notation before writing the simplified form(**We will use Bahdanau attention. Before writing the simplified form, let's settle on some notation.**):\n",
     "\n",
     "* FC = Fully connected (dense) layer\n",
     "* EO = Encoder output\n",
     "* H = hidden state\n",
     "* X = input to the decoder\n",
     "\n",
-    "And the pseudo-code:\n",
+    "And the pseudo-code(**The pseudo-code is as follows.**):\n",
     "\n",
     "* `score = FC(tanh(FC(EO) + FC(H)))`\n",
     "* `attention weights = softmax(score, axis = 1)`. Softmax by default is applied on the last axis but here we want to apply it on the *1st axis*, since the shape of score is *(batch_size, max_length, 1)*. `Max_length` is the length of our input. Since we are trying to assign a weight to each input, softmax should be applied on that axis.\n",
     "* `context vector = sum(attention weights * EO, axis = 1)`. Same reason as above for choosing axis as 1.\n",
     "* `embedding output` = The input to the decoder X is passed through an embedding layer.\n",
     "* `merged vector = concat(embedding output, context vector)`\n",
-    "* This merged vector is then given to the GRU\n",
-    " \n",
-    "The shapes of all the vectors at each step have been specified in the comments in the code:"
+    "* This merged vector is then given to the GRU(**The merged vector is then fed to the GRU.**)\n",
+    "\n",
+    "The shapes of all the vectors at each step have been specified in the comments in the code(**The shape of every vector at each step is noted in the code comments.**):"
    ]
   },
   {
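For readers following the pseudo-code in the hunk above, it maps almost line-for-line onto a small Keras layer. The sketch below is illustrative only and is not taken from the notebook; the names `BahdanauAttention`, `W1`, `W2`, `V`, and `units` are placeholders.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive (Bahdanau) attention: score = FC(tanh(FC(EO) + FC(H)))."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # FC applied to the encoder output (EO)
        self.W2 = tf.keras.layers.Dense(units)  # FC applied to the hidden state (H)
        self.V = tf.keras.layers.Dense(1)       # FC that turns the combination into a score

    def call(self, hidden, enc_output):
        # enc_output (EO): (batch_size, max_length, hidden_size)
        # hidden (H):      (batch_size, hidden_size) -> broadcast over the time axis
        hidden_with_time_axis = tf.expand_dims(hidden, 1)

        # score: (batch_size, max_length, 1)
        score = self.V(tf.nn.tanh(self.W1(enc_output) + self.W2(hidden_with_time_axis)))

        # softmax over axis 1 (the max_length axis), so each input position gets a weight
        attention_weights = tf.nn.softmax(score, axis=1)

        # context vector: (batch_size, hidden_size)
        context_vector = tf.reduce_sum(attention_weights * enc_output, axis=1)
        return context_vector, attention_weights
```

In the decoder, the returned `context_vector` would be concatenated with the embedded decoder input before the GRU, which is the merged-vector step of the pseudo-code.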
@@ -587,13 +588,13 @@
    "source": [
     "## Training\n",
     "\n",
-    "1. Pass the *input* through the *encoder* which return *encoder output* and the *encoder hidden state*.\n",
-    "2. The encoder output, encoder hidden state and the decoder input (which is the *start token*) is passed to the decoder.\n",
-    "3. The decoder returns the *predictions* and the *decoder hidden state*.\n",
-    "4. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.\n",
-    "5. Use *teacher forcing* to decide the next input to the decoder.\n",
-    "6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.\n",
-    "7. The final step is to calculate the gradients and apply it to the optimizer and backpropagate."
+    "1. Pass the *input* through the *encoder*, which returns the *encoder output* and the *encoder hidden state*.(**Pass the input through the encoder; it returns the encoder output and the encoder hidden state.**)\n",
+    "2. The encoder output, encoder hidden state and the decoder input (which is the *start token*) are passed to the decoder.(**The encoder output, the encoder hidden state, and the decoder input (the start token) are fed to the decoder.**)\n",
+    "3. The decoder returns the *predictions* and the *decoder hidden state*.(**The decoder returns the predictions and the decoder hidden state.**)\n",
+    "4. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.(**The decoder hidden state is fed back into the model, and the predictions are used to calculate the loss.**)\n",
+    "5. Use *teacher forcing* to decide the next input to the decoder.(**Teacher forcing is used to decide the next input to the decoder.**)\n",
+    "6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.(**Teacher forcing is the technique of passing the target word as the decoder's next input.**)\n",
+    "7. The final step is to calculate the gradients and apply them with the optimizer (backpropagation).(**Finally, the gradients are calculated and applied with the optimizer (backpropagation).**)"
    ]
   },
   {
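As a rough sketch of the seven training steps above (assuming `encoder`, `decoder`, `optimizer`, `loss_function`, and a `targ_lang` tokenizer with a `word2idx` map are defined elsewhere in the notebook; the exact names and signatures may differ), a teacher-forced training step in TF 2.x typically looks like this:

```python
import tensorflow as tf

def train_step(inp, targ, enc_hidden):
    loss = 0.0
    with tf.GradientTape() as tape:
        # Steps 1-2: encode the input, then seed the decoder with the start token.
        enc_output, enc_hidden = encoder(inp, enc_hidden)
        dec_hidden = enc_hidden
        dec_input = tf.expand_dims([targ_lang.word2idx['<start>']] * targ.shape[0], 1)

        # Steps 3-6: at each step the decoder predicts the next word; with teacher
        # forcing, the ground-truth target word (not the prediction) becomes the
        # next decoder input.
        for t in range(1, targ.shape[1]):
            predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
            loss += loss_function(targ[:, t], predictions)
            dec_input = tf.expand_dims(targ[:, t], 1)

    # Step 7: backpropagate and let the optimizer apply the gradients.
    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss / int(targ.shape[1])
```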
@@ -974,11 +975,11 @@
     "id": "mU3Ce8M6I3rz"
    },
    "source": [
-    "* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.\n",
-    "* Stop predicting when the model predicts the *end token*.\n",
-    "* And store the *attention weights for every time step*.\n",
+    "* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous prediction along with the hidden state and the encoder output.(**The evaluate function is similar to the training loop, but without teacher forcing. At every step, the decoder's input is the previous step's prediction together with the hidden state and the encoder output.**)\n",
+    "* Stop predicting when the model predicts the *end token*.(**Prediction stops when the model predicts the end token.**)\n",
+    "* And store the *attention weights for every time step*.(**The attention weights are stored at every time step.**)\n",
     "\n",
-    "Note: The encoder output is calculated only once for one input."
+    "Note: The encoder output is calculated only once for one input.(**Note: the encoder output is calculated only once per input.**)"
    ]
   },
   {
@@ -1306,7 +1307,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.6"
+   "version": "3.6.2"
   }
  },
  "nbformat": 4,
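The evaluation loop described in the evaluate hunk above (greedy decoding without teacher forcing) can be sketched as follows; the names `encoder`, `decoder`, `targ_lang`, `units`, and `max_length_targ` are assumptions standing in for whatever the notebook actually defines:

```python
import tensorflow as tf

def evaluate(sentence_ids, units, max_length_targ):
    # sentence_ids: an already tokenized and padded input of shape (1, max_length_inp)
    hidden = tf.zeros((1, units))
    enc_output, enc_hidden = encoder(sentence_ids, hidden)  # encoder runs only once per input

    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([targ_lang.word2idx['<start>']], 0)
    result, attention_plot = [], []

    for t in range(max_length_targ):
        predictions, dec_hidden, attention_weights = decoder(dec_input, dec_hidden, enc_output)
        attention_plot.append(tf.reshape(attention_weights, (-1,)).numpy())  # store weights per step

        predicted_id = int(tf.argmax(predictions[0]).numpy())
        if targ_lang.idx2word[predicted_id] == '<end>':   # stop at the end token
            break
        result.append(targ_lang.idx2word[predicted_id])

        # No teacher forcing: the model's own prediction is fed back as the next input.
        dec_input = tf.expand_dims([predicted_id], 0)

    return ' '.join(result), attention_plot
```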