Translate from English to Korean #32

Open · wants to merge 1 commit into base: master
@@ -346,34 +346,35 @@
"source": [
"## Write the encoder and decoder model\n",
"\n",
"Here, we'll implement an encoder-decoder model with attention which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The following diagram shows that each input words is assigned a weight by the attention mechanism which is then used by the decoder to predict the next word in the sentence.\n",
"Here, we'll implement an encoder-decoder model with attention which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism) from the seq2seq tutorial. The following diagram shows that each input words is assigned a weight by the attention mechanism which is then used by the decoder to predict the next word in the sentence.(**[Neural Machine Translation (seq2seq) tutorial](https://www.tensorflow.org/tutorials/seq2seq)의 Encoder-decoder Model with Attention에 대해 알아봅시다. 이 예시는 최신의 API를 사용합니다. 이 notebook 파일은 seq2seq 튜토리얼의 [attention equations](https://www.tensorflow.org/tutorials/seq2seq#background_on_the_attention_mechanism)을 실행합니다. 하단의 이미지는 Attention 기법에 따라 각각의 input words에 weight를 부여한 후, decoder를 이용하여 그 다음의 word를 예측하는 프로세스를 설명합니다.**)\n",
"\n",
"<img src=\"https://www.tensorflow.org/images/seq2seq/attention_mechanism.jpg\" width=\"500\" alt=\"attention mechanism\">\n",
"\n",
"The input is put through an encoder model which gives us the encoder output of shape *(batch_size, max_length, hidden_size)* and the encoder hidden state of shape *(batch_size, hidden_size)*. \n",
"(**Input을 encoder model에 넣어 encoder output과 encoder hidden state를 반환합니다. 여기서 encoder output는 shape(batch_size, max_lenght, hidden_size)이며, encoder hidden state는 shape(batch_size, hidden_size)입니다.**)\n",
"\n",
"Here are the equations that are implemented:\n",
"Here are the equations that are implemented(**실행 순서를 나타내는 수식은 다음과 같습니다.**):\n",
"\n",
"<img src=\"https://www.tensorflow.org/images/seq2seq/attention_equation_0.jpg\" alt=\"attention equation 0\" width=\"800\">\n",
"<img src=\"https://www.tensorflow.org/images/seq2seq/attention_equation_1.jpg\" alt=\"attention equation 1\" width=\"800\">\n",
"\n",
"We're using *Bahdanau attention*. Lets decide on notation before writing the simplified form:\n",
"We're using *Bahdanau attention*. Lets decide on notation before writing the simplified form(**우리는 Bahdanau Attention를 이용할 것입니다. 간단하게 나타내기 전에 몇 가지 표기를 정합시다.**):\n",
"\n",
"* FC = Fully connected (dense) layer\n",
"* EO = Encoder output\n",
"* H = hidden state\n",
"* X = input to the decoder\n",
"\n",
"And the pseudo-code:\n",
"And the pseudo-code(**수도코드(pseudo-code)는 다음과 같습니다.**):\n",
"\n",
"* `score = FC(tanh(FC(EO) + FC(H)))`\n",
"* `attention weights = softmax(score, axis = 1)`. Softmax by default is applied on the last axis but here we want to apply it on the *1st axis*, since the shape of score is *(batch_size, max_length, 1)*. `Max_length` is the length of our input. Since we are trying to assign a weight to each input, softmax should be applied on that axis.\n",
"* `context vector = sum(attention weights * EO, axis = 1)`. Same reason as above for choosing axis as 1.\n",
"* `embedding output` = The input to the decoder X is passed through an embedding layer.\n",
"* `merged vector = concat(embedding output, context vector)`\n",
"* This merged vector is then given to the GRU\n",
" \n",
"The shapes of all the vectors at each step have been specified in the comments in the code:"
"* This merged vector is then given to the GRU(**합쳐진 vector는 GRU에 사용됩니다.**)\n",
"\n",
"The shapes of all the vectors at each step have been specified in the comments in the code(**각 스텝마다 모든 vector의 shape는 코드 주석으로 표기했습니다.**):"
]
},
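For reference, here is a minimal sketch of the attention step that the pseudo-code above describes, assuming TensorFlow 2.x and `tf.keras`. The class and layer names (`BahdanauAttention`, `W1`, `W2`, `V`) are illustrative, not taken from this diff; the shapes follow the FC/EO/H notation above.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Sketch of score = FC(tanh(FC(EO) + FC(H))) followed by a softmax over time."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # FC applied to the encoder output (EO)
        self.W2 = tf.keras.layers.Dense(units)  # FC applied to the hidden state (H)
        self.V = tf.keras.layers.Dense(1)       # final FC that produces the score

    def call(self, hidden, enc_output):
        # enc_output shape: (batch_size, max_length, hidden_size)
        # hidden shape: (batch_size, hidden_size) -> (batch_size, 1, hidden_size)
        hidden_with_time_axis = tf.expand_dims(hidden, 1)

        # score shape: (batch_size, max_length, 1)
        score = self.V(tf.nn.tanh(self.W1(enc_output) + self.W2(hidden_with_time_axis)))

        # softmax over axis 1 (the max_length axis), as explained above
        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape after the sum: (batch_size, hidden_size)
        context_vector = tf.reduce_sum(attention_weights * enc_output, axis=1)
        return context_vector, attention_weights
```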
{
@@ -587,13 +588,13 @@
"source": [
"## Training\n",
"\n",
"1. Pass the *input* through the *encoder* which return *encoder output* and the *encoder hidden state*.\n",
"2. The encoder output, encoder hidden state and the decoder input (which is the *start token*) is passed to the decoder.\n",
"3. The decoder returns the *predictions* and the *decoder hidden state*.\n",
"4. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.\n",
"5. Use *teacher forcing* to decide the next input to the decoder.\n",
"6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.\n",
"7. The final step is to calculate the gradients and apply it to the optimizer and backpropagate."
"1. Pass the *input* through the *encoder* which return *encoder output* and the *encoder hidden state*.(**Input을 Encoder에 넣어 Encoder Output과 Encdoer Hidden State를 반환합니다**)\n",
"2. The encoder output, encoder hidden state and the decoder input (which is the *start token*) is passed to the decoder.(**Encoder Output과 Encoder Hidden State, Decoder Input(=Start token)을 Decoder에 넣습니다.**)\n",
"3. The decoder returns the *predictions* and the *decoder hidden state*.(**Decoder는 Predictions(예측 결과)와 Decoder Hidden State를 반환합니다**)\n",
"4. The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.(**Decoder hidden state를 Model에 다시 넣고, predictions을 이용하여 loss 값을 계산합니다.**)\n",
"5. Use *teacher forcing* to decide the next input to the decoder.(**Teacher Forcing을 사용하여 decoder에 넣을 그 다음 input을 결정합니다.**)\n",
"6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.(**Teacher Forcing 기술로 target word를 decoder의 다음 input으로 이용합니다**)\n",
"7. The final step is to calculate the gradients and apply it to the optimizer and backpropagate.(**마지막으로, gradients를 계산하고 optimizer와 역전파(backpropagate)에 적용합니다.**)"
]
},
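For reference, a hedged sketch of one teacher-forced training step following steps 1 to 7 above, assuming TensorFlow 2.x run eagerly. `encoder`, `decoder`, `optimizer`, `loss_function`, and `targ_lang` are assumed to be defined elsewhere in the notebook (with the decoder returning predictions, hidden state, and attention weights); they are not part of this diff.

```python
import tensorflow as tf

def train_step(inp, targ, enc_hidden):
    """One teacher-forced training step (eager-mode sketch)."""
    loss = 0.0
    with tf.GradientTape() as tape:
        # 1. The encoder returns its output and final hidden state.
        enc_output, enc_hidden = encoder(inp, enc_hidden)
        dec_hidden = enc_hidden

        # 2. The decoder starts from the <start> token for every sentence in the batch.
        dec_input = tf.expand_dims(
            [targ_lang.word_index['<start>']] * int(targ.shape[0]), 1)

        for t in range(1, targ.shape[1]):
            # 3./4. The decoder returns predictions and its hidden state;
            # the predictions are used to accumulate the loss.
            predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
            loss += loss_function(targ[:, t], predictions)

            # 5./6. Teacher forcing: the target word is fed as the next decoder input.
            dec_input = tf.expand_dims(targ[:, t], 1)

    # 7. Backpropagate: compute the gradients and let the optimizer apply them.
    variables = encoder.trainable_variables + decoder.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss / int(targ.shape[1])
```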
{
@@ -974,11 +975,11 @@
"id": "mU3Ce8M6I3rz"
},
"source": [
"* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.\n",
"* Stop predicting when the model predicts the *end token*.\n",
"* And store the *attention weights for every time step*.\n",
"* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.(**Evaluate 함수는 teacher forcing이 없는 training 루프문과 비슷합니다. 매 step의 Decoder의 Input은 hidden state와 encoder output, 이전 step의 예측값입니다.**)\n",
"* Stop predicting when the model predicts the *end token*.(**모델의 마지막 token을 예측하면 predicting을 멈춥니다.**)\n",
"* And store the *attention weights for every time step*.(**매 time step마다 attention weights를 저장합니다.**)\n",
"\n",
"Note: The encoder output is calculated only once for one input."
"Note: The encoder output is calculated only once for one input.(**참고 : Encoder Output은 한 개의 Input에서 단 한번만 계산됩니다.**)"
]
},
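For reference, a minimal sketch of the evaluation loop described above (no teacher forcing), assuming TensorFlow 2.x. Helpers such as `preprocess_sentence`, `inp_lang`, `targ_lang`, `max_length_inp`, `max_length_targ`, `units`, `encoder`, and `decoder` are assumed from the rest of the notebook and are not part of this diff.

```python
import tensorflow as tf

def evaluate(sentence):
    """Greedy decoding sketch: feed each prediction back as the next decoder input."""
    sentence = preprocess_sentence(sentence)
    inputs = [inp_lang.word_index[w] for w in sentence.split(' ')]
    inputs = tf.keras.preprocessing.sequence.pad_sequences(
        [inputs], maxlen=max_length_inp, padding='post')
    inputs = tf.convert_to_tensor(inputs)

    result = ''
    hidden = tf.zeros((1, units))
    # The encoder output is computed only once per input sentence.
    enc_output, enc_hidden = encoder(inputs, hidden)

    dec_hidden = enc_hidden
    dec_input = tf.expand_dims([targ_lang.word_index['<start>']], 0)

    for t in range(max_length_targ):
        # The decoder also returns the attention weights for this time step,
        # which the notebook stores for plotting.
        predictions, dec_hidden, attention_weights = decoder(dec_input, dec_hidden, enc_output)
        predicted_id = int(tf.argmax(predictions[0]).numpy())
        result += targ_lang.index_word[predicted_id] + ' '

        # Stop once the model predicts the <end> token.
        if targ_lang.index_word[predicted_id] == '<end>':
            return result, sentence

        # The previous prediction becomes the next decoder input (no teacher forcing).
        dec_input = tf.expand_dims([predicted_id], 0)

    return result, sentence
```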
{
@@ -1306,7 +1307,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
"version": "3.6.2"
}
},
"nbformat": 4,