build_vocab #3
Xin Wu,
Yes, the current implementation hard-codes a small vocabulary into the
RNN size (the vocab can't be larger than the GNN size). I'm working to fix
that and have been testing an embedding layer. I'll try to have something
testable by Friday.
Regards,
Steve
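Here is a minimal pure-Python sketch of the limitation described above. It assumes, based on the prop_state[i][j][token] = 1 line in the traceback, that each token id is written as a one-hot index into a node state vector whose width is the GGNN state size. The functions one_hot_state, make_embedding_table and embedded_state are hypothetical illustrations, not OpenNMT-py code.

```python
import random

# Hypothetical sketch (not the ggnn_encoder.py source): contrast a one-hot node
# encoding, whose width is the GGNN state size, with an embedding lookup, whose
# row count is the vocabulary size.


def one_hot_state(token_ids, state_dim):
    """One state vector per node, with a 1.0 at the token's vocab index."""
    states = []
    for token in token_ids:
        vec = [0.0] * state_dim
        vec[token] = 1.0  # IndexError once token >= state_dim, i.e. vocab > state size
        states.append(vec)
    return states


def make_embedding_table(vocab_size, dim, seed=0):
    """Randomly initialized vocab_size x dim table (a stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embedded_state(token_ids, table):
    """Dense vector per node; its width does not depend on the vocabulary size."""
    return [list(table[token]) for token in token_ids]
```

With a state size of 64, one_hot_state([64], 64) raises an IndexError, mirroring the traceback, whereas embedded_state accepts any id below the table's row count, so the vocabulary size and the state size no longer have to match.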
On Sun, Jan 17, 2021 at 5:40 AM wulidongdong wrote:
Hi Steve,
I found that if I use the build_vocab script in the current OpenNMT-py version
(2.0.0), the output vocab file is not compatible with the GGNN encoder. It
raises the following error:
Traceback (most recent call last):
File "/home/cike/.local/bin/onmt_train", line 10, in <module>
sys.exit(main())
File "/home/cike/.local/lib/python3.6/site-packages/onmt/bin/train.py", line 169, in main
train(opt)
File "/home/cike/.local/lib/python3.6/site-packages/onmt/bin/train.py", line 154, in train
train_process(opt, device_id=0)
File "/home/cike/.local/lib/python3.6/site-packages/onmt/train_single.py", line 107, in main
valid_steps=opt.valid_steps)
File "/home/cike/.local/lib/python3.6/site-packages/onmt/trainer.py", line 244, in train
report_stats)
File "/home/cike/.local/lib/python3.6/site-packages/onmt/trainer.py", line 368, in _gradient_accumulation
with_align=self.with_align)
File "/home/cike/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cike/.local/lib/python3.6/site-packages/onmt/models/model.py", line 45, in forward
enc_state, memory_bank, lengths = self.encoder(src, lengths)
File "/home/cike/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cike/.local/lib/python3.6/site-packages/onmt/encoders/ggnn_encoder.py", line 182, in forward
prop_state[i][j][token] = 1
IndexError: index 64 is out of bounds for axis 0 with size 64
But it works fine when I use the srcvocab.txt which is provided in this
repo. Do you have any idea how to solve this problem? Thanks for your time.
Xin Wu
Thank you Steve, that would be great and helpful! In the meantime, can I use the old OpenNMT preprocess script to generate vocab files? If so, which version should I use?
Xin Wu,
I have my embedding code passing tests but I'm working through the
checkers now for a clean pull request. The new pull request will allow for
larger vocabularies and handle the old and new vocab formats, but the vocab
file must include <EOT>, ',', and the numbers up to the node count (so that
edge information can be supplied).
To learn a bit more about setup, you can look at my example GitHub file
here:
https://github.com/SteveKommrusch/OpenNMT-py-ggnn-example/blob/master/src/setupgraph2seq.sh
That file includes a Perl line that processes a raw vocab file to add the
extra tokens.
Regards,
Steve
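The vocab adjustment described above can be sketched in Python as follows. This is not the Perl line from setupgraph2seq.sh; it assumes the raw vocab file has one "token<TAB>count" entry per line and that node indices are zero-based, and the function name add_ggnn_tokens and the filler count of 1 are hypothetical choices.

```python
# Hypothetical sketch: append the tokens the GGNN encoder needs (<EOT>, ",",
# and node-index numbers) to a raw vocab. Assumes "token<TAB>count" lines and
# zero-based node indices 0..node_count-1; widen the range if your edges differ.


def add_ggnn_tokens(vocab_lines, node_count, filler_count=1):
    """Return vocab lines with any missing GGNN-required tokens appended."""
    existing = set()
    out = []
    for line in vocab_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        existing.add(line.split("\t", 1)[0])
        out.append(line)
    required = ["<EOT>", ","] + [str(n) for n in range(node_count)]
    for token in required:
        if token not in existing:  # never duplicate a token already present
            out.append(f"{token}\t{filler_count}")
            existing.add(token)
    return out
```

For example, add_ggnn_tokens(["a\t10", "+\t5"], 3) keeps the two original entries and appends entries for <EOT>, ',', 0, 1 and 2; the result can be written back one entry per line.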
Xin Wu,
I have created a pull request to OpenNMT here:
OpenNMT/OpenNMT-py#1998
The changes to ggnn_encoder.py allow for an embedding layer, which
allows an arbitrarily large vocab to be used. Also, I updated my example
code (which relies on the new GGNN code) here:
https://github.com/SteveKommrusch/OpenNMT-py-ggnn-example#graph-input-processing-end-to-end-example
Along with the new ggnn_encoder.py, you can now use the current
onmt_build_vocab to create a file which can be easily adjusted for GGNN
usage. That end-to-end example also has a script that can help format
textual trees like (a + (b * c) ) into tree structures used by the GGNN.
Let me know if I can help more. If you can't download from my pull
request, I could send you the ggnn_encoder.py directly.
Regards,
Steve
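The tree-formatting step mentioned above can be sketched as follows. This is not the script from the example repository: it assumes fully parenthesized binary expressions, numbers nodes in pre-order (root = 0), and serializes them as node tokens, then <EOT>, then comma-separated "parent child" pairs. That line layout is only a guess based on the <EOT>, ',' and number tokens the vocab must contain; parse_tree and to_ggnn_line are hypothetical names, and the example repo remains the reference for the exact format.

```python
import re

# Hypothetical sketch: turn a fully parenthesized infix expression such as
# "(a + (b * c) )" into a node list plus parent->child edges, then serialize it.
# The "<EOT>" / "," line layout is an assumption, not the confirmed GGNN format.

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(expr):
    """Return (nodes, edges); node 0 is the root, edges are (parent, child) pairs."""
    tokens = TOKEN_RE.findall(expr)
    nodes, edges = [], []
    pos = 0

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse():
        tok = take()
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            nodes.append(tok)  # leaf operand
            return len(nodes) - 1
        node = len(nodes)
        nodes.append(None)  # placeholder: the operator follows the left operand
        left = parse()
        op = take()
        if op in ("(", ")"):
            raise ValueError("expected an operator")
        nodes[node] = op
        right = parse()
        if take() != ")":
            raise ValueError("expected ')'")
        edges.extend([(node, left), (node, right)])
        return node

    parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return nodes, sorted(edges)


def to_ggnn_line(nodes, edges):
    """Serialize as 'node tokens <EOT> p c , p c ...' (assumed layout)."""
    edge_text = " , ".join(f"{p} {c}" for p, c in edges)
    return " ".join(nodes) + " <EOT> " + edge_text
```

For "(a + (b * c) )", parse_tree returns the nodes ['+', 'a', '*', 'b', 'c'] and the edges [(0, 1), (0, 2), (2, 3), (2, 4)], which to_ggnn_line serializes as "+ a * b c <EOT> 0 1 , 0 2 , 2 3 , 2 4".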
The pull request has been accepted, so GGNN now supports an embedding layer in the main OpenNMT-py branch.