build_vocab #3

Open
wulidongdong opened this issue Jan 17, 2021 · 5 comments

Comments

@wulidongdong

Hi Steve,

I found that if I use the build_vocab script in the current OpenNMT-py version (2.0.0), the output vocab file is not compatible with the GGNN encoder. Training raises the following error:

Traceback (most recent call last):
  File "/home/cike/.local/bin/onmt_train", line 10, in <module>
    sys.exit(main())
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/bin/train.py", line 169, in main
    train(opt)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/bin/train.py", line 154, in train
    train_process(opt, device_id=0)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/train_single.py", line 107, in main
    valid_steps=opt.valid_steps)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/trainer.py", line 244, in train
    report_stats)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/trainer.py", line 368, in _gradient_accumulation
    with_align=self.with_align)
  File "/home/cike/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/models/model.py", line 45, in forward
    enc_state, memory_bank, lengths = self.encoder(src, lengths)
  File "/home/cike/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cike/.local/lib/python3.6/site-packages/onmt/encoders/ggnn_encoder.py", line 182, in forward
    prop_state[i][j][token] = 1
IndexError: index 64 is out of bounds for axis 0 with size 64

But it works fine when I use the srcvocab.txt which is provided in this repo. Do you have any idea how to solve this problem? Thanks for your time.

Xin Wu
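For anyone hitting the same traceback, here is a minimal sketch of what the failing line appears to do, assuming the encoder one-hot encodes each token id into a state vector of fixed width (64 in the error above). The helper names `one_hot_init` and `fits_state` are hypothetical and are not part of OpenNMT-py, and plain Python lists stand in for the real arrays.

```python
def one_hot_init(token_ids, state_dim):
    """Hypothetical helper: one-hot encode each token id into a row of width state_dim."""
    prop_state = [[0] * state_dim for _ in token_ids]
    for j, token in enumerate(token_ids):
        # Valid positions are 0 .. state_dim - 1, so any token id >= state_dim
        # raises IndexError, as in the traceback above.
        prop_state[j][token] = 1
    return prop_state


def fits_state(vocab_size, state_dim):
    """Hypothetical check: every token id must be a valid position in the state vector."""
    return vocab_size <= state_dim


# Token id 63 fits in a state of width 64; token id 64 does not.
ok = one_hot_init([0, 63], 64)
try:
    one_hot_init([64], 64)
except IndexError as err:
    print("IndexError:", err)
```

Under that assumption, a vocab file with more entries than the state width produces ids that cannot be placed, which would explain why the small srcvocab.txt shipped in the repo works and a freshly built vocab does not.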

@SteveKommrusch
Owner

SteveKommrusch commented Jan 17, 2021 via email

Xin Wu,

Yes, the current implementation hard-codes a small vocabulary into the RNN size (the vocab can't be larger than the GNN size). I'm working to fix that and have been testing an embedding layer. I'll try to have something testable by Friday.

Regards,
Steve

@wulidongdong
Author

Thank you Steve, that would be great and helpful! I am wondering whether I can use the old OpenNMT preprocess script to generate vocab files. Which version should I use?

@SteveKommrusch
Owner

SteveKommrusch commented Jan 22, 2021 via email

@SteveKommrusch
Owner

SteveKommrusch commented Jan 26, 2021 via email

@SteveKommrusch
Owner

The pull request has been accepted so GGNN now supports an embedding layer in the main OpenNMT-py branch.
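As a rough illustration of why an embedding layer lifts the vocabulary limit, here is a minimal sketch that is not the code from the accepted pull request. It assumes a lookup table with one row per vocabulary entry and one column per state dimension; `make_table`, `embed_lookup`, and the sizes used are hypothetical, and plain Python lists stand in for real tensors.

```python
import random


def make_table(vocab_size, state_dim, seed=0):
    """Hypothetical embedding table: vocab_size rows, each of width state_dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(state_dim)]
            for _ in range(vocab_size)]


def embed_lookup(token_ids, table):
    """Return one state_dim-wide row per token id."""
    return [list(table[token]) for token in token_ids]


# The vocabulary (1000) can now be much larger than the state width (64):
# a token id selects a row, it is no longer a position inside the state vector.
table = make_table(vocab_size=1000, state_dim=64)
state = embed_lookup([0, 64, 999], table)
print(len(state), len(state[0]))  # 3 64
```

With a lookup like this, token id 64 is just another row index, so the vocabulary size and the state width can be chosen independently.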
