Skip to content

Training, validation, and test files for GGNN encoder option in OpenNMT-py.

Notifications You must be signed in to change notification settings

SteveKommrusch/OpenNMT-py-ggnn-example

Repository files navigation

OpenNMT-py-ggnn-example

The top-level *txt files in this repository provide a quick example of how to use the gated graph neural network input for OpenNMT-py. Refer to the ggnn.md link for an example use of the files in the top-level directory.

Graph input processing end-to-end example

The files in the 3 directories 'data', 'src', and 'runs' demonstrate how to use graph data represented textually with parenthesis with GGNN in OpenNMT (i.e., '(root (child1 2 3) (child2 4 5))'). Files in this format can be generated for various computer languages using ANTLR4.

Step 1: Install OpenNMT-py and related packages

Install OpenNMT-py from pip:

pip install OpenNMT-py

or from the sources:

git clone https://github.com/OpenNMT/OpenNMT-py.git
cd OpenNMT-py
# The next step should be done on a CUDA-capable device
python setup.py install

Note: If you have MemoryError in the install try to use pip with --no-cache-dir.

(Optional) some advanced features (e.g. working audio, image or pretrained models) requires extra packages, you can install it with:

pip install -r requirements.opt.txt

Step 2: Install ggnn-example and Environment setup

# cd to directory above OpenNMT-py
git clone [email protected]:SteveKommrusch/OpenNMT-py-ggnn-example.git
# Review env.sh script and adjust for your installation.
cd OpenNMT-py-ggnn-example
cat env.sh
source env.sh

Step 3: Train and use GGNN model

cd runs/graph2seq
../../src/setupgraph2seq.sh
setsid nice -n 19 onmt_train --config ggnn.yaml < /dev/null > train.nohup.out 2>&1
onmt_translate -model model_step_10000.pt -src src-test.txt -output pred-test_beam10.txt -gpu 0 -replace_unk -beam_size 10 -n_best 10 -batch_size 4 -verbose > trans10.out 2>&1
python ../../src/compare.py --src=pred-test_beam10.txt --tgt=tgt-test.txt -v > pass10.txt

Git file descriptions

  • data/antlr_files/*.txt: Output of ANTLR4 for 10 short programs in C++.
  • src/setupgraph2seq.sh: Creates vocab data and uses textual tree data to create OpenNMT GGNN input format.
  • src/raw2graph.pl: PERL script used by setupgraph2seq.sh to generate node, feature, and edge information for OpenNMT GGNN input format.
  • src/compare.py: Used to compare beam search translation output with expected target results. See steps above for example.
  • runs/graph2seq/ggnn.yaml: Commented parameters for GGNN run

Key generated file descriptions

  • data/raw_initial.txt: 10 example programs generated using ANTLR. The format per lise is "X program Y target", where Y is the target output to be generated, in this case the algorithm's filename.
  • data/graph_initial.txt: 10 example programs in OpenNMT ggnn graph input format. The format per lise is "X program Y target", where Y is the target output to be generated, in this case the algorithm's filename. The program syntax is as per https://github.com/OpenNMT/OpenNMT-py/blob/master/docs/source/ggnn.md
  • runs/graph2seq/*vocab.txt: Files generated by setupgraph2seq.sh for use by graph neural network in OpenNMT. Note srcvocab.txt includes tokens for all node numbers to allow for proper edge connection setup for the model.

Debugging issues

  • Some sample steps regarding setup and debug of the GGNN are discussed in OpenNMT-py issue 2058: OpenNMT/OpenNMT-py#2058

About

Training, validation, and test files for GGNN encoder option in OpenNMT-py.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published