The top-level *txt files in this repository provide a quick example of how to use the gated graph neural network (GGNN) input for OpenNMT-py. Refer to ggnn.md (https://github.com/OpenNMT/OpenNMT-py/blob/master/docs/source/ggnn.md) for an example use of these files.
The files in the 3 directories 'data', 'src', and 'runs' demonstrate how to use graph data represented textually with parentheses, e.g. '(root (child1 2 3) (child2 4 5))', with GGNN in OpenNMT. Files in this format can be generated for various computer languages using ANTLR4.
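To make the parenthesized format concrete, here is a minimal Python sketch that turns such a string into a node list and parent-to-child edges. It is not the repository's raw2graph.pl and does not produce the OpenNMT GGNN input format; the function names and the choice of preorder node numbering are assumptions made for illustration.

```python
def tokenize(text):
    """Split a parenthesized tree into '(', ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def tree_to_graph(text):
    """Return (nodes, edges) for a parenthesized tree.

    nodes: labels in the order they appear (preorder).
    edges: (parent_index, child_index) pairs.
    Assumes the first symbol after '(' labels the subtree root.
    """
    nodes, edges, stack = [], [], []
    expect_label = False
    for tok in tokenize(text):
        if tok == "(":
            expect_label = True
        elif tok == ")":
            stack.pop()  # leave the current subtree
        else:
            nodes.append(tok)
            idx = len(nodes) - 1
            if stack:
                edges.append((stack[-1], idx))
            if expect_label:
                stack.append(idx)  # this node now owns the following children
                expect_label = False
    return nodes, edges


nodes, edges = tree_to_graph("(root (child1 2 3) (child2 4 5))")
print(nodes)  # ['root', 'child1', '2', '3', 'child2', '4', '5']
print(edges)  # [(0, 1), (1, 2), (1, 3), (0, 4), (4, 5), (4, 6)]
```

Numbering nodes by position is one simple way to give every node a distinct identifier that edges can refer to.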
Install OpenNMT-py from pip:
pip install OpenNMT-py
or from the sources:
git clone https://github.com/OpenNMT/OpenNMT-py.git
cd OpenNMT-py
# The next step should be done on a CUDA-capable device
python setup.py install
Note: If you get a MemoryError during the install, try running pip with --no-cache-dir.
(Optional) Some advanced features (e.g. working with audio, images, or pretrained models) require extra packages, which you can install with:
pip install -r requirements.opt.txt
# cd to directory above OpenNMT-py
git clone git@github.com:SteveKommrusch/OpenNMT-py-ggnn-example.git
# Review env.sh script and adjust for your installation.
cd OpenNMT-py-ggnn-example
cat env.sh
source env.sh
cd runs/graph2seq
../../src/setupgraph2seq.sh
setsid nice -n 19 onmt_train --config ggnn.yaml < /dev/null > train.nohup.out 2>&1
onmt_translate -model model_step_10000.pt -src src-test.txt -output pred-test_beam10.txt -gpu 0 -replace_unk -beam_size 10 -n_best 10 -batch_size 4 -verbose > trans10.out 2>&1
python ../../src/compare.py --src=pred-test_beam10.txt --tgt=tgt-test.txt -v > pass10.txt
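The last step checks whether the expected target appears among the beam search outputs. The sketch below illustrates that idea in Python; it is not the repository's compare.py, and the function name and sample data are made up. It assumes the prediction file holds n_best consecutive hypotheses for each source line, in the same order as the target file.

```python
def n_best_matches(pred_lines, tgt_lines, n_best):
    """Return, per target, the 1-based rank of the first exact match
    among its n_best hypotheses, or None if no hypothesis matches."""
    if len(pred_lines) != n_best * len(tgt_lines):
        raise ValueError("expected n_best prediction lines per target line")
    ranks = []
    for i, tgt in enumerate(tgt_lines):
        group = pred_lines[i * n_best:(i + 1) * n_best]
        rank = next(
            (r for r, hyp in enumerate(group, start=1)
             if hyp.strip() == tgt.strip()),
            None,
        )
        ranks.append(rank)
    return ranks


# Made-up data: 2 targets with n_best=2 hypotheses each.
preds = ["sort", "search", "search", "merge"]
tgts = ["search", "sum"]
ranks = n_best_matches(preds, tgts, n_best=2)
matched = sum(r is not None for r in ranks)
print(ranks)  # [2, None]
print(f"{matched}/{len(tgts)} targets found within the beam")
```

For the run above, the two files would be read with something like `open("pred-test_beam10.txt").read().splitlines()` and passed in with `n_best=10`.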
- data/antlr_files/*.txt: Output of ANTLR4 for 10 short programs in C++.
- src/setupgraph2seq.sh: Creates vocab data and uses textual tree data to create OpenNMT GGNN input format.
- src/raw2graph.pl: Perl script used by setupgraph2seq.sh to generate node, feature, and edge information for OpenNMT GGNN input format.
- src/compare.py: Used to compare beam search translation output with expected target results. See steps above for example.
- runs/graph2seq/ggnn.yaml: Commented parameters for GGNN run
- data/raw_initial.txt: 10 example programs generated using ANTLR. The format per line is "X program Y target", where Y is the target output to be generated, in this case the algorithm's filename.
- data/graph_initial.txt: 10 example programs in OpenNMT GGNN graph input format. The format per line is "X program Y target", where Y is the target output to be generated, in this case the algorithm's filename. The program syntax is as per https://github.com/OpenNMT/OpenNMT-py/blob/master/docs/source/ggnn.md
- runs/graph2seq/*vocab.txt: Files generated by setupgraph2seq.sh for use by graph neural network in OpenNMT. Note srcvocab.txt includes tokens for all node numbers to allow for proper edge connection setup for the model.
- Some sample steps regarding setup and debug of the GGNN are discussed in OpenNMT-py issue 2058: OpenNMT/OpenNMT-py#2058
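
The "X program Y target" line format described for the data files above can be split into its source and target parts with a few lines of Python. This is only a sketch: it assumes that X and Y are literal, whitespace-separated marker tokens and that the last Y on the line starts the target. The function name and the sample line are hypothetical and are not taken from the repository's data.

```python
def split_example(line):
    """Split an 'X program Y target' line into (program, target).

    Assumes 'X' and 'Y' are literal marker tokens; the last 'Y'
    on the line is taken as the start of the target.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "X" or "Y" not in tokens[1:]:
        raise ValueError("line does not match 'X program Y target'")
    # Index of the last 'Y' token.
    y = len(tokens) - 1 - tokens[::-1].index("Y")
    program = " ".join(tokens[1:y])
    target = " ".join(tokens[y + 1:])
    return program, target


# Hypothetical line for illustration only.
program, target = split_example("X (root (child1 2 3) (child2 4 5)) Y bubblesort")
print(program)  # (root (child1 2 3) (child2 4 5))
print(target)   # bubblesort
```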