forked from OpenCCG/openccg
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
234 lines (171 loc) · 8.95 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
Introduction
============
See CHANGES for a description of the project status.
Also see the web site and wiki at UT Austin:
* http://openccg.sf.net
* http://www.utcompling.com/wiki/openccg
This file contains the configuration and build instructions. Next
you'll probably want to look at the tutorial on writing grammars in
the human-friendly 'dot ccg' syntax on the UT Austin wiki:
* http://www.utcompling.com/wiki/openccg/visccg-tutorial
After that it may be helpful to look at the "native" grammar
specification in "Specifying Grammars for OpenCCG: A Rough Guide" in
docs/grammars-rough-guide.pdf, as well as the SAMPLE_GRAMMARS file for
descriptions of the sample grammars that come with the distribution,
including ones using the DotCCG syntax. A (somewhat dated)
programmer's guide to using the OpenCCG realizer appears in
docs/realizer-manual.pdf.
This release also includes a broad English coverage grammar from the
CCGBank and associated statistical models; see docs/ccgbank-README for
details.
Requirements
============
* Version 1.6 or later of the Java 2 SDK (http://java.sun.com)
* For ccg2xml and other tools, Python version 2.4 to 2.7 (http://www.python.org)
Configuring your environment variables
======================================
The easiest thing to do is to set the environment variables JAVA_HOME
and OPENCCG_HOME to the relevant locations on your system. Set JAVA_HOME
to match the top level directory containing the Java installation you
want to use.
For example, on Windows:
C:\> set JAVA_HOME=C:\Program Files\jdk1.6.0_04
or on Unix:
% setenv JAVA_HOME /usr/local/java
(csh)
> export JAVA_HOME=/usr/java
(ksh, bash)
On Windows, to get these settings to persist, it's actually easiest to
set your environment variables through the System Properties from the
Control Panel. For example, under WinXP, go to Control Panel, click on
System Properties, choose the Advanced tab, click on Environment
Variables, and add your settings in the User variables area.
Next, likewise set OPENCCG_HOME to be the top level directory where you
unzipped the download. In Unix, type 'pwd' in the directory where
this file is and use the path given to you by the shell as
OPENCCG_HOME. You can set this in the same manner as for JAVA_HOME
above.
Next, add the directory OPENCCG_HOME/bin to your path. For example, you
can set the path in your .bashrc file as follows:
> export PATH="$PATH:$OPENCCG_HOME/bin"
On Windows, you should also add the python main directory to your path.
Finally, if you are going to use KenLM with very large language models
for realization with CCGbank-extracted grammars on linux, you'll also need to
set the library load path:
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OPENCCG_HOME/lib
Once you have taken care of these things, you should be able to
build and use the OpenCCG Library.
Note: Spaces are allowed in JAVA_HOME but not in OPENCCG_HOME. To set
an environment variable with spaces in it, you need to put quotes around
the value when on Unix, but you must *NOT* do this when under Windows.
Increasing Java memory limit
============================
If you're working with a broad coverage grammar and statistical
parsing or realization models, you'll probably need to increase the
default memory limit for running OpenCCG's tools. You can do so by
editing bin/ccg-env[.bat], increasing the JAVA_MEM environment
variable at the end of this script. For training perceptron models in
memory, you may need 16g; for realization with the very large gigaword
5-gram model, you may need 8g; otherwise, for parsing and realization
with ccgbank-derived models, 4g or possibly even 2g should suffice;
finally, for small grammars 512m or 256m should be ok.
Trying it out
=============
If you've managed to configure the system, you should be able
to change to the directory for the "tiny" sample grammar and run "tccg"
(for text ccg), the command-line tool for interactively testing
grammars:
> cd grammars
> cd tiny
> tccg (Windows/Unix)
Provided tccg starts properly, it loads the grammar files, parses them,
and shows the command-line interface (at which point you can type :h for
help or :q to quit).
If you trouble starting up tccg, make sure you have set the environment
variables properly, and that the tccg script (located in openccg/bin)
calls the right shell environment (top-line of the script; to solve the
problem, either comment out this line or correct the path).
Visualizing semantic graphs
===========================
Semantic dependency graphs in testbed files can be visualized with the
help of Graphviz's dot tool. First, download and install Graphviz
(http://www.graphviz.org/). Then, use tccg to create a testbed files
with logical forms in it. For example, you can try some examples in
the worldcup sample grammar and save them to a file using the command
':2tb tb.xml'. Then make a directory to store the visualized graphs.
Finally, run the ccg-draw-graph tool as shown below:
> cd grammars/worldcup
> tccg (parse examples, save using ':2tb tb.xml')
> mkdir graphs
> ccg-draw-graph -i tb.xml -v graphs/g
You can also show the semantic classes or word indices using the -c or
-w options, respectively. The graphs can be displayed with any PDF
display tool.
Note that the graph visualization requires the logical forms to be
stored in an xml node-rel format for graphs, as in the worldcup or
routes sample grammars. See SAMPLE_GRAMMARS for more information.
Creating disjunctive logical forms
==================================
This release includes a new disjunctivizer package, for creating a
disjunctive LF XML structure based on an LF graph difference. An LF
graph difference is a characterization of the difference between two
Hybrid Logic Dependency Semantics graphs and an alignment between them
in terms of the edits needed to make one into the other: inserts,
deletes, and substitutions. See the build file for junit tests that
illustrate how to use the package.
Generating grammar documentation
================================
OpenCCG includes a tool for generating HTML documentation of the XML files
that specify a grammar. It can be run either from the ccg-grammardoc script in
the bin/ directory, or as an Ant task. An example of how to incorporate
GrammarDoc into an Ant build file is given in the "tiny" grammar
(grammars/tiny/build.xml), in a build target called "document."
Building the system from source
===============================
The OpenCCG build system is based on Apache Ant. Ant is a little but very
handy tool that uses a build file written in XML (build.xml) as building
instructions. Building the Java portion of OpenCCG is accomplished using the
script `ccg-build'; this works under Windows and Unix, but requires that
you run it from the top-level directory (where the build.xml file is
located). If everything is right and all the required packages are
visible, this action will generate a file called openccg.jar in the
./lib directory.
Note that you should *not* build from source by invoking 'ant'
directly. Instead, you should use ccg-build as shown below (Unix),
after ensuring that you've set OPENCCG_HOME, JAVA_HOME and updated
your PATH (the ccg-build script invokes ant with various parameters
that aren't set properly if ant is invoked from the command line):
> cd $OPENCCG_HOME
> ccg-build
Working with the Eclipse IDE
============================
The Eclipse IDE can be used for editing the Java source code, though
setup can be a bit tricky. The most reliable method seems to be as
follows. First, follow the instructions above for building the source
from the command line. Then, in Eclipse, choose File|New|Java Project
to create a new Java Project, and give it a name, such as 'openccg'.
Leave the default settings as they are, and click Next. Then choose
Link Additional Source and browse to the folder 'src' in the directory
where you installed OpenCCG (ie $OPENCCG_HOME/src). You'll need to
give this location a new name, such as 'src2' ('src' is already taken
by default). The final step is to Add External JARs under the
Libraries tab. From OpenCCG's lib directory (ie $OPENCCG_HOME/lib),
choose all of the .jar files. At this point, you should be able to
hit Finish and the code should compile in Eclipse.
Note that with Eclipse's default settings, the code will compile in
your Eclipse workspace, which is separate from your OpenCCG
installation (this is a good thing, as Eclipse uses a 'bin' directory
for compiled Java classes, whereas OpenCCG uses 'bin' for command-line
scripts). Thus, once you have made a round of changes in Eclipse and
are ready to try them out in OpenCCG, go back to the command line in
$OPENCCG_HOME and invoke ccg-build to re-build the openccg.jar file.
This will make your changes available in OpenCCG's programs, such as
tccg.
Bug Reports
===========
Please report bugs at the bug section of the OpenCCG sourceforge site.
Special Note
============
Parts of this README and some of the directory structure and the build
system for this project were borrowed from the JDOM project (kudos!).
See www.jdom.org for more details.