<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Natural Language Toolkit — NLTK 3.0 documentation</title>
<link rel="stylesheet" href="_static/agogo.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '3.0',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="NLTK 3.0 documentation" href="#" />
<link rel="next" title="NLTK News" href="news.html" />
</head>
<body>
<div class="header-wrapper">
<div class="header">
<div class="headertitle"><a
href="#">NLTK 3.0 documentation</a></div>
<div class="rel">
<a href="news.html" title="NLTK News"
accesskey="N">next</a> |
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |
<a href="genindex.html" title="General Index"
accesskey="I">index</a>
</div>
</div>
</div>
<div class="content-wrapper">
<div class="content">
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="natural-language-toolkit">
<h1>Natural Language Toolkit<a class="headerlink" href="#natural-language-toolkit" title="Permalink to this headline">¶</a></h1>
<p>NLTK is a leading platform for building Python programs to work with human language data.
It provides easy-to-use interfaces to <a class="reference external" href="http://nltk.org/nltk_data/">over 50 corpora and lexical
resources</a> such as WordNet,
along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.</p>
<p>Thanks to a hands-on guide introducing programming fundamentals alongside topics in computational linguistics,
NLTK is suitable for linguists, engineers, students, educators, researchers, and industry users alike.
NLTK is available for Windows, Mac OS X, and Linux. Best of all, NLTK is a free, open-source, community-driven project.</p>
<p>NLTK has been called “a wonderful tool for teaching, and working in, computational linguistics using Python,”
and “an amazing library to play with natural language.”</p>
<p><a class="reference external" href="http://nltk.org/book">Natural Language Processing with Python</a> provides a practical
introduction to programming for language processing.
Written by the creators of NLTK, it guides the reader through the fundamentals
of writing Python programs, working with corpora, categorizing text, analyzing linguistic structure,
and more.
A <a class="reference external" href="http://nltk.org/book3">new version</a> with updates for Python 3 and NLTK 3 is in preparation.</p>
<div class="section" id="some-simple-things-you-can-do-with-nltk">
<h2>Some simple things you can do with NLTK<a class="headerlink" href="#some-simple-things-you-can-do-with-nltk" title="Permalink to this headline">¶</a></h2>
<p>Tokenize and tag some text:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">nltk</span>
<span class="gp">>>> </span><span class="n">sentence</span> <span class="o">=</span> <span class="s">"""At eight o'clock on Thursday morning</span>
<span class="gp">... </span><span class="s">Arthur didn't feel very good."""</span>
<span class="gp">>>> </span><span class="n">tokens</span> <span class="o">=</span> <span class="n">nltk</span><span class="o">.</span><span class="n">word_tokenize</span><span class="p">(</span><span class="n">sentence</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">tokens</span>
<span class="go">['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',</span>
<span class="go">'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']</span>
<span class="gp">>>> </span><span class="n">tagged</span> <span class="o">=</span> <span class="n">nltk</span><span class="o">.</span><span class="n">pos_tag</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">tagged</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">6</span><span class="p">]</span>
<span class="go">[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),</span>
<span class="go">('Thursday', 'NNP'), ('morning', 'NN')]</span>
</pre></div>
</div>
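<p>Because the tagger returns an ordinary list of (word, tag) pairs, standard Python tools can summarize it.
The sketch below is only an illustration: it copies the six pairs shown above into a literal
(the name tagged_sample is made up for this example) so that it runs without NLTK or its data,
then groups the words by tag.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
from collections import defaultdict

# Hypothetical sample: the six (word, tag) pairs printed above, copied by hand
# so the snippet runs without NLTK or its tagger model.
tagged_sample = [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'),
                 ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN')]

# Group words by part-of-speech tag, keeping the original word order.
words_by_tag = defaultdict(list)
for word, tag in tagged_sample:
    words_by_tag[tag].append(word)

# Print one line per tag, in alphabetical tag order.
for tag in sorted(words_by_tag):
    print(tag, words_by_tag[tag])
```
</pre></div>
</div>
<p>With real data, the same loop could be run over the full list returned by nltk.pos_tag.</p>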
<p>Identify named entities:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">entities</span> <span class="o">=</span> <span class="n">nltk</span><span class="o">.</span><span class="n">chunk</span><span class="o">.</span><span class="n">ne_chunk</span><span class="p">(</span><span class="n">tagged</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">entities</span>
<span class="go">Tree('S', [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'),</span>
<span class="go"> ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN'),</span>
<span class="go"> Tree('PERSON', [('Arthur', 'NNP')]),</span>
<span class="go"> ('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'),</span>
<span class="go"> ('very', 'RB'), ('good', 'JJ'), ('.', '.')])</span>
</pre></div>
</div>
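<p>The result is an nltk.Tree in which each named entity is a subtree labelled with its type.
As a hedged sketch, assuming an NLTK 3 release where Tree.label() is available, the entities can be
collected as shown here. The tree is rebuilt by hand from the output above so that the chunker model
is not needed, and named_entities is a helper name invented for this example.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
from nltk import Tree

# Tree copied from the output shown above (built by hand, so no chunker
# model or corpus download is needed to run this).
entities = Tree('S', [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'),
                      ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN'),
                      Tree('PERSON', [('Arthur', 'NNP')]),
                      ('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'),
                      ('very', 'RB'), ('good', 'JJ'), ('.', '.')])


def named_entities(tree):
    """Return (entity_type, text) pairs for the direct subtrees of tree."""
    found = []
    for child in tree:
        # Plain tokens are (word, tag) tuples; entities are nested Trees.
        if isinstance(child, Tree):
            text = ' '.join(word for word, tag in child.leaves())
            found.append((child.label(), text))
    return found


print(named_entities(entities))
```
</pre></div>
</div>
<p>Only top-level entity subtrees are inspected, which matches the flat structure of the output above.</p>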
<p>Display a parse tree:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">nltk.corpus</span> <span class="kn">import</span> <span class="n">treebank</span>
<span class="gp">>>> </span><span class="n">t</span> <span class="o">=</span> <span class="n">treebank</span><span class="o">.</span><span class="n">parsed_sents</span><span class="p">(</span><span class="s">'wsj_0001.mrg'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="gp">>>> </span><span class="n">t</span><span class="o">.</span><span class="n">draw</span><span class="p">()</span>
</pre></div>
</div>
<img alt="Parse tree of the first sentence in wsj_0001.mrg, as drawn by t.draw()" src="_images/tree.gif" />
<p>NB. If you publish work that uses NLTK, please cite the NLTK book as
follows:</p>
<blockquote>
<div>Bird, Steven, Edward Loper and Ewan Klein (2009), <em>Natural Language Processing with Python</em>. O’Reilly Media Inc.</div></blockquote>
</div>
<div class="section" id="community">
<h2>Community<a class="headerlink" href="#community" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li><a class="reference external" href="http://groups.google.com/group/nltk">NLTK mailing list</a> – release announcements only, very low volume</li>
<li><a class="reference external" href="http://groups.google.com/group/nltk-users">NLTK-Users mailing list</a> – user discussions</li>
<li><a class="reference external" href="http://groups.google.com/group/nltk-dev">NLTK-Dev mailing list</a> – developers only</li>
<li><a class="reference external" href="http://groups.google.com/group/nltk-translation">NLTK-Translation mailing list</a> – discussions about translating the NLTK book</li>
</ul>
</div>
</div>
<div class="section" id="contents">
<h1>Contents<a class="headerlink" href="#contents" title="Permalink to this headline">¶</a></h1>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="news.html">NLTK News</a></li>
<li class="toctree-l1"><a class="reference internal" href="install.html">Installing NLTK</a></li>
<li class="toctree-l1"><a class="reference internal" href="data.html">Installing NLTK Data</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk/nltk/wiki">Wiki</a></li>
<li class="toctree-l1"><a class="reference internal" href="api/nltk.html">API</a></li>
<li class="toctree-l1"><a class="reference external" href="http://nltk.org/howto">HOWTO</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk">NLTK Development</a></li>
<li class="toctree-l1"><a class="reference internal" href="team.html">Team NLTK</a></li>
</ul>
</div>
<ul class="simple">
<li><a class="reference internal" href="genindex.html"><em>Index</em></a></li>
<li><a class="reference internal" href="py-modindex.html"><em>Module Index</em></a></li>
<li><a class="reference internal" href="search.html"><em>Search Page</em></a></li>
</ul>
</div>
</div>
</div>
</div>
</div>
<div class="sidebar">
<h3>Table Of Contents</h3>
<ul>
<li class="toctree-l1"><a class="reference internal" href="news.html">NLTK News</a></li>
<li class="toctree-l1"><a class="reference internal" href="install.html">Installing NLTK</a></li>
<li class="toctree-l1"><a class="reference internal" href="data.html">Installing NLTK Data</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk/nltk/wiki">Wiki</a></li>
<li class="toctree-l1"><a class="reference internal" href="api/nltk.html">API</a></li>
<li class="toctree-l1"><a class="reference external" href="http://nltk.org/howto">HOWTO</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk">NLTK Development</a></li>
<li class="toctree-l1"><a class="reference internal" href="team.html">Team NLTK</a></li>
</ul>
<h3 style="margin-top: 1.5em;">Search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
<p class="searchtip" style="font-size: 90%">
Enter search terms or a module, class or function name.
</p>
</div>
<div class="clearer"></div>
</div>
</div>
<div class="footer-wrapper">
<div class="footer">
<div class="left">
<a href="news.html" title="NLTK News"
>next</a> |
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |
<a href="genindex.html" title="General Index"
>index</a>
<br/>
<a href="_sources/index.txt"
rel="nofollow">Show Source</a>
</div>
<div class="right">
<div class="footer">
© Copyright 2013, NLTK Project.
Last updated on Nov 05, 2013.
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.2b3.
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</body>
</html>