Releases · amir-zeldes/gum

05 May 19:14

V7.1.0

4525197

V7.1.0 - enhanced dependencies, consistency overhaul and more

(Note: this version contains a content-identical superset of the annotations used to produce UD_English-GUM in Universal Dependencies V2.8)

Massive round of consistency corrections and harmonization with the English Web Treebank (EWT), PTB and OntoNotes
Added enhanced dependencies
More validation checks to catch annotation errors
Added multiword tokens to CoNLL-U format (caution: token IDs like 1-2 now in use!)
Added reconstructed ellipsis tokens to CoNLL-U format (caution: token IDs like 8.1 now in use!)
Added metadata to CoNLL-U files
Better escaping of special characters in Wikification
Added ANNIS conversion support for null nodes to accommodate ellipsis tokens
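Because V7.1.0 introduces range IDs such as 1-2 (multiword tokens) and decimal IDs such as 8.1 (empty nodes for reconstructed ellipsis), scripts that cast every token ID to an integer will fail on the new files. Below is a minimal Python sketch of how a consumer might classify IDs according to the general CoNLL-U conventions; `parse_id`, `TokenId` and `basic_words` are hypothetical helpers, not part of the GUM build scripts.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional


class TokenId(NamedTuple):
    kind: str            # "word", "multiword" or "empty"
    start: int           # word index, or first index of a multiword range
    end: Optional[int]   # last index of a multiword range, else None
    sub: Optional[int]   # decimal part of an empty-node ID, else None


# Integer ("3"), range ("1-2") or decimal ("8.1") IDs
_ID_RE = re.compile(r"^(\d+)(?:-(\d+)|\.(\d+))?$")


def parse_id(raw: str) -> TokenId:
    """Classify a CoNLL-U ID column value."""
    m = _ID_RE.match(raw)
    if m is None:
        raise ValueError(f"not a valid CoNLL-U ID: {raw!r}")
    start = int(m.group(1))
    if m.group(2) is not None:
        return TokenId("multiword", start, int(m.group(2)), None)
    if m.group(3) is not None:
        return TokenId("empty", start, None, int(m.group(3)))
    return TokenId("word", start, None, None)


def basic_words(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield ordinary word lines, skipping comments, blank lines,
    multiword-token ranges and empty (ellipsis) nodes."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if parse_id(cols[0]).kind == "word":
            yield cols
```

Filtering to `kind == "word"` recovers the basic dependency tree; code that works with enhanced dependencies would keep the empty nodes instead.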

19 Jan 23:18

amir-zeldes

V7.0.0

2a08bcc

V7.0.0 - new genres, Wikification and more

20 documents added from four new genres (total tokens: 150,756):
- Face to face conversation (material from the Santa Barbara Corpus courtesy of John Du Bois, UCSB)
- Political speeches (public domain data)
- Open-access textbooks from OpenStax
- Creative Commons-licensed YouTube vlogs
New Wikification layer covering all named entities, including nested and pronominal mentions (work by Yi-Ju Lin)
Complete overhaul of date/time normalization (work by Nitin Venkateswaran)
Added function labels to constituent trees
Added addressee information for speakers in UD data
Complete overhaul of entity and coreference annotations, incl. separate annotation of split antecedents (work by Yi-Ju Lin and Amir Zeldes)
Increased consistency with other UD corpora, incl. new and more comprehensive morphological features
Many corrections to all annotation layers

13 Nov 21:41

amir-zeldes

V6.2.0

9704864

V6.2.0 - corrections and more consistency

Massive corrections to entity and coreference annotation
Massive corrections to TEI date/time annotations by @nitinvwaran
Removed the entity type quantity; affected mentions were manually reassigned to other underlying types
Added the infstat value split for split antecedents (edges are still bridge, but the anaphor is now annotated explicitly rather than only through bridging + giv)
Matching of infstat and entity types across coreference is now validated automatically (no new mention with an antecedent, no given mention without an antecedent)
src/tsv/ now contains only coref and bridge edges - all other edge types are derived:
- Coref chains beginning with a pronoun that is not first or second person receive an initial cata edge in target/
- Coref chain links between entities whose heads have the appos dependency in syntax receive the appos coref type automatically
- Non-cataphoric links emanating from a pronoun receive ana
Many improvements to syntactic dependency consistency, mostly consolidated with EWT annotation practices
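The edge derivation rules and the infstat validation listed above can be sketched in Python as follows. This is an illustrative reconstruction, not the GUM build bot's code: the `Mention` fields, the `(referring mention, linked mention, type)` tuple layout, the order in which rules are checked, and the fallback label `coref` for links that match no rule are all assumptions. `infstat_errors` is a simplified version of the validation and does not handle cataphora.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Mention:
    # Hypothetical mention record; field names are illustrative, not GUM's TSV columns.
    mid: str
    is_pronoun: bool
    person: Optional[str]  # "1", "2", "3" or None
    head: int              # token ID of the mention's syntactic head
    head_parent: int       # token ID that the head depends on
    head_deprel: str       # dependency relation of the head


Edge = Tuple[str, str, str]  # (referring mention, linked mention, edge type)


def has_appos(a: Mention, b: Mention) -> bool:
    """True if one mention's head is an appos dependent of the other's head."""
    return (a.head_deprel == "appos" and a.head_parent == b.head) or (
        b.head_deprel == "appos" and b.head_parent == a.head
    )


def derive_edges(chain: List[Mention]) -> List[Edge]:
    """Type each link of one coref chain given in textual order (assumed rule order)."""
    edges: List[Edge] = []
    for i in range(1, len(chain)):
        prev, cur = chain[i - 1], chain[i]
        if i == 1 and prev.is_pronoun and prev.person not in ("1", "2"):
            # Chain opens with a non-1st/2nd person pronoun: initial cataphoric link.
            edges.append((prev.mid, cur.mid, "cata"))
        elif has_appos(prev, cur):
            edges.append((cur.mid, prev.mid, "appos"))
        elif cur.is_pronoun:
            # Non-cataphoric link emanating from a pronoun.
            edges.append((cur.mid, prev.mid, "ana"))
        else:
            # Assumed fallback: plain coreference as stored in src/tsv/.
            edges.append((cur.mid, prev.mid, "coref"))
    return edges


def infstat_errors(chain: List[Mention], infstat: Dict[str, str]) -> List[str]:
    """Flag 'new' mentions that have an antecedent and 'giv' chain openers."""
    errors = []
    for i, m in enumerate(chain):
        status = infstat.get(m.mid)
        if i > 0 and status == "new":
            errors.append(f"{m.mid}: new but has an antecedent")
        elif i == 0 and status == "giv":
            errors.append(f"{m.mid}: giv but has no antecedent")
    return errors
```

Storing only coref and bridge edges and deriving the rest keeps the source files smaller and prevents the derived types from drifting out of sync with the syntax annotation.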

08 Jun 18:51

amir-zeldes

V6.1.0

eb4f19f

V6.1.0 - corrections and bug fixes

Many corrections and fixes to the build bot

06 Mar 02:11

amir-zeldes

V6.0.0

dc9e151

V6.0.0 - first release of GUM series 6

New in this version:

22 documents added (total tokens: 129,660)
Discourse parses in Rhetorical Structure Theory now follow RST-DT guidelines
5 new discourse relations (means, manner, attribution, question and same-unit)
Discourse dependency representation and Lisp-style formats now available
Now using native Universal Dependencies syntax trees (not automatic conversion)
Many manual corrections to lemmatization, POS and other consistency improvements

31 Oct 14:12

amir-zeldes

V5.1.0

449342a

V5.1.0 - Final release of GUM series 5

Final release of GUM5, numerous corrections

Global overhaul of some lemmas
Fresh constituent parses
Corrections to all annotation layers
This is the last version of GUM with Stanford parses as the basis for dependencies (V6 switches to UD as the primary gold parses)

21 Mar 17:31

amir-zeldes

V5.0.0

f764070

V5.0.0 - first release of GUM series 5

New in version 5:

New documents in academic, bio, fiction and reddit subcorpora
Split bridging relations into 3 subtypes:
- bridge:aggr - aggregate reference to multiple antecedents
- bridge:def - definite entity introduced by bridging
- bridge:other - all other cases of bridging
Add morph layer based on UD morphology
Add sentence type multiple for sentences coordinating multiple types; the type other now only includes sentences not falling into any other category
Merge Stanford and UD parses for simultaneous queries in ANNIS/PAULA
Separate coreference and bridging visualizations in ANNIS

20 Jan 01:37

amir-zeldes

V4.2.0

69b3b83

V4.2.0 - Final release of GUM series 4

Final release of GUM series 4:

Added s_type="multiple" for sentences containing multiple types (previously under "other")
Standardized some @rend values from "italics" to "italic" throughout
Standardized hyphens/dashes in number ranges to have POS tag 'TO' (e.g. in ranges of years), matching the syntactic analysis
Changed some inconsistent POS tags for IPA pronunciations of names from FW to NP
Added better imperative mood labeling to CoreNLP UD morph features based on manual s_type annotations
Removed spurious spans in RST files and fixed some segmentations not conforming to guidelines
Numerous assorted error corrections
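The imperative labeling item above can be illustrated with a small Python sketch that copies a sentence-level annotation into token-level UD features. It assumes the imperative s_type value is `imp` and that `Mood=Imp` is placed on a root `VERB`; both are assumptions for illustration, and `add_feature` / `mark_imperative` are hypothetical helpers rather than functions from CoreNLP or the GUM scripts. FEATS strings are kept sorted alphabetically (case-insensitively), as the CoNLL-U format requires.

```python
from typing import List


def add_feature(feats: str, name: str, value: str) -> str:
    """Set (or overwrite) one UD feature in a FEATS string, keeping it sorted."""
    pairs = {} if feats in ("", "_") else dict(f.split("=", 1) for f in feats.split("|"))
    pairs[name] = value
    return "|".join(f"{k}={v}" for k, v in sorted(pairs.items(), key=lambda kv: kv[0].lower()))


def mark_imperative(s_type: str, tokens: List[List[str]]) -> List[List[str]]:
    """Hypothetical rule: in sentences with s_type 'imp', add Mood=Imp to the root verb.
    tokens are CoNLL-U column lists (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, ...)."""
    if s_type != "imp":
        return tokens
    for cols in tokens:
        if cols[7] == "root" and cols[3] == "VERB":
            cols[5] = add_feature(cols[5], "Mood", "Imp")
    return tokens
```

Deriving the feature from the manual s_type annotation, rather than from the parser's own guess, is what lets gold sentence types improve the automatic morphology.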

16 May 20:34

amir-zeldes

V4.1.0

c8c391a

V4.1.0 - corresponds to UD V2.2

Stable release V4.1.0 / V4.1.0nr

A version number suffixed with nr in the top-level folder indicates that Reddit data is not included in the top-level folders
To build the complete V4.1.0, see README_reddit.md (source annotation data is included in _build/src/)
This version contains the data which was used to generate Universal Dependencies release 2.2 of UniversalDependencies/UD_English-GUM

03 Mar 21:51

amir-zeldes

V4.0.1

9c2665e

V4.0.1 - Minor build bot fixes

The build can now be run from other directories using a relative path
UTF-8 encoding specified for UD conversion
Annotations identical to V4.0.0
