UCCA-Annotated Wall Street Journal Sentences

Version 2.0 (April 8, 2019)

This bundle contains 100 sentences annotated according to the foundational layer of UCCA. The sentences are taken from section 00 of the Wall Street Journal corpus. Each passage is provided as an XML file. The corpus contains 2273 tokens in total.

All text tokens in the files have been replaced with underscores for licensing reasons. To obtain the full annotation, restore the original WSJ text: after obtaining a directory (WSJ_DIR) containing PTB .mrg files organized by section (00, 02, etc.), run:

scripts/insert_tokens.sh WSJ_DIR
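After running the script, it may be useful to check that no placeholder tokens remain. The following is a minimal sketch, not part of this bundle: it assumes that token text is stored in an XML attribute named `text` (an assumption about the file layout, so adjust `attr` if the files differ), and the function name `count_masked_tokens` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Made-up sample, NOT taken from the corpus; the element and attribute
# names here are assumptions for illustration only.
SAMPLE = """<root>
  <node ID="0.1"><attributes text="_"/></node>
  <node ID="0.2"><attributes text="___"/></node>
  <node ID="0.3"><attributes text="token"/></node>
  <node ID="1.1"><attributes/></node>
</root>"""


def count_masked_tokens(xml_source: str, attr: str = "text") -> Tuple[int, int]:
    """Return (masked, total) over all elements that carry `attr`.

    A value counts as masked when it is non-empty and consists only of
    underscores. A genuine token that is itself an underscore would be
    counted as masked too, so treat the result as a hint, not a proof.
    """
    root = ET.fromstring(xml_source)
    masked = 0
    total = 0
    for elem in root.iter():
        value = elem.get(attr)
        if value is None:
            continue  # element has no token text at all
        total += 1
        if value and set(value) == {"_"}:
            masked += 1
    return masked, total


if __name__ == "__main__":
    print(count_masked_tokens(SAMPLE))  # (2, 3)
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the contents to `count_masked_tokens`; a masked count of 0 suggests the tokens were restored.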

The dataset is part of the UCCA project, developed in the NLP lab of the Hebrew University by Omri Abend and Ari Rappoport. Users of this dataset are kindly requested to cite the following publication:

@InProceedings{hershcovich2018multitask,
  author    = {Hershcovich, Daniel  and  Abend, Omri  and  Rappoport, Ari},
  title     = {Multitask Parsing Across Semantic Representations},
  booktitle = {Proc. of ACL},
  year      = {2018},
  url       = {https://www.aclweb.org/anthology/P18-1035}
}

Please refer to our website or email ([email protected]) for regular updates on the UCCA project and available resources.

Files included

The passage files, in XML format, under xml. File names in xml are of the form wsj_XXX.xml, where XXX is the sentence ID. Please see the UCCA resource webpage for a software package for reading and using these files.
Scripts for manipulating these files, under scripts.
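The naming convention above can be used to index the passage files by sentence ID. This is a minimal sketch using only the Python standard library; the helper names `sentence_id` and `index_passages` are hypothetical and are not part of the UCCA software package.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# File names are of the form wsj_XXX.xml, where XXX is the sentence ID.
NAME_PATTERN = re.compile(r"^wsj_(?P<sid>.+)\.xml$")


def sentence_id(filename: str) -> Optional[str]:
    """Return the sentence ID encoded in a file name, or None if it does not match."""
    match = NAME_PATTERN.match(filename)
    return match.group("sid") if match else None


def index_passages(xml_dir: str) -> Dict[str, Path]:
    """Map each sentence ID to its passage file under xml_dir."""
    index: Dict[str, Path] = {}
    for path in sorted(Path(xml_dir).iterdir()):
        sid = sentence_id(path.name)
        if sid is not None and path.is_file():
            index[sid] = path
    return index


if __name__ == "__main__":
    # Assumes the script is run from the root of this bundle.
    for sid, path in index_passages("xml").items():
        print(sid, path)
```

Files that do not follow the wsj_XXX.xml pattern are skipped rather than reported as errors, which keeps the sketch tolerant of unrelated files in the directory.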

Licensing:

The UCCA annotation is distributed under the "Attribution-ShareAlike 3.0 Unported" license (http://creativecommons.org/licenses/by-sa/3.0/).
