Training Data
This classifier was trained using the human-annotated Szeged Uncertainty Corpus, which is composed of three sub-corpora: BioScope [1], FactBank [2], and WikiWeasel [3].
The original corpus is distributed as XML; we have reformatted it into JSON for readability.
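For readers who want to reproduce a similar XML-to-JSON conversion, here is a minimal sketch in Python. It is not the script used for this repository. The element and attribute names (`Document`, `Sentence`, `cue`, `type`, `id`), the sample text, and the output field names are all hypothetical and chosen only for illustration; the real corpus schema may differ.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML layout, invented for illustration only.
# The actual corpus files may use different tags and attributes.
SAMPLE_XML = """
<Document id="d1">
  <Sentence id="s1">This <cue type="epistemic">may</cue> indicate a problem.</Sentence>
  <Sentence id="s2">The test was negative.</Sentence>
</Document>
"""


def sentences_to_json(xml_text):
    """Flatten each (assumed) <Sentence> element into a JSON record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for sent in root.iter("Sentence"):
        # Collect every (assumed) <cue> child along with its uncertainty type.
        cues = [
            {"text": cue.text, "type": cue.get("type")}
            for cue in sent.iter("cue")
        ]
        records.append({
            "id": sent.get("id"),
            # itertext() joins the sentence text with the text inside its cues.
            "text": "".join(sent.itertext()),
            "cues": cues,
        })
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    print(sentences_to_json(SAMPLE_XML))
```

Keeping one record per sentence, with the cue annotations in a nested list, makes the JSON easy to read by eye and to load with `json.load`.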
A secondary corpus is provided within the source code used in the experiments for the CoNLL-2010 Shared Task. This corpus contains all of the pre-generated features used to train the original classifier. We have the unedited features available here and the updated features (with multiclass labels) available here.
📃 [1]
Vincze, V., Szarvas, G., Farkas, R., Móra, G., & Csirik, J. (2008). The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC bioinformatics, 9(11), S9.
📃 [2]
Saurí, R., & Pustejovsky, J. (2009). FactBank: a corpus annotated with event factuality. Language resources and evaluation, 43(3), 227.
📃 [3]
Farkas, R., Vincze, V., Móra, G., Csirik, J., & Szarvas, G. (2010, July). The CoNLL-2010 shared task: learning to detect hedges and their scope in natural language text. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning: Shared Task (pp. 1-12). Association for Computational Linguistics.