📓 Update 01 October 2023: this collection is now available in arekit-ss for quick sampling of contexts with all subject-object relation mentions, with just a single script, into JSONL/CSV/SQLite, including optional language transfer 🔥 [Learn more ...]
Dataset description: The RuSentRel collection consists of analytical articles from the Internet portal inosmi.ru. These are texts in the domain of international politics, obtained from authoritative foreign sources and translated into Russian.
The collected articles contain both the author's opinion on the subject matter of the article and a large number of references mentioned between the participants of the described situations. In total, 73 large analytical texts were labeled with about 2000 relations.
This repository is the official results benchmark for the automatic sentiment attitude extraction task on the RuSentRel collection. See the task section for greater detail.
Contributing: Please feel free to make pull requests, especially at awesome-sentiment-attitude-extraction!
For more details about RuSentRel, please proceed to the related repository.
Given a subset of documents from the RuSentRel collection, where each document is presented by a pair: (1) a text, and (2) a list of selected named entities. For each document, it is required to complete a list of entity pairs (es, eo) for which the text conveys the presence of a sentiment relation from es (subject) towards eo (object). The label assigned to a pair can be neg or pos.
Example |
---|
... При этом Москва неоднократно подчеркивала, что ее активность на Балтике является ответом именно на действия НАТО и эскалацию враждебного подхода к России вблизи ее восточных границ ... (... Meanwhile, Moscow has repeatedly emphasized that its activity in the Baltic Sea is a response precisely to the actions of NATO and the escalation of the hostile approach to Russia near its eastern borders ...) |
(NATO->Russia, neg), (Russia->NATO, neg) |
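For illustration only, such a list of document-level answers could be represented as simple records; the sketch below is hypothetical (the record layout is ours — the evaluator itself consumes archived results via evaluate.py, described further below):

```python
# Hypothetical record layout for document-level attitudes (illustration only;
# the actual evaluator expects zipped result archives, see evaluate.py).
from dataclasses import dataclass

@dataclass(frozen=True)
class Attitude:
    source: str   # subject entity (es)
    target: str   # object entity (eo)
    label: str    # "pos" or "neg"

doc_answers = [
    Attitude(source="NATO", target="Russia", label="neg"),
    Attitude(source="Russia", target="NATO", label="neg"),
]
```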
Task paper: https://arxiv.org/pdf/1808.08932.pdf
The task is treated as a context classification problem, in which a context is a text region that mentions a pair (the attitude participants). The classified context-level attitudes are then transferred onto the document level by averaging the context labels of the related pair (the voting method).
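A minimal sketch of this voting step (our illustration; the exact AREkit implementation may differ, e.g. in tie-breaking):

```python
# Label voting: context-level labels of a pair are averaged, and the sign
# of the average yields the document-level label. Illustration only.
from collections import defaultdict

NEG, POS = -1, 1

def vote(context_preds):
    """context_preds: list of ((source, target), label) tuples, label in {NEG, POS}."""
    votes = defaultdict(list)
    for pair, label in context_preds:
        votes[pair].append(label)
    # Ties (average of exactly 0) fall back to NEG here; this is a choice.
    return {pair: POS if sum(labels) / len(labels) > 0 else NEG
            for pair, labels in votes.items()}

print(vote([(("Russia", "NATO"), NEG),
            (("Russia", "NATO"), NEG),
            (("Russia", "NATO"), POS)]))   # {('Russia', 'NATO'): -1}
```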
We implement the AREkit toolkit, which serves as a framework for the following applications:
- BERT-based language models [code];
- Neural Networks with (and w/o) Attention mechanism [code];
- Conventional Machine Learning methods [code];
The source code is exported from the AREkit-0.21.0 library and consists of:
- Evaluation directory for details of the evaluator implementation and the related dependencies;
- Test directory, which includes test scripts that allow applying the evaluator to the archived results.
Use `evaluate.py` to evaluate your submissions. Below is an example of assessing the results of ChatGPT-3.5-0613:

`python3 evaluate.py --input data/chatgpt-avg.zip --mode classification --split cv3`
Results are ordered from the latest to the oldest. We measure F1 (scaled by 100) across the following foldings (see the evaluator section for greater detail):
- F1cv -- the average F1 of a 3-fold CV check; foldings are carried out by preserving the same number of sentences in each fold;
- F1t -- F1 over the predefined TEST set.
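Per footnote ** below, F1 here is the average F-measure of the positive and negative classes. A minimal sketch of this metric (our illustration using scikit-learn, not the bundled evaluator):

```python
# Macro-averaged F1 over the pos/neg classes, scaled by 100.
# Illustration only; see the evaluation directory for the actual evaluator.
from sklearn.metrics import f1_score

def f1_pn(y_true, y_pred):
    """y_true/y_pred contain 'pos'/'neg' labels per extracted pair."""
    return 100 * f1_score(y_true, y_pred, labels=["pos", "neg"], average="macro")

print(f1_pn(["pos", "neg", "neg"], ["pos", "neg", "pos"]))  # ≈66.67
```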
The result assessment is organized into the following experiments:
- 3l -- subject-object pairs extraction;
- 2l -- classification of already given subject-object pairs at the document level.
Methods | F1cv (3l) | F1t (3l) | F1cv (2l) | F1t (2l) |
---|---|---|---|---|
Expert Agreement** [1] | 55.0 | 55.0 | - | - |
ChatGPT zero-shot with prompting*** [7] | | | | |
ChatGPT3.5-0613, avg [200 words distance] | 37.7 | 39.6 | - | - |
ChatGPT3.5-0613, avg [50 words distance] | 66.19 | 74.47 | - | - |
ChatGPT3.5-0613, first [50 words distance] | 69.23 | 74.09 | - | - |
Distant Supervision (RA-2.0-large) for Language Models (BERT-based) [6] | | | | |
[pt -- pretrained, ft -- fine-tuned] | | | | |
SentenceRuBERT (NLIpt + NLIft) | 39.0 | 38.0 | 70.2 | 67.7 |
SentenceRuBERT (NLIpt + QAft) | 38.4 | 41.9 | 69.6 | 64.2 |
SentenceRuBERT (NLIpt + Cft) | 37.9 | 39.8 | 70.0 | 69.8 |
RuBERT (NLIpt + NLIft) | 36.8 | 39.9 | 71.0 | 68.6 |
RuBERT (NLIpt + QAft) | 34.8 | 37.0 | 69.6 | 68.2 |
RuBERT (NLIpt + Cft) | 35.6 | 35.4 | 70.0 | 69.8 |
mBase (NLIpt + NLIft) | 33.6 | 36.0 | 69.4 | 68.2 |
mBase (NLIpt + QAft) | 30.1 | 35.5 | 69.6 | 65.2 |
mBase (NLIpt + Cft) | 30.5 | 31.1 | 68.9 | 67.7 |
Distant Supervision (RA-2.0-large) for (Attentive) Neural Networks + Frames annotation [Joined Training] ([6] reproduced, [4] original) | | | | |
PCNNends | 32.2 | 39.9 | 70.2 | 67.8 |
BiLSTM | 32.0 | 38.8 | 71.2 | 68.4 |
PCNN | 31.6 | 39.7 | 69.5 | 70.5 |
LSTM | 31.6 | 39.5 | 68.0 | 75.4 |
Att-BiLSTM [P. Zhou et al.] | 31.0 | 37.3 | 66.2 | 71.2 |
AttCNNends | 30.9 | 39.9 | 66.8 | 72.7 |
IANends | 30.7 | 36.7 | 69.1 | 72.6 |
Distant Supervision (RA-1.0) for Multi-Instance Neural Networks [Joined Training] [5] | | | | |
MI-PCNN | - | - | 68.0 | - |
MI-CNN | - | - | 62.0 | - |
PCNN | - | - | 67.0 | - |
CNN | - | - | 63.0 | - |
Language Models (BERT-based) [6] | | | | |
SentenceRuBERT (NLI) | 33.4 | 32.7 | 69.8 | 67.6 |
SentenceRuBERT (QA) | 34.3 | 38.9 | 70.2 | 67.1 |
SentenceRuBERT (C) | 34.0 | 35.2 | 69.3 | 65.5 |
RuBERT (NLI) | 29.4 | 39.6 | 68.9 | 66.4 |
RuBERT (QA) | 32.0 | 35.3 | 69.5 | 66.2 |
RuBERT (C) | 36.8 | 37.6 | 67.8 | 66.2 |
mBase (NLI) | 29.2 | 37.0 | 67.8 | 58.4 |
mBase (QA) | 28.6 | 33.8 | 66.5 | 65.4 |
mBase (C) | 26.9 | 30.0 | 67.0 | 68.9 |
(Attentive) Neural Networks + Frames annotation ([6] reproduced, [3] original) | | | | |
IANends | 30.8 | 32.2 | 60.8 | 63.5 |
AttPCNNends | 29.9 | 32.6 | 64.3 | 63.3 |
PCNN | 29.6 | 32.5 | 64.4 | 63.3 |
CNN | 28.7 | 31.4 | 63.6 | 65.9 |
BILSTM | 28.6 | 32.4 | 62.3 | 71.2 |
LSTM | 27.9 | 31.6 | 61.9 | 65.3 |
AttCNNends | 27.6 | 29.7 | 65.0 | 66.2 |
Att-BiLSTM [P. Zhou et al.] | 27.5 | 32.3 | 65.7 | 68.2 |
Convolutional networks [2] | | | | |
PCNN [code] | 31.0 | - | - | - |
CNN | 30.0 | - | - | - |
Conventional methods [1] [code] | | | | |
Gradient Boosting (Grid search) | 20.3* | 28.0 | - | - |
Random Forest (Grid search) | 19.1* | 27.0 | - | - |
Random Forest | 15.7* | 27.0 | - | - |
Naive Bayes (Bernoulli) | 15.2* | 16.0 | - | - |
SVM | 15.1* | 15.0 | - | - |
Gradient Boosting | 14.4* | 27.0 | - | - |
SVM (Grid search) | 14.3* | 15.0 | - | - |
Naive Bayes (Gauss) | 9.2* | 11.0 | - | - |
KNN | 7.0* | 9.0 | - | - |
Baseline (School) [link] | 12.0 | - | - | - |
Baseline (Distr) | 8.0 | - | - | - |
Baseline (Random) | 7.4* | 8.0 | - | - |
Baseline (Pos) | 3.9* | 4.0 | - | - |
Baseline (Neg) | 5.2* | 5.0 | - | - |
*: Results that were not mentioned in the papers.
**: We asked another super-annotator to label the collection and compared her annotation with our gold standard, using the average F-measure of the positive and negative classes, in the same way as for the automatic approaches. In this way we can reveal the upper bound for automatic algorithms: the obtained F-measure of the human labeling is 55.0. [1]
***: We consider translation of samples into English via arekit-ss, by translating the texts into English first and then wrapping them into prompts. We consider a k-word distance (50 by default, in English) between the pair participants as an upper bound for pair organization; because of the difference between the latter and the prior standards, results might be lower (translation increases the distance in words).
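As an illustration of this distance bound, the sketch below keeps only the pairs whose mentions fall within k words of each other (the word positions and the helper function are our assumptions, not the arekit-ss API):

```python
# Keep only entity pairs whose mentions lie within k words of each other.
# Illustrative sketch, not the arekit-ss implementation.
from itertools import permutations

def candidate_pairs(entity_positions, k=50):
    """entity_positions: dict mapping entity name -> word index of its mention."""
    return [(s, o) for s, o in permutations(entity_positions, 2)
            if abs(entity_positions[s] - entity_positions[o]) <= k]

positions = {"Moscow": 2, "NATO": 14, "Russia": 21}
print(candidate_pairs(positions, k=50))  # all ordered pairs lie within 50 words
```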
The training process is described in Rusnachenko et al., 2020 (section 7.1) and relies on the Multi-Instance Learning approach originally proposed in the Zeng et al., 2015 paper (SGD application, bag terminology, instance selection within bags).
All the batch context samples are gathered into bags. The authors propose to select the best instance in every bag as follows: calculate the max value of p(y_i | m_i,j) across the instances j within a particular i-th bag. The latter allows them to adopt the loss function at the bag level.
In our works, we adopt bags for gathering synonymous contexts. Therefore, for the gradient calculation within bags, we choose the avg function instead. The assumption here is to take the other synonymous attitudes into account during the gradient calculation procedure. We used BagSize > 1 in the earlier work Rusnachenko, 2018. In the latest experiments, we consider BagSize = 1 and therefore do not exploit the averaging of bag values.
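A minimal sketch contrasting the two bag-level reductions (our PyTorch illustration; the tensor shapes and the helper are assumptions, not the AREkit implementation):

```python
# Bag-level reductions for Multi-Instance Learning (illustrative sketch).
# "max" follows Zeng et al., 2015: keep the most confident instance per bag;
# "avg" follows our setup: average over the synonymous contexts of the bag.
import torch

def bag_logits(instance_logits, bag_size, labels, mode="avg"):
    """instance_logits: [num_bags * bag_size, C]; labels: [num_bags] gold bag labels."""
    num_classes = instance_logits.shape[-1]
    bags = instance_logits.view(-1, bag_size, num_classes)       # [B, S, C]
    if mode == "max":
        # Zeng et al., 2015: per bag i, keep the instance j maximizing
        # p(y_i | m_i,j), i.e. the probability of the bag's gold label.
        probs = torch.softmax(bags, dim=-1)                      # [B, S, C]
        idx = labels.view(-1, 1, 1).expand(-1, bag_size, 1)      # [B, S, 1]
        best = probs.gather(2, idx).squeeze(-1).argmax(dim=1)    # [B]
        return bags[torch.arange(bags.shape[0]), best]           # [B, C]
    # "avg": average outputs over the synonymous contexts of the bag.
    return bags.mean(dim=1)                                      # [B, C]

# Usage: a bag-level cross-entropy loss over the reduced outputs.
logits = torch.randn(6, 3)                 # 2 bags of BagSize=3, 3 classes
labels = torch.tensor([0, 2])              # one label per bag
loss = torch.nn.functional.cross_entropy(bag_logits(logits, 3, labels), labels)
```

Note that with BagSize = 1 each bag holds a single context, so both reductions coincide and no averaging actually takes place.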
Awesome Sentiment Attitude Extraction
[1] Natalia Loukachevitch, Nicolay Rusnachenko. Extracting Sentiment Attitudes from Analytical Texts. Proceedings of the International Conference on Computational Linguistics and Intellectual Technologies Dialogue-2018 (arXiv:1808.08932) [paper] [code]
[2] Nicolay Rusnachenko, Natalia Loukachevitch. Using Convolutional Neural Networks for Sentiment Attitude Extraction from Analytical Texts. EPiC Series in Language and Linguistics 4, 1-10, 2019 [paper] [code]
[3] Nicolay Rusnachenko, Natalia Loukachevitch. Studying Attention Models in Sentiment Attitude Extraction Task. In: Métais E., Meziane F., Horacek H., Cimiano P. (eds) Natural Language Processing and Information Systems. NLDB 2020. Lecture Notes in Computer Science, vol 12089. Springer, Cham [paper] [code]
[4] Nicolay Rusnachenko, Natalia Loukachevitch. Attention-Based Neural Networks for Sentiment Attitude Extraction using Distant Supervision. The 10th International Conference on Web Intelligence, Mining and Semantics (WIMS 2020), June 30 - July 3 (arXiv:2006.13730) [paper] [code]
[5] Nicolay Rusnachenko, Natalia Loukachevitch, Elena Tutubalina. Distant Supervision for Sentiment Attitude Extraction. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019) [paper] [code]
[6] Nicolay Rusnachenko. Language Models Application in Sentiment Attitude Extraction Task. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2021;33(3):199-222. (In Russ.) [paper] [code-networks] [code-bert]
[7] Bowen Zhang, Daijun Ding, Liwen Jing. How would Stance Detection Techniques Evolve after the Launch of ChatGPT? [paper]