Persian Stance Classification

We release here a Persian dataset that can be used for several NLP tasks related to fact-checking. Although the dataset also supports fact-checking and summarization, this work focuses on stance classification as a stepping stone toward fake news detection in Persian.

After collecting the articles, we assigned three labels for each claim: the stance of the article (body text) toward the claim (article-claim stance), the stance of the article's headline toward the claim (headline-claim stance), and the stance of the article (body text) toward its own headline (article-headline stance). We release the article-claim stance data as ArticleToClaim.txt and the headline-claim stance data as HeadlineToClaim.txt. We also release FullDataset.txt, which can be used for stance detection and for fake news or rumor detection in Persian.

Embedding

For text embedding, we built an embedding matrix with fastText using the create_embedding_matrix function in LSTMPersianStance_HeadToClaim.ipynb and saved the resulting dictionary (the embedding matrix) as w2v_persian.pkl. The saved embedding matrix is then loaded whenever it is needed.
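
The build-once, load-later pattern described here can be illustrated with a minimal sketch. It is not the notebook's create_embedding_matrix: the helper name build_embedding_matrix, its signature, the toy vectors, and the demo file name w2v_persian_demo.pkl are all hypothetical, and the actual layout of w2v_persian.pkl may differ. Plain Python lists stand in for fastText vectors so the example runs without extra dependencies.

```python
import os
import pickle
import tempfile


def build_embedding_matrix(word_index, word_vectors, dim):
    """Hypothetical helper: one row per word id, row 0 reserved for padding.

    word_index   maps each word to a positive integer row id.
    word_vectors maps each word to its vector (stand-in for fastText output).
    Words without a vector keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, row in word_index.items():
        vector = word_vectors.get(word)
        if vector is not None:
            matrix[row] = list(vector)
    return matrix


# Toy stand-ins (assumptions): real fastText vectors are much higher-dimensional.
toy_vectors = {"خبر": [0.1, 0.2, 0.3], "شایعه": [0.4, 0.5, 0.6]}
toy_word_index = {"خبر": 1, "شایعه": 2, "ناشناخته": 3}

embedding_matrix = build_embedding_matrix(toy_word_index, toy_vectors, dim=3)

# Save once, then reload wherever the matrix is needed.
cache_path = os.path.join(tempfile.mkdtemp(), "w2v_persian_demo.pkl")
with open(cache_path, "wb") as handle:
    pickle.dump(embedding_matrix, handle)

with open(cache_path, "rb") as handle:
    loaded_matrix = pickle.load(handle)
```

Caching the matrix this way avoids recomputing the embeddings each time a model is trained; the trade-off is that the pickle file must be regenerated if the vocabulary changes.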

Annotation Guideline

We prepared annotation guidelines in both English and Persian, consisting of notes, suggestions, and examples about the stance labels. GuidLine_FA.pdf contains the Persian guideline and GuideLine_EN.pdf contains the English guideline.

The Dataset License

Our Persian stance classification dataset is provided under the CC BY-NC license. You can read more about this license here.

The Related Paper

Our paper, which describes the construction of the dataset in detail and reports full results, is available here: https://truthandtrustonline.files.wordpress.com/2019/10/paper_30.pdf

