anonimoustt/anonymous

Introduction

In this research, we propose an unsupervised text classification method that computes probability scores for the target topics of a short-text corpus, specifically conversational text. State-of-the-art text classification typically requires a trained model, training data, human input, and a large corpus to perform inference efficiently. Our proposed approach is independent of these requirements. We use Sentence Transformer to obtain the embeddings. We also evaluate other embedding methods, such as Word2Vec and FastText, and compare their performance with our approach. We further compare against zero-shot and few-shot models. In our experiments, we use two benchmarks: DailyDialog {Lhoest_Datasets_A_Community_2021} and DialogSum {chen-etal-2021-dialogsum}. The empirical results indicate that our model outperforms traditional text classification techniques.
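The idea of scoring target topics from embeddings can be illustrated with a minimal sketch. It is not the implementation in the notebooks: it assumes that topic scores come from a softmax over cosine similarities between a text embedding and one embedding per topic, and the names `cosine`, `topic_probabilities`, and `temperature`, as well as the toy vectors, are hypothetical. In practice the vectors would be produced by an embedding model such as Sentence Transformer.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def topic_probabilities(text_vec, topic_vecs, temperature=0.1):
    """Turn text-to-topic similarities into a probability distribution.

    Assumption: a temperature-scaled softmax over cosine similarities.
    The README does not state the exact scoring rule.
    """
    sims = {topic: cosine(text_vec, vec) for topic, vec in topic_vecs.items()}
    peak = max(sims.values())  # subtract the max for numerical stability
    weights = {t: math.exp((s - peak) / temperature) for t, s in sims.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


# Toy 3-dimensional vectors stand in for real sentence embeddings.
topic_vecs = {"travel": [0.9, 0.1, 0.0], "health": [0.0, 0.2, 0.9]}
text_vec = [0.8, 0.3, 0.1]

scores = topic_probabilities(text_vec, topic_vecs)
print(max(scores, key=scores.get), scores)
```

No labeled examples are involved: the only inputs are the text embedding and the topic embeddings, which is what makes a scheme of this kind unsupervised. A lower `temperature` sharpens the distribution toward the most similar topic.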

Data

The DialogSum {chen-etal-2021-dialogsum} data are available at:

https://drive.google.com/drive/folders/1VnW2__6D2RtI0TMP7Ggsyp20TKeevDvq?usp=sharing

The DailyDialog {Lhoest_Datasets_A_Community_2021} data are available at:

https://huggingface.co/datasets/peandrew/dialy_dialogue_with_recoginized_concept_raw

Code

  1. The DailyDialog {Lhoest_Datasets_A_Community_2021} application is available in the following Google Colab notebook:

https://colab.research.google.com/drive/1QJY60RVnX5etwU0wLImPXU6ra5NpEDVk?usp=sharing

  2. The DialogSum {chen-etal-2021-dialogsum} application is available in the following Google Colab notebook:

https://colab.research.google.com/drive/15D03KLZTzLk0M5vSlEeUWiNhYRJvU33f?usp=sharing
