In this research, we have proposed an unsupervised text classification method that computes probability scores for the target topics of a short-text corpus, specifically conversation texts. Typical state-of-the-art text classification requires a trained model, training data, human annotation, and a large corpus to perform inference effectively; our proposed approach is independent of these requirements. We use a Sentence Transformer to obtain the embeddings, and we also leverage other embedding methods, such as Word2Vec and fastText, comparing their performance with our approach. Further, we compare against zero-shot and few-shot models. In our experiments, we use two benchmarks: the DailyDialog {Lhoest_Datasets_A_Community_2021} and DialogSum {chen-etal-2021-dialogsum} datasets. The empirical results show that our model outperforms traditional text classification techniques.
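To illustrate the core idea, the following is a minimal sketch of how topic probability scores can be derived from Sentence Transformer embeddings: the text and the candidate topic labels are embedded into the same space, cosine similarities are computed, and a softmax turns them into scores. The model checkpoint (all-MiniLM-L6-v2) and the topic labels are illustrative assumptions, not the exact configuration used in our experiments.

```python
# Minimal sketch: scoring target topics for a conversation text with
# Sentence Transformer embeddings (assumed checkpoint and topic labels).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

topics = ["work", "relationship", "health", "finance"]  # illustrative target topics
text = "I have a meeting with my manager about the quarterly report."

# Encode the short text and the topic labels into the same embedding space.
text_emb = model.encode([text], normalize_embeddings=True)    # shape (1, d)
topic_embs = model.encode(topics, normalize_embeddings=True)  # shape (k, d)

# Cosine similarity reduces to a dot product for normalized embeddings.
sims = (text_emb @ topic_embs.T).flatten()                    # shape (k,)

# Convert similarities into probability scores with a softmax.
probs = np.exp(sims) / np.exp(sims).sum()

for topic, p in sorted(zip(topics, probs), key=lambda x: -x[1]):
    print(f"{topic}: {p:.3f}")
```

Under the same scheme, the Word2Vec and fastText baselines would substitute averaged word vectors for the sentence embedding, with the rest of the scoring unchanged.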
- The DialogSum {chen-etal-2021-dialogsum} data are available at:
  https://drive.google.com/drive/folders/1VnW2__6D2RtI0TMP7Ggsyp20TKeevDvq?usp=sharing
- The DailyDialog {Lhoest_Datasets_A_Community_2021} data are available at:
  https://huggingface.co/datasets/peandrew/dialy_dialogue_with_recoginized_concept_raw
- The DailyDialog {Lhoest_Datasets_A_Community_2021} application is available at the following Google Colab notebook:
  https://colab.research.google.com/drive/1QJY60RVnX5etwU0wLImPXU6ra5NpEDVk?usp=sharing
- The DialogSum {chen-etal-2021-dialogsum} application is available at the following Google Colab notebook:
  https://colab.research.google.com/drive/15D03KLZTzLk0M5vSlEeUWiNhYRJvU33f?usp=sharing
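As a usage note, the Hugging Face copy of the DailyDialog variant linked above can typically be loaded with the datasets library; the dataset ID comes from the URL, while the available splits and columns should be inspected after loading, since they are not assumed here.

```python
# Sketch: loading the linked DailyDialog variant from the Hugging Face Hub.
# The dataset ID is taken from the URL above; splits/columns are inspected, not assumed.
from datasets import load_dataset

ds = load_dataset("peandrew/dialy_dialogue_with_recoginized_concept_raw")
print(ds)  # shows the available splits and their columns
```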