We propose KpopMT, a dataset that enables precise terminology translation, choosing the Kpop fandom as a starting point for studying social groups given its global popularity. Expert translators provide 1k English translations of Korean posts and comments, each annotated with the specific terminology of the social group's language system. We evaluate existing translation systems, including GPT models, on KpopMT to identify their failure cases. The results show low overall scores, underscoring the challenge of reflecting group-specific terminology and style in translation. We plan to expand KpopMT to other social groups, such as sports and global movie communities.
Important: We provide three kinds of data: parallel (tagged.lang.txt), monolingual (fan-monolingual.lang), and termbase (termbase-category). See the paper for details.
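The following is a minimal sketch for reading the parallel data, not an official loader. It assumes that `lang` in `tagged.lang.txt` stands for a language code (for example `tagged.ko.txt` and `tagged.en.txt`), that each file holds one segment per line, and that line i of one file is the translation of line i of the other. The function name `load_parallel` and its defaults are hypothetical. Any terminology tags inside the lines are kept as raw text, because their markup is not described here.

```python
from pathlib import Path


def load_parallel(data_dir, src_lang="ko", tgt_lang="en"):
    """Pair line-aligned files named tagged.<lang>.txt.

    Hypothetical helper: the file naming and one-segment-per-line
    alignment are assumptions, not documented guarantees.
    """
    data_dir = Path(data_dir)
    src_path = data_dir / f"tagged.{src_lang}.txt"
    tgt_path = data_dir / f"tagged.{tgt_lang}.txt"
    # Read both sides as UTF-8 so Korean text decodes correctly.
    src_lines = src_path.read_text(encoding="utf-8").splitlines()
    tgt_lines = tgt_path.read_text(encoding="utf-8").splitlines()
    # A length mismatch means the assumed line alignment does not hold.
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"{src_path.name} has {len(src_lines)} lines but "
            f"{tgt_path.name} has {len(tgt_lines)}"
        )
    # Return (source, target) pairs with any tags left untouched.
    return list(zip(src_lines, tgt_lines))
```

Failing loudly on a line-count mismatch is deliberate. If the real files are organized differently, silently zipping them would produce misaligned pairs that are hard to detect.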
@misc{kim2024kpopmttranslationdatasetterminology,
  title={KpopMT: Translation Dataset with Terminology for Kpop Fandom},
  author={JiWoo Kim and Yunsu Kim and JinYeong Bak},
  year={2024},
  eprint={2407.07413},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.07413},
}