Language-Models-German-Simplification

This repository contains the code for the paper "Language Models for German Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training".
The published models are available on the Hugging Face Hub, and the data used for this project can be downloaded with our scrapers.
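
As a quick start, the snippet below sketches how one of the published models could be loaded for generation with the `transformers` library. The model ID is an assumption for illustration; check our Hugging Face Hub page for the exact names.

```python
# Minimal sketch: load one of the published Leichte Sprache models and generate text.
# The model ID is an assumed example; verify the exact name on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiriUll/german-gpt2_easy"  # hypothetical ID, check the hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Leichte Sprache ist", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```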

Fine-tuning language models

Use the finetuning.py script to create your own Leichte Sprache ("easy language") language models. You need to download or scrape the monolingual corpus with our scrapers first.
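
For orientation, here is a minimal sketch of what style-specific fine-tuning of a causal language model looks like with Hugging Face `transformers`. The base model, corpus path, and hyperparameters are illustrative assumptions; finetuning.py is the authoritative script.

```python
# Hedged sketch of style-specific causal LM fine-tuning, analogous to finetuning.py.
# Base model, file path, and hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "dbmdz/german-gpt2"           # assumed German base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_model)

# Plain-text Leichte Sprache corpus, one document per line (path is hypothetical).
dataset = load_dataset("text", data_files={"train": "monolingual_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-leichte-sprache",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```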

Re-creating the results from the paper

The evaluations for the perplexity scores, the readability of the language model outputs, and the downstream-task performance are provided in the respective scripts. We also publish the answers from the human grammar evaluation in the file evaluation/Evaluierung von large language models.csv (German for "Evaluation of large language models"). You can analyze these results with the human evaluation notebook.
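
For reference, perplexity for a causal language model is typically derived from the mean token-level cross-entropy, roughly as in the hedged sketch below. The repo's evaluation scripts may differ in windowing and batching details, and the model ID is again an assumption.

```python
# Hedged sketch of a standard perplexity computation for a causal LM;
# the actual evaluation scripts in this repo may differ in details.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiriUll/german-gpt2_easy"  # hypothetical ID, check the hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

ids = tokenizer("Leichte Sprache hat kurze Sätze.", return_tensors="pt").input_ids
with torch.no_grad():
    loss = model(ids, labels=ids).loss  # mean cross-entropy over predicted tokens
print("perplexity:", torch.exp(loss).item())
```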

For the application of the language models as ATS (automatic text simplification) decoders, please refer to the original GitHub repo. You can find the fine-tuned simplification model on the Hugging Face Hub. The simplification results are stored in the original tensorboard_logs.

Citation

If you use our models or the code from one of our repositories, please use the following citation:

@inproceedings{anschutz-etal-2023-language,
    title = "Language Models for {G}erman Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training",
    author = {Ansch{\"u}tz, Miriam  and Oehms, Joshua  and Wimmer, Thomas  and Jezierski, Bart{\l}omiej  and Groh, Georg},
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.74",
    pages = "1147--1158",
}
