German wordlist from dys2p #80

Open · wants to merge 1 commit into base: master


b068931cc450442b63f5b3d276ea4297

Since the usability of Diceware depends on the quality of its word lists, a word list should consist of words that are as familiar and easy to remember as possible. The best-known German Diceware word list, diceware_german.txt, does not meet these conditions. It contains special characters, numbers, letter sequences that are neither abbreviations nor words (e.g. zv, zw, zx, zy, zz, zzz and zzzz), and letter sequences that resemble words but are not German words.

Our word list de-7776 is suitable as a Diceware word list for five dice. The words are unique from the fifth letter on. The list also follows these rules for the most part, though not without exception:

  • Words are three to twelve characters long.
  • No word contains the characters ä, ö, ü and ß.
  • Where possible, only familiar nouns, verbs and adjectives are included, in their basic form (nouns in the singular, verbs in the infinitive, adjectives in their uninflected form).
  • No proper names, regions, religions, associations, or persons.
  • No words with particularly negative connotations.
  • The "masculine" grammatical gender is preferred. (This is standard for BIP39.)
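With five six-sided dice there are 6^5 = 7776 possible rolls, matching the 7776 in the list's name, and each uniformly chosen word contributes log2(7776) ≈ 12.92 bits of entropy. The sketch below checks the list size and the prefix claim on a list held in memory. It assumes that "unique from the fifth letter on" means no two words share the same first five letters; `check_diceware_list` and `bits_per_word` are hypothetical helper names, not part of dys2p's tooling.

```python
import math
from collections import defaultdict

SIDES = 6  # faces per die
DICE = 5   # dice rolled per word


def bits_per_word(n_words: int) -> float:
    """Entropy contributed by one uniformly chosen word from a list of n_words."""
    return math.log2(n_words)


def check_diceware_list(words: list[str], prefix_len: int = 5) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed.

    Assumption: "unique from the fifth letter on" is read as
    "no two words share the same first `prefix_len` letters".
    """
    problems = []
    expected = SIDES ** DICE  # 6**5 == 7776
    if len(words) != expected:
        problems.append(f"expected {expected} words, got {len(words)}")
    if len(set(words)) != len(words):
        problems.append("list contains duplicate words")

    # Group words by their leading letters and report any shared prefix.
    by_prefix: dict[str, list[str]] = defaultdict(list)
    for word in words:
        by_prefix[word[:prefix_len]].append(word)
    for prefix, group in sorted(by_prefix.items()):
        if len(group) > 1:
            problems.append(f"prefix {prefix!r} shared by {group}")
    return problems


print(round(bits_per_word(SIDES ** DICE), 2))
print(check_diceware_list(["haus", "hausbau", "hausboot"]))
```

Words shorter than five letters are compared in full, so a short word only collides with an identical entry.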

pomeloy commented Mar 8, 2023

The list definitely looks good and would be a vast improvement over the default German list in the absence of #59. Are the words hand-picked, or does the list follow a methodology similar to the DeReKo list? Skimming your list, I found several verbs in both the infinitive and the third-person singular, and spelled-out numbers are present while, e.g., colors are missing.
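Mixed infinitive and third-person singular forms can be partly screened for mechanically. The following is a rough heuristic sketch, not a method used by either list: for each word that looks like an infinitive, it builds the regular third-person present form (gehen to geht, arbeiten to arbeitet, wandern to wandert) and reports the pair when both appear. `third_person_candidates` and `find_verb_form_pairs` are hypothetical names.

```python
def third_person_candidates(word: str) -> list[str]:
    """Regular 3rd-person singular present forms for a word that looks like an infinitive."""
    if word.endswith(("eln", "ern")):
        return [word[:-1] + "t"]      # wandern -> wandert, sammeln -> sammelt
    if word.endswith("en") and len(word) > 3:
        stem = word[:-2]
        if stem.endswith(("t", "d")):
            return [stem + "et"]      # arbeiten -> arbeitet, finden -> findet
        return [stem + "t"]           # gehen -> geht
    return []


def find_verb_form_pairs(words: list[str]) -> list[tuple[str, str]]:
    """Report (infinitive, 3rd-person form) pairs where both forms are in the list."""
    vocab = set(words)
    pairs = []
    for word in sorted(vocab):
        for candidate in third_person_candidates(word):
            if candidate in vocab:
                pairs.append((word, candidate))
    return pairs


print(find_verb_form_pairs(["gehen", "geht", "arbeiten", "arbeitet", "wandern", "haus"]))
```

The heuristic misses irregular and stem-changing verbs (laufen/läuft) and may pair unrelated words, so its output is only a starting point for manual review.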

b068931cc450442b63f5b3d276ea4297 (Author)

Thank you very much, and sorry, you are right. The list is hand-picked, and I hope to revise it in the coming weeks.

plan5 commented Oct 10, 2024

Hi, great to see you're working on this list. It's been a while, though: are you still on it?

I've been looking at the German word lists, especially from the "negative connotations" angle, because I use them in workshops and other demonstrations. The most prominent German lists contain slurs, such as 44355 in tenne and 44141 in DeReKo, which are the German translation of the n-word. Unfortunately, these are not isolated cases.
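Entries such as 44355 are Diceware roll codes: one digit from 1 to 6 per die, five dice per word. Screening a list for unwanted entries and reporting them by code can be sketched as follows. This assumes the common "CODE WORD" line format (e.g. `11111 abend`) and a hypothetical, hand-curated blocklist; the placeholder word in the example stands in for real terms, and all function names are illustrative.

```python
SIDES = 6
DICE = 5


def index_to_code(index: int) -> str:
    """Convert a 0-based list position to a roll code, e.g. 0 -> '11111'."""
    digits = []
    for _ in range(DICE):
        index, digit = divmod(index, SIDES)
        digits.append(str(digit + 1))  # die faces are 1..6, not 0..5
    return "".join(reversed(digits))


def code_to_index(code: str) -> int:
    """Inverse of index_to_code, e.g. '66666' -> 7775."""
    index = 0
    for ch in code:
        index = index * SIDES + (int(ch) - 1)
    return index


def parse_wordlist(lines: list[str]) -> dict[str, str]:
    """Parse 'CODE WORD' lines into {code: word}; other lines are skipped."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == DICE and parts[0].isdigit():
            table[parts[0]] = parts[1]
    return table


def flag_blocked(table: dict[str, str], blocklist: set[str]) -> list[tuple[str, str]]:
    """Return sorted (code, word) pairs whose word is in the blocklist, case-insensitively."""
    blocked = {b.lower() for b in blocklist}
    return sorted((code, word) for code, word in table.items() if word.lower() in blocked)


# Placeholder data only; a real blocklist would be curated by hand.
table = parse_wordlist(["11111 abend", "11112 abfall", "44355 platzhalter"])
print(flag_blocked(table, {"Platzhalter"}))
print(index_to_code(code_to_index("44355")))
```

The index conversion is only meaningful if the list is sorted so that its i-th entry corresponds to the i-th roll in base-6 order; `flag_blocked` itself works directly on the parsed codes.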

Great to see you take this into account in your list! However, it still contains long words that I'd like to avoid.
Here is another list that looks promising to me, though I haven't been able to read through it yet:
https://github.com/bjoernalbers/diceware-wordlist-german/blob/main/wordlist-german-diceware.txt

The author imposed a length limit of 8 characters and doesn't explicitly exclude proper names, but I haven't found any such names yet. The list is based on DWDS (see the readme, excerpted below).

From the readme:

Each word has been manually checked to be familiar and office-friendly (not vulgar, offensive, religious or with negative connotations). The words also meet these formal conditions:

  • 4 to 8 characters long
  • contains english letters only (no german special characters like umlauts)
  • a known noun, verb or adjective in its basic form
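The first two formal conditions quoted above can be checked mechanically; the third (a known noun, verb or adjective in its basic form) needs a dictionary such as DWDS plus manual review, so it is left out here. A minimal sketch, with `formal_violations` as a hypothetical helper name:

```python
def formal_violations(words: list[str], min_len: int = 4, max_len: int = 8) -> list[tuple[str, str]]:
    """Return (word, reason) for each word breaking the readme's formal conditions.

    Only the length rule and the "english letters only" rule are checked here.
    """
    violations = []
    for word in words:
        # isascii() together with isalpha() rejects umlauts, ß, digits and punctuation.
        if not (word.isascii() and word.isalpha()):
            violations.append((word, "non-ASCII letter, digit or symbol"))
        elif not min_len <= len(word) <= max_len:
            violations.append((word, f"length {len(word)}"))
    return violations


print(formal_violations(["haus", "straße", "zv", "x1yz", "lebensmittel", "garten"]))
```

Because the bounds are parameters, the same check can list which entries of another list exceed 8 characters, which is the concern about long words raised above.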

It would be great to see this list or yours included in the web application, which I already show but don't (yet) use in workshops.

Is there anything I can do to support you here?
