GitHub - colincwilson/MinimalGeneralizationLearner: Minimal Generalization Learner for morphology and phonology (Albright & Hayes 2002, 2003) with command-line interface

This repository provides a command-line interface for the Minimal Generalization Learner (Albright & Hayes 2002, 2003, etc.) and slightly modified input files for English past-tense simulations. Send comments or questions to colin-dot-wilson-at-jhu-dot-edu.

version1/

The original version of the learner with a GUI interface, files downloaded [06/14/2021] from http://www.mit.edu/~albright/mgl/ and https://linguistics.ucla.edu/people/hayes/RulesVsAnalogy/index.html. MinGenLearner.jar seems to provide the most up-to-date compiled code. Run with: java -jar MinGenLearner.jar

version2/

The new version of the learner with a command-line interface. The (only) file added to the original code is src/LearnerCommandline.java. Arguments for the learner are specified with YAML files, see for example english.yaml. Run the English past-tense simulation with 00runme.sh.

This version comes already compiled (with Java 16). Recompile with: ant -buildfile MinimalGeneralizationLearner.xml. The one external dependency is SnakeYAML, included here as extern/snakeyaml-1.29.jar.

The new version also has the original GUI interace, run with: java -jar bin/mingenlearn.jar

english/

English past-tense data provided by Albright & Hayes, with various transcriptions.

English1 and English2 have the original small and large data sets, respectively. Some erroneous past-tense forms in the original English2 data are listed in English2_errors.txt.
English2_unicode has transcriptions that are closer to IPA but adhere to the one-symbol-per-phoneme format required by the learner; these are the input files for the example simulation in version2/00runme.sh. Also in this folder are the English data from the SIGMORPHON 2021 Shared Task on Generalization in Morphological Inflection Generation.
English2_IPA has the data in space-separated IPA transcription, which is incompatible with the learner.
English_phonemes.ods compares the transcription systems and provides a feature matrix.

feature file format

The format of feature (.fea) files is quite strict, as follows:

ASCII<tab>Seg.<tab><long feature names, tab-separated>
<tab><tab><short feature names, tab-separated>
<id_1><tab>segment_1<tab><feature specifications (+1, 0, -1), tab-separated>
<id_2><tab>segment_2<tab><feature specifications (+1, 0, -1), tab-separated>
...
<id_m><tab>ˈ<tab><feature specifications of stress symbol, tab-separated>
<id_n><tab>~<tab><feature specifications of empty symbol, tab-separated>

The 'ASCII' values must be unique integers, but are otherwise arbitrary.
The stress symbol can be omitted if it does not appear in the transcriptions.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
english		english
version1		version1
version2		version2
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

version1/

version2/

english/

feature file format

About

Releases

Packages

Languages

colincwilson/MinimalGeneralizationLearner

Folders and files

Latest commit

History

Repository files navigation

version1/

version2/

english/

feature file format

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages